Posts tagged: SQL

More than one way to skin the Database Cat

I’m in the planning stages of a Perl application where I need some sort of database engine back-end with pretty basic requirements. I’ve surprised myself in picking what, today, has become an unconventional choice; DBM. For long time Unix practitioners, DBM is well known and there have been many re-implementations of the original idea (NDBM, SDBM, GDBM, etc.). Whatever the iteration, the basic idea is the same; a library of routines that are loaded into the application’s own address space and provides a basic key/value based mechanism for storing and retrieving records. See Wikipedia DBM article for a brief explanation and history of DBM.

Of course many application developers have forgotten all about DBM, assuming that it is an obsolete technology. The assumption today has pretty much become that the database back-end engine will of course be a relational DB with an SQL query language interface. But between applications that need all the power and features of a full-blown relational database, and those that only need low-level read/write operations of general purpose file system, there is a middle ground that a DBM-like database engine fills very nicely.

As it turns out DBM style databases are not dead at all but are actively being developed. QDBM (Quick DataBase Manager) is just one of several modern DBM-like open source database libraries available for deployment. The QDBM web page graciously lists a short description of some of its “brothers” where author Mikio Hirabayashi writes:


There are many followers of UNIX DBM. Select the best suited one for your products. NDBM is ancient and you should not use it. SDBM is maintained by Apache Project, and GDBM is maintained by GNU Project. They are most popular and time-tested. TDB is maintained by Samba Team. It allows multiple simultaneous writers. While CDB does not support updating at a runtime, it is the fastest. Berkeley DB is very multifunctional and ACID compliant. It is used in many commercial products. Finally, QDBM is balanced of performance, functionality, portability, and usability.

For my own project I’ve decided to give Berkeley DB a try. It appears to be feature rich, well supported, and perhaps most importantly, is already installed on the system where I will be deploying my application. :-)

For more information about the DBM approach to database management for applications, and about Berkeley DB in particular, I recommend checking out the first chapter of the Berkeley DB Programmer’s Reference Guide. It’s an interesting read and spells out the case for when a DBM style database is, and is not a good fit an application.

Rank function in PL/SQL

Here’s some code which demonstrates the usefulness of the Rank function in PL/SQL (can also be used in T-SQL):

Sample Data:

user_id email_address email_type activity_date
6437 jim@yahoo.com A 6/21/2009
5271 fred@gmail.com A 6/17/2009
5271 fred12@gmail.com A 6/29/2009
6988 steve@yahoo.com B 6/09/2009
5271 freddy@hotmail.com B 6/26/2009
6988 stephen@comcast.com A 6/22/2009
6437 jimbo@gmail.com B 6/23/2009
5271 fredericko@cantina.com C 6/24/2009

SELECT user_id,
     email_address,
     email_type,
 RANK () OVER (PARTITION BY user_id ORDER BY email_type, activity_date DESC) rnk
FROM email_table

Output:

user_id email_address email_type activity_date rnk
6437 jim@yahoo.com A 6/21/2009 1
6437 jimbo@gmail.com B 6/23/2009 2
5271 fred12@gmail.com A 6/29/2009 1
5271 fred@gmail.com A 6/17/2009 2
5271 freddy@hotmail.com B 6/26/2009 3
5271 fredericko@cantina.com C 6/24/2009 4
6988 stephen@comcast.com A 6/22/2009 1
6988 steve@yahoo.com B 6/09/2009 2


(Notice on the two records where user_id = 5271 and email_type = ‘A’ the rank of 1 is given to the record with the most current activity_date. This is because the ORDER BY portion of the RANK function sorts the records for each unique user_id first by email_type and second by activity_date. This makes it possible to select the most current record)

And with this code we can grab only the email addresses with a type of ‘A’ and the most current activity_date by using the WHERE rnk = 1 clause:

SELECT email_address FROM
 (SELECT user_id,
       email_address,
       email_type,
  RANK () OVER (PARTITION BY user_id ORDER BY email_type, activity_date DESC) rnk
 FROM email_table)
WHERE rnk = 1;

Output:

email_address
jim@yahoo.com
fred12@gmail.com
stephen@comcast.com

Panorama theme by Themocracy