[Israel.pm] a hash question

Mark Dominus mjd-list-israelpm at plover.com
Wed Feb 18 04:25:30 PST 2004

> I remember MJD recommending one of the DBM files  - do people remember
> which ?  and why ?

DB_File is the only one that doesn't seem to have big problems.

ODBM, NDBM, and SDBM all have limits on the sizes of the data that you
can store.  For example, with SDBM, each key and the data together may
not be more than 1024 bytes.
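
A minimal sketch of what hitting that limit looks like from Perl, assuming the SDBM_File module bundled with Perl. The file name and the 100-byte and 2000-byte values are made up for illustration; the large value is chosen to be well over the limit quoted above rather than to find the exact cutoff. The oversized store is expected to die, so it is wrapped in eval:

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT
use SDBM_File;                   # bundled with Perl
use File::Temp qw(tempdir);

# Scratch directory, removed when the script exits.
my $dir = tempdir(CLEANUP => 1);

tie my %db, 'SDBM_File', "$dir/limit", O_RDWR | O_CREAT, 0666
    or die "Cannot tie SDBM file: $!";

# A small key/value pair (illustrative size) is stored normally.
$db{small} = 'x' x 100;
my $small_ok = exists $db{small};

# A pair far over the limit; the store is expected to fail,
# so trap the error instead of letting it kill the script.
my $big_ok = eval { $db{big} = 'x' x 2000; 1 };

print "small pair stored: ", ($small_ok ? "yes" : "no"), "\n";
print "big pair stored:   ", ($big_ok   ? "yes" : "no"), "\n";

untie %db;
```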

Also, these three scale very badly when you try to put a lot of data
into them:

          # Keys   File extent  Space used
                     (ls -l)     (ls -s)

               1         1024        8  
               2         2048        8
               4         4096        8
               8         8192       48
              16       120832      120
              32       245760      208
              64       441344      296
             128      4251648      456
             256     12701696     1456
             512     21091328     2320
            1024     33284096     4128
            2048    536668160    11592
            4096   1065409536    22272

Most Unix systems have a 2GB limit on the extent of a file, so even
though we're not storing very much data, the file is soon too big for
the OS to handle.
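
A rough sketch of how a table like the one above could be collected, again assuming SDBM_File. The key names and the 100-byte values are hypothetical, so the numbers will not match the table; the point is only to compare the apparent size of the .pag file (what ls -l reports) with the blocks actually allocated (what ls -s reports). The block count comes from stat and its unit is system-dependent:

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my @rows;

for my $n (1, 2, 4, 8, 16, 32, 64, 128, 256) {
    my $base = "$dir/scale$n";
    tie my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666
        or die "Cannot tie $base: $!";

    # Hypothetical workload: short keys, 100-byte values.
    $db{"key$_"} = 'x' x 100 for 1 .. $n;
    untie %db;

    # Field 7 is the size in bytes, field 12 the allocated blocks
    # (block size varies by system and may be empty on some).
    my ($size, $blocks) = (stat "$base.pag")[7, 12];
    push @rows, [$n, $size, $blocks];
}

printf "%8s %12s %8s\n", '# Keys', 'Extent', 'Blocks';
printf "%8d %12d %8s\n", @$_ for @rows;
```

If the extent grows much faster than the block count, the file is sparse, which is the pattern the table shows.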

GDBM doesn't have these limitations.  But I don't use it any more
because in 1998 I was using it for a web user database for a major
client; we had about 320,000 registered users, and one day, the
'firstkey' and 'nextkey' routines stopped producing all the keys.
They would generate about 1,700 of the usernames and then stop, so I
couldn't get the list of our users.
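
For reference, this is roughly how that kind of failure would show up from Perl, where each() and keys() on a tied hash are driven by the module's FIRSTKEY and NEXTKEY methods. This is only a sketch: it uses SDBM_File as a stand-in because it ships with Perl, and the user names and the count of 1000 are invented. It does not reproduce the GDBM problem; it just shows a check that compares the number of keys stored with the number the iteration returns:

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
tie my %users, 'SDBM_File', "$dir/users", O_RDWR | O_CREAT, 0666
    or die "Cannot tie user database: $!";

# Hypothetical data set: 1000 made-up user names.
my $expected = 1000;
$users{"user$_"} = 1 for 1 .. $expected;

# each() calls FIRSTKEY once and then NEXTKEY until it runs out.
my $seen = 0;
while (my ($name) = each %users) {
    $seen++;
}

if ($seen == $expected) {
    print "iteration returned all $expected keys\n";
}
else {
    print "iteration returned only $seen of $expected keys\n";
}

untie %users;
```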

I sent a detailed bug report to the GNU folks, offering to do whatever
I could to help, and the reply said:

        I have heard of this happening before.  I was not able to find
        out why.  Do you have a backup of earlier versions so you can
        get most of your keys out?  If so, you might try to recover by
        moving to DB-2.? routines.  They are still being updated and
        developed.  gdbm has not had any active development in years.

So I restored what I could from the backup tapes, I switched to
Berkeley DB, and I have not used GDBM since then.
