[Israel.pm] Handling huge data-structures?

Gaal Yahas gaal at forum2.org
Sun Aug 29 09:55:31 PDT 2004


[Edited.]

On Sun, Aug 29, 2004 at 07:21:20PM +0300, Yuval Yaari wrote:
> > Yuval, if I were you I'd do some research on DB_File and maybe SQLite
> > via Class::DBI. Or just chuck the semantic equivalence requirement and
> > bite the SQL bullet: sufficiently different problems tolerate different
> > solutions.
>
> SQLite through CDBI sounds like fun (yet slow).
> BUT doesn't SQLite's DB stay in the memory all the time?
> DB_File might work. I read about it a year ago, I'll freshen up my memory
> and let you know :)
> 
> But still, I thought the bioinformatics guys would work their magic to
> allow me to use a "normal" hash that is not in the memory (or at least not
> entirely).

SQLite stores each database in a single file on disk. What gets embedded
in your application (via DBD::SQLite in this case) is the database
*code*, not the actual data, so the data does not have to sit in memory
all the time.

I suggested CDBI for a familiar (and scalable) interface, not for speed.
I expect (but don't take my word for it) that straight SQLite is going
to be one of your fastest options that don't require you to implement
the store yourself.
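
To make the "straight SQLite" option concrete, here is a minimal sketch
of a key/value store through plain DBI. The table name "kv", its columns
and the temp file are made up for illustration, and I'm assuming DBI and
DBD::SQLite are installed; treat it as a starting point, not a tuned
setup.

```perl
use strict;
use warnings;
use DBI;
use File::Temp qw(tempdir);

# Hypothetical location; the whole database lives in this one file.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/kv.sqlite";

my $dbh = DBI->connect("dbi:SQLite:dbname=$file", "", "",
                       { RaiseError => 1, AutoCommit => 1 });

# Illustrative schema: one table acting like a hash.
$dbh->do("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)");

# Batch the inserts in one transaction instead of committing per row.
my $ins = $dbh->prepare("INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)");
$dbh->begin_work;
$ins->execute($_, $_ * 2) for 1 .. 1000;
$dbh->commit;

# A lookup is a query, not a hash access.
my ($v) = $dbh->selectrow_array("SELECT v FROM kv WHERE k = ?", undef, 42);
print "42 => $v\n";

$dbh->disconnect;
```

Wrapping the inserts in a single transaction is the one design choice
worth noting here: committing row by row is usually much slower.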

The advantage of DB_File is that it uses Perl's tie interface, so your
code keeps working with what looks like an ordinary hash. I have no idea
whether it copes with large files, though, or how it performs. Whatever
the bioinformatics guys were talking about, it probably used tie for the
magic.
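
For reference, tying a hash to a DB_File looks roughly like this. It is
a sketch only: the file name and the word-counting data are invented,
and it assumes DB_File (and the Berkeley DB library behind it) is
available on your system.

```perl
use strict;
use warnings;
use Fcntl;                       # O_RDWR, O_CREAT, O_RDONLY
use DB_File;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/words.db";      # hypothetical file name

# After the tie, %count reads and writes go to the file on disk.
tie my %count, 'DB_File', $file, O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Cannot tie $file: $!";
$count{$_}++ for qw(perl hash perl tie perl);
untie %count;

# Tie the same file again to show that the data persisted.
tie my %again, 'DB_File', $file, O_RDONLY, 0644, $DB_HASH
    or die "Cannot re-open $file: $!";
my $perl_count = $again{perl};
print "perl seen $perl_count times\n";
untie %again;
```

Note that values come back as plain strings, so nested structures would
need to be serialized by hand or through another layer.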

Note that every solution here will still have different performance
characteristics from a native Perl hash, even if it looks like one.
Some operations may be more expensive than you think. Also, low-level
tuning of a large database can have a dramatic effect on performance,
if you know how to do it. But you will need to research this.
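
As one example of a hidden cost: counting the keys of a tied hash has to
walk the whole thing through the tie iterator. The sketch below shows
this with a toy in-memory tied class; "CountingHash" is a made-up name,
built on the core Tie::StdHash, and it only counts iterator calls. A
disk-backed tie would pay for each of those calls with real I/O.

```perl
use strict;
use warnings;

package CountingHash;            # hypothetical class, for illustration
use Tie::Hash;                   # provides Tie::StdHash
our @ISA = ('Tie::StdHash');
our %calls;

# Count every iterator call, then defer to the standard behaviour.
sub FIRSTKEY { my $self = shift; $calls{FIRSTKEY}++; $self->SUPER::FIRSTKEY(@_) }
sub NEXTKEY  { my $self = shift; $calls{NEXTKEY}++;  $self->SUPER::NEXTKEY(@_) }

package main;

tie my %h, 'CountingHash';
$h{$_} = 1 for 1 .. 1000;

%CountingHash::calls = ();       # ignore anything before this point
my $n = keys %h;                 # cheap on a plain hash, a full walk here

my $iter_calls = ($CountingHash::calls{FIRSTKEY} || 0)
               + ($CountingHash::calls{NEXTKEY}  || 0);
print "counted $n keys using $iter_calls iterator calls\n";
```

If you only need a count, keeping it in a separate key (or asking the
database directly) avoids the walk.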


Some threads:
http://perlmonks.org/index.pl?node_id=147051
http://perlmonks.org/index.pl?node_id=146377

-- 
Gaal Yahas <gaal at forum2.org>
http://gaal.livejournal.com/


