[Israel.pm] Memory problem - Loading big files

Shlomi Fish shlomif at iglu.org.il
Thu Jun 19 05:01:30 PDT 2008


On Tuesday 17 June 2008, Assaf Gordon wrote:
> Hello all,
>
> I'm having problems loading big files into memory - maybe you could help
> me solve them.
>
> My data file is a big (~250MB) text file, with eight tab-separated
> fields. I want to load the entire file into a list.

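Perl's data structures carry a lot of per-value overhead: every scalar, array 
and reference has its own bookkeeping on top of the actual data. So a program 
can easily run out of memory with data this large. I suggest you use some kind 
of database instead: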

* http://www.postgresql.org/ - a client/server SQL database (MySQL is not 
recommended due to http://www.shlomifish.org/open-source/anti/mysql/ ).

* http://www.sqlite.org/ - a file-based SQL database. Public Domain.

* http://www.oracle.com/technology/products/berkeley-db/index.html - Berkeley 
DB - a simple key/value-based database. (GPL-like licence).

* http://freshmeat.net/projects/tokyocabinet/ - an LGPLed database that seems 
similar to BDB (I have not tested it yet).
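As a rough sketch of the SQLite route, the script below streams the file into an SQLite table using DBI and DBD::SQLite, so only one line is held in memory at a time. The table name `probes`, the column names `f1`..`f8`, storing every field as TEXT, and the script name in the usage comment are all assumptions made for illustration; nothing in the thread specifies a schema.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Hypothetical schema: one table named "probes" with eight TEXT columns
# (f1..f8), one per tab-separated field in the data file.
sub create_table {
    my ($dbh) = @_;
    $dbh->do(q{
        CREATE TABLE IF NOT EXISTS probes (
            f1 TEXT, f2 TEXT, f3 TEXT, f4 TEXT,
            f5 TEXT, f6 TEXT, f7 TEXT, f8 TEXT
        )
    });
    return;
}

# Stream rows from $fh into the table, holding only one line in memory
# at a time. Returns the number of rows inserted.
sub load_probes {
    my ($dbh, $fh) = @_;
    my $sth = $dbh->prepare(
        'INSERT INTO probes VALUES (?, ?, ?, ?, ?, ?, ?, ?)');
    my $count = 0;
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;
        next unless @fields == 8;    # skip malformed lines
        $sth->execute(@fields);
        $count++;
    }
    # AutoCommit is off, so all the inserts above form one transaction;
    # committing once is much faster than committing after every row.
    $dbh->commit;
    return $count;
}

# Usage (hypothetical script name): perl load_probes.pl probes.db data.txt
sub main {
    my ($db_file, $data_file) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$db_file", '', '',
        { RaiseError => 1, AutoCommit => 0 });
    create_table($dbh);
    open my $fh, '<', $data_file or die "Cannot open $data_file: $!";
    my $n = load_probes($dbh, $fh);
    close $fh;
    $dbh->disconnect;
    print "Loaded $n rows into $db_file\n";
    return;
}

main(@ARGV) if @ARGV == 2;
```

Once the data is loaded, you fetch only the rows you need with SQL (for example `SELECT * FROM probes WHERE f1 = ?`) instead of keeping all of them in a Perl list.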

>
> I've narrowed down the code into this:
> -------------
> #!/usr/bin/perl
> use strict;
> use warnings;
> use Data::Dumper;
> use Devel::Size qw (size total_size);
>
> my @probes;
> while (<>) {
> 	my @fields = split(/\s+/);
> 	push @probes, \@fields;
> }
>
> print "size = ", size(\@probes),"\n";
> print "total size= ", total_size(\@probes),"\n";
> print "data size = ", total_size(\@probes)- size(\@probes),"\n";
> print Dumper(\@probes),"\n";
> ------------
> (Can't get any simpler than that, right?)
>
> But when I run the program, the perl process consumes 2.5GB of memory,
> prints "out of memory" and stops.

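That is expected. Your sample file has 2,100,000 lines of eight fields each, 
so the loop creates about 16.8 million scalars plus 2.1 million arrays (and a 
reference to each), and every one of them carries Perl's per-value overhead.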
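To see where the memory goes, Devel::Size (which your script already uses) can measure a single row. The sketch below compares one sample line's raw length with the size of the array Perl builds from it, then scales that up to the 2,100,000 lines of the generated test file. The exact figures depend on the perl version and build, so none are quoted here. It also ignores the outer @probes array and the per-row references, so the real total is somewhat higher.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Size qw(total_size);

# Return (raw bytes in the line, bytes Perl uses to hold it as an
# array of fields) for one tab-separated line.
sub row_cost {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;
    return (length($line), total_size(\@fields));
}

# A row shaped like the ones produced by the test-file generator script.
my $sample = join("\t", "LONG-TEXT-FIELD", 11111, 222222, 3333333,
    44444444, 5555555, 6666666,
    "VERY-VERY-VERY-VERY-VERY-VERY-VERY-VERY-VERY-LONG-TEXT-FIELD") . "\n";

my ($raw, $in_perl) = row_cost($sample);
my $rows = 2_100_000;    # line count used by the generator script
printf "raw: %d bytes/row, in Perl: %d bytes/row (%.1fx)\n",
    $raw, $in_perl, $in_perl / $raw;
printf "estimated total for %d rows: %.0f MB (excluding the outer array)\n",
    $rows, $rows * $in_perl / 2**20;
```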

>
> I know that perl isn't the most efficient memory consumer, but surely
> there's a way to do it...

You could try playing games with tied variables 
(http://perldoc.perl.org/perltie.html), but I would recommend against it. Just 
use a database, or possibly write a C extension with hand-crafted memory 
allocation.

That, or get a 64-bit machine with lots of available memory. ;-)

Regards,

	Shlomi Fish

>
> If you care to test it yourselves, here's a simple script that creates a
> dummy text file, similar to my own data file:
> -----
> #!/usr/bin/perl
> foreach (1..2100000) { print join("\t", "LONG-TEXT-FIELD", 11111,
> 222222, 3333333, 44444444, 5555555, 6666666,
> "VERY-VERY-VERY-VERY-VERY-VERY-VERY-VERY-VERY-LONG-TEXT-FIELD" ),"\n" ; }
> -----
>
>
> Thanks in advance for your help!
>       Assaf.
>
> _______________________________________________
> Perl mailing list
> Perl at perl.org.il
> http://perl.org.il/mailman/listinfo/perl

-----------------------------------------------------------------
Shlomi Fish       http://www.shlomifish.org/
My Aphorisms - http://www.shlomifish.org/humour.html

The bad thing about hardware is that it sometimes works and sometimes doesn't.
The good thing about software is that it's consistent: it always does not
work, and it always does not work in exactly the same way.


