[Israel.pm] Memory problem - Loading big files
gordon at cshl.edu
Thu Jun 19 10:08:50 PDT 2008
Thank you for your responses.
In the meantime (while my message was awaiting moderator approval :-) ),
I've rewritten it in C, and now it's both lightning fast and
memory-efficient.
However, I would like to continue this tiny discussion, and maybe you'll
help me find a better solution for similar situations in the future.
> The problem is not in the file. You have @probes with as many items
> as there are lines in the file.
> For each line you have an additional array...
That's intentional. I need the numeric information in each field for
searching and comparing.
I realize that there is a new array variable for each line. I just hoped
it wouldn't consume as much memory as it did.
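In hindsight, one way I might have shrunk that overhead without leaving
Perl is to pack() each line's numeric fields into a single compact
scalar instead of keeping a per-line array ref. This is only a sketch -
the file name and the tab-separated-integers layout are assumptions:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sketch only: 'probes.txt' and the tab-separated integer layout
    # are assumptions. pack() stores each record as one scalar
    # (4 bytes per integer) instead of a full array ref per line.
    my @probes;
    open my $fh, '<', 'probes.txt' or die "open: $!";
    while (my $line = <$fh>) {
        chomp $line;
        push @probes, pack 'l*', split /\t/, $line;
    }
    close $fh;

    # Unpack a record only when it is actually needed:
    my @fields = unpack 'l*', $probes[0];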
> I would say that the problem is with the algorithm, which requires
> Assaf to load the entire file into memory. What happens if Assaf's
> business grows and he has to deal with a 2.5GB sized file?
My algorithm uses a binary search on a field in this file. That's why I
wanted to load the entire file into a list (the fields/rows in the
file are already sorted).
It's possible there are much better algorithms for my needs, but I
really wished for something quick-and-dirty:
1. Load the entire list into memory (it's already sorted).
2. Do a couple of binary searches on the list (a couple of searches for
each user request, with possibly tens of requests); a sketch of this
step follows below.
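For completeness, step 2 could look roughly like this - a sketch only,
assuming the packed records from above, sorted numerically on their
first field:

    # Sketch: binary search over @probes; assumes each element is a
    # pack()ed record and the list is sorted on the first integer field.
    sub bsearch {
        my ($probes, $key) = @_;
        my ($lo, $hi) = (0, $#$probes);
        while ($lo <= $hi) {
            my $mid = int(($lo + $hi) / 2);
            my ($field) = unpack 'l', $probes->[$mid];  # first field only
            if    ($field < $key) { $lo = $mid + 1 }
            elsif ($field > $key) { $hi = $mid - 1 }
            else                  { return $mid }       # index of a match
        }
        return -1;                                      # not found
    }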
Regarding the size issue:
My computer has 3GB of RAM. My C program (which takes about 400MB for
this 250MB file) will still be able to cope with bigger files.
But with Perl, 250MB of data takes more than 2.5GB of memory - this
indeed won't do much good.
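One way to have had both (pure Perl and constant memory, no matter how
big the file grows) would be to binary-search the sorted file directly
on disk with seek(). Again just a sketch, under the same assumption of
a numeric, tab-separated first field as the key:

    # Return the first complete line starting at or after byte $off.
    sub line_at_or_after {
        my ($fh, $off) = @_;
        if ($off == 0) {
            seek $fh, 0, 0;
        } else {
            seek $fh, $off - 1, 0;
            <$fh>;                  # discard through the next newline
        }
        my $pos  = tell $fh;
        my $line = <$fh>;
        return ($pos, $line);
    }

    # Binary search over byte offsets; memory use is one line at a time.
    sub file_bsearch {
        my ($path, $key) = @_;
        open my $fh, '<', $path or die "open $path: $!";
        my ($lo, $hi) = (0, -s $fh);
        while ($lo < $hi) {
            my $mid = int(($lo + $hi) / 2);
            my ($pos, $line) = line_at_or_after($fh, $mid);
            if (!defined $line || $pos >= $hi) { $hi = $mid; next }
            my $k = (split /\t/, $line)[0];    # first field is the key
            if    ($k < $key) { $lo = $pos + length $line }
            elsif ($k > $key) { $hi = $pos }
            else              { return $line } # a matching line
        }
        return;                                # not found
    }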
> Perl variables do consume a lot of memory.
> I ran a script similar to yours (just without creating the external file).
> For me it only used 800MB of memory on Perl 5.8.8 on Ubuntu.
> So I wonder if your actual file contains more lines.
I use perl "v5.8.8 built for i486-linux-gnu-thread-multi", from a
standard Ubuntu 8.04 package.
Running "wc" on my file returns 2161680 lines, 17293440 words
(=~fields) 281866344 characters.
So my estimation of ~250MB file with ~2.1MB lines is more or less correct.
> Perl has a lot of overhead on its data-structures. So it may run out
> of memory with such large data. I suggest you use some kind of
> database instead.
I have a working Postgres with all the DBI/DBD modules installed, but
again - I wished for something quick and dirty.
This is just a favor I'm doing for somebody in the lab - not a
full-fledged project.
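That said, the suggested database route could have stayed fairly small
too. A sketch, with made-up table and column names, and DBD::SQLite
standing in for Postgres since it needs no server:

    use strict;
    use warnings;
    use DBI;

    # Sketch only: 'probes.db', the table and the column are made up.
    # An index on probe_key gives the same O(log n) lookup as the
    # binary search, without loading anything into memory.
    my $dbh = DBI->connect('dbi:SQLite:dbname=probes.db', '', '',
                           { RaiseError => 1 });
    my $sth = $dbh->prepare('SELECT * FROM probes WHERE probe_key = ?');
    $sth->execute(12345);
    while (my $row = $sth->fetchrow_arrayref) {
        print join("\t", @$row), "\n";
    }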
Additionally, had this worked (using Perl alone, no databases or
anything else), other people here (Mac users, ugh!) could also run it on
their Macs and leave me alone :-)