[Israel.pm] Memory problem - Loading big files

Assaf Gordon gordon at cshl.edu
Thu Jun 19 10:08:50 PDT 2008

Hello All,

Thank you for your responses.

In the meantime (while my message was waiting for moderator approval :-) )
I've rewritten it in C, and now it's both lightning fast and memory
efficient.

However, I would like to continue this tiny discussion, and maybe you'll
help me find a better solution for similar problems in the future.

RE: Yossi:
> The problem is not in the file.  You have @probes with number of items
> as the amount of lines in the file.
> For each line you have an additional array...

That's intentional. I need the numeric information in each field for
searching and comparing.
I realize that there is a new array variable for each line. I just hoped
it wouldn't consume as much memory as it did.

RE: Omer:
> I would say that the problem is with the algorithm, which requires
> Assaf to load the entire file into memory.  What happens if Assaf's
> business grows and he has to deal with a 2.5GB sized file?

My algorithm uses a binary search on a field in this file. That's why I
wanted to load the entire file into a list (the rows in the file are
already sorted by that field).

It's possible there are much better algorithms for my needs, but I
really wished for something quick-and-dirty:
1. load entire list (it's already sorted)
2. Do a couple of binary-searches on the list (a couple of searches for
each user request, with possibly tens of requests).
3. Done.
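
As a rough sketch of the three steps above (this is not the actual script:
the sample data, the choice of the first field as the search key, and the
names load_probes and lower_bound are all assumptions made for
illustration):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data: whitespace-separated numeric fields, already
# sorted in ascending order by the first field (the assumed search key).
my $data = <<'END';
100 7 3
250 1 9
250 4 4
900 2 8
END

# Step 1: load the entire list. Each line becomes its own small array,
# which is where the per-line memory overhead comes from.
sub load_probes {
    my ($fh) = @_;
    my @probes;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless length $line;
        push @probes, [ split ' ', $line ];
    }
    return \@probes;
}

# Step 2: binary search for the first row whose key field is >= $target.
# Returns an index in 0 .. scalar(@$probes); the latter means "no such row".
sub lower_bound {
    my ( $probes, $target ) = @_;
    my ( $lo, $hi ) = ( 0, scalar @$probes );
    while ( $lo < $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if ( $probes->[$mid][0] < $target ) {
            $lo = $mid + 1;
        }
        else {
            $hi = $mid;
        }
    }
    return $lo;
}

# Read from an in-memory filehandle so the sketch runs without a real file.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my $probes = load_probes($fh);
close $fh;

# Step 3: a couple of searches per request.
for my $key ( 250, 300, 1000 ) {
    my $i = lower_bound( $probes, $key );
    if ( $i < @$probes ) {
        print "key $key: first row at index $i (@{ $probes->[$i] })\n";
    }
    else {
        print "key $key: past the end of the list\n";
    }
}
```

With the real file, load_probes would be given a filehandle opened on that
file instead. The array-per-line layout is kept on purpose here, since it
is the layout whose memory use is being discussed.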

Regarding the size issue:
My computer has 3GB of RAM. My C program (which takes about 400MB for
this 250MB file) will still be able to cope with bigger files.
But with Perl, 250MB of data takes more than 2.5GB of memory - this
indeed won't scale to bigger files.

RE: Gabor:
> Perl variables do consume a lot of memory.
> I ran a script similar to yours (just without creating the external file).
> for me it only used 800 Mb memory on a perl 5.8.8 on Ubuntu.
> ...
> So I wonder if your actual file contains more lines

I use perl "v5.8.8 built for i486-linux-gnu-thread-multi", from a
standard Ubuntu 8.04 package.

Running "wc" on my file returns 2161680 lines, 17293440 words
(=~fields), 281866344 characters.
So my estimate of a ~250MB file with ~2.1M lines is more or less correct.
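
To put the figures above next to each other, here is a small
back-of-the-envelope calculation. It only uses numbers quoted in this
message; reading "2.5GB" and "400MB" as binary units is an assumption, and
since the Perl figure is "more than 2.5GB" the per-line cost is a lower
bound:

```perl
use strict;
use warnings;

# Figures quoted in this message.
my $lines = 2_161_680;      # from wc
my $words = 17_293_440;     # from wc (roughly the number of fields)
my $chars = 281_866_344;    # from wc

# Assumption: "2.5GB" and "400MB" are binary units (2**30 and 2**20 bytes).
my $perl_bytes = 2.5 * 2**30;
my $c_bytes    = 400 * 2**20;

printf "fields per line:          %.1f\n",  $words / $lines;
printf "file bytes per line:      %.1f\n",  $chars / $lines;
printf "Perl bytes per line:      %.0f\n",  $perl_bytes / $lines;
printf "C bytes per line:         %.0f\n",  $c_bytes / $lines;
printf "Perl memory vs file size: %.1fx\n", $perl_bytes / $chars;
```

That works out to 8 fields and about 130 bytes of text per line, against
roughly 1.2KB of memory per line in Perl and a bit under 200 bytes per
line in the C program.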

RE: Shlomi:
> Perl has a lot of overhead on its data-structures. So it may run out
> of memory with such large data. I suggest you use some kind of
> database instead

I have a working Postgres with all the DBI/DBD modules installed, but 
again - I wished for something quick and dirty.
This is just a favor I'm doing for somebody in the lab - not a full
scale project.

Additionally, had this worked (using Perl alone, no databases or
anything else), other people here (Mac users, ugh!) could also run it on
their Macs and leave me alone :-)

