[Israel.pm] Re: optimising memory usage

Shlomo Yona shlomo at cs.haifa.ac.il
Mon Jan 5 04:54:12 PST 2004


On Mon, 5 Jan 2004, Yosef Meller wrote:

> Is it important for all the input files to be concatenated? If not, you
> can just fetch the file names and then process each at a time.
> If it is, you can still process each at a time and then sum the results
> from all files into one (however then sequences at ends of files will
> not be concatenated to starts of files).

That is actually good advice.
I did that (see the other emails I sent on this thread last
night), and of course it made a great improvement.
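
For reference, a minimal sketch of the per-file approach described
above. It assumes word bigrams, whitespace tokenisation and UTF-8
input; the names count_ngrams and merge_counts are made up for this
illustration and are not taken from the actual script.

```perl
use strict;
use warnings;

# Count word n-grams in one already-opened file handle.
# Only the last $n words are kept in memory, not the whole file.
sub count_ngrams {
    my ($fh, $n) = @_;
    my (%count, @window);
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            push @window, $word;
            shift @window if @window > $n;
            $count{"@window"}++ if @window == $n;   # key is the words joined by spaces
        }
    }
    return \%count;
}

# Add the counts from one file into the running totals.
sub merge_counts {
    my ($total, $part) = @_;
    $total->{$_} += $part->{$_} for keys %$part;
    return $total;
}

binmode STDOUT, ':encoding(UTF-8)';

my %total;
for my $file (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $file
        or die "Cannot open $file: $!";
    merge_counts(\%total, count_ngrams($fh, 2));   # per-file hash is freed after merging
    close $fh;
}
printf "%d\t%s\n", $total{$_}, $_ for sort keys %total;
```

As noted in the quoted text, the window starts fresh for every file, so
an n-gram that would span the end of one file and the start of the next
is not counted.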


> Inspired by the code of Acme::Bleach, I found a way to reduce the size
> of your hash keys by packing them with single bits instead of space
> characters.

Ahhh! Wait!
pack() and unpack() don't work well (please correct me if I'm
wrong) with UTF-8 encoded strings.

Still, if I know that the input I deal with can also be
represented in a character set that needs only one byte per
character and is encoded as simply as ASCII (or 8-bit
ASCII), then your idea can work.
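
To illustrate the single-byte case, here is a rough sketch using the
core Encode module. The choice of ISO-8859-8 as the 8-bit Hebrew
character set and the sample word are assumptions made only for this
example.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Four Hebrew letters (shin, lamed, vav, final mem) as a character string.
my $word = "\x{05E9}\x{05DC}\x{05D5}\x{05DD}";

# UTF-8 needs two bytes for each Hebrew letter.
my $utf8_bytes = encode('UTF-8', $word);

# ISO-8859-8 needs one byte per letter. FB_CROAK makes encode() die on a
# character the charset cannot represent instead of silently replacing it;
# LEAVE_SRC keeps $word from being modified in place.
my $single_bytes = encode('iso-8859-8', $word,
                          Encode::FB_CROAK() | Encode::LEAVE_SRC());

print length($utf8_bytes), " ", length($single_bytes), "\n";   # prints "8 4"

# The byte string can be turned back into the original characters.
my $round_trip = decode('iso-8859-8', $single_bytes);
print "round trip ok\n" if $round_trip eq $word;
```

Once the keys are plain byte strings of this kind, byte-oriented pack()
and unpack() templates see exactly one byte per character.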

Another problem: I'm using keys representing strings which
are in many, many cases longer than 8 characters. So...
thanks, but I need something more robust. Anyway, your idea
is nice and useful, had I been working on English text, for
example. As I'm handling Hebrew, and as I'm doing statistics
on N-Grams which make up keys larger than 8 characters/bytes
I need some other strategy.

Thanks again.

-- 
Shlomo Yona
shlomo at cs.haifa.ac.il
http://cs.haifa.ac.il/~shlomo/