[Israel.pm] What is the best way to compare huge arrays?

Offer Kaye offer.kaye at gmail.com
Sun Apr 17 05:17:17 PDT 2005


On 4/17/05, manora <manora at netvision.net.il> wrote:
> My question is which of the 2 styles is more perlish, memory efficient,
> faster.
> When comparing two huge arrays, to find the union and unique of each
> array.
> Here is my sample code, better solutions are very welcome.
> Thenx, arik manor.
> 

Hi Arik,
First of all, running your code, I see that in "union" you actually
compute the intersection of the arrays. See:
http://en.wikipedia.org/wiki/Union_%28set_theory%29
vs.
http://en.wikipedia.org/wiki/Intersection_%28set_theory%29
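
To make the difference concrete, here is a minimal sketch of union,
intersection, and the elements unique to each array, using the usual
hash idioms. The sample data and the variable names are made up for
illustration, since I don't know what your real arrays contain:

```perl
use strict;
use warnings;

# Hypothetical sample data, standing in for the real arrays.
my @a = qw(apple banana cherry banana);
my @b = qw(banana cherry date);

# Remove duplicates while keeping first-seen order.
sub uniq { my %seen; return grep { !$seen{$_}++ } @_ }

# Lookup hashes: one key per distinct element.
my %in_a = map { $_ => 1 } @a;
my %in_b = map { $_ => 1 } @b;

my @union        = uniq(@a, @b);                        # in a OR b
my @intersection = grep { exists $in_b{$_} } uniq(@a);  # in a AND b
my @only_a       = grep { !exists $in_b{$_} } uniq(@a); # in a, not in b
my @only_b       = grep { !exists $in_a{$_} } uniq(@b); # in b, not in a

print "union:        @union\n";
print "intersection: @intersection\n";
print "only in a:    @only_a\n";
print "only in b:    @only_b\n";
```

Each element is looked up in a hash once, so the work grows linearly
with the size of the arrays, at the cost of holding the hashes in
memory.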

Secondly, if the arrays are large enough (what did you mean by
"huge"?), no purely in-memory solution will do, since Perl may simply
run out of memory. In that case you will have to go to a fully or
partially disk-based solution, e.g. tied variables, which will
necessarily be slower than an in-memory solution.
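
As a rough sketch of what a partially disk-based version could look
like, the lookup hash can be tied to a DBM file. I use SDBM_File here
only because it ships with Perl; the data, the file name, and the
variable names are assumptions for illustration, and in real use you
would read both lists from files line by line rather than keep them in
arrays:

```perl
use strict;
use warnings;
use Fcntl;                     # O_RDWR, O_CREAT
use SDBM_File;                 # DBM implementation bundled with Perl
use File::Temp qw(tempdir);

# Hypothetical sample data, standing in for the real arrays.
my @a = qw(apple banana cherry);
my @b = qw(banana cherry date);

# Keep the lookup table on disk instead of in memory.
my $dir = tempdir(CLEANUP => 1);
tie(my %in_b, 'SDBM_File', "$dir/in_b", O_RDWR | O_CREAT, 0666)
    or die "Cannot tie SDBM file: $!";

$in_b{$_} = 1 for @b;

# Stream over the first list, consulting the on-disk hash.
my (@intersection, @only_a);
for my $item (@a) {
    if (exists $in_b{$item}) { push @intersection, $item }
    else                     { push @only_a, $item }
}

untie %in_b;

print "intersection: @intersection\n";
print "only in a:    @only_a\n";
```

Note that SDBM limits the combined size of a key and its value to
roughly 1 KB, so for long keys a different DBM module (DB_File, for
example, if Berkeley DB is installed) would be the better choice.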

For more self-study, I suggest you look at the code of the Perl
Cookbook about arrays, available for free here:
http://pleac.sourceforge.net/pleac_perl/arrays.html
Scroll down to the section titled "Computing Union, Intersection, or
Difference of Unique Lists"; it shows some nice Perl idioms.

Finally, there is the ever popular CPAN :-)
The following lovely module not only has methods for every operation
you're trying to do, it thoughtfully includes links at the bottom to
other CPAN modules that provide the same or similar functionality:
http://search.cpan.org/dist/List-Compare/Compare.pm
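
A short sketch of how that module is typically used, assuming it is
installed from CPAN (it is not part of core Perl, hence the guard
around the require). The sample data is again made up:

```perl
use strict;
use warnings;

# Hypothetical sample data, standing in for the real arrays.
my @a = qw(apple banana cherry);
my @b = qw(banana cherry date);

my %result;
if (eval { require List::Compare; 1 }) {
    my $lc = List::Compare->new(\@a, \@b);
    %result = (
        union        => [ $lc->get_union ],        # in either list
        intersection => [ $lc->get_intersection ], # in both lists
        only_a       => [ $lc->get_unique ],       # only in first list
        only_b       => [ $lc->get_complement ],   # only in second list
    );
    print "$_: @{ $result{$_} }\n" for sort keys %result;
}
else {
    print "List::Compare is not installed; try: cpan List::Compare\n";
}
```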

-- 
Offer Kaye



