[Israel.pm] A problem with file diff.

Peter Gordon peter at pg-consultants.com
Wed Jan 25 01:10:17 PST 2006


1. Use a hash algorithm such as MD5 (as computed by md5sum) to convert each
line of data to a checksum, and do this for both files. The new files then
contain a list of checksums instead of the original data.

2. Run diff on the two lists of checksums. If the files are still too
large, you could keep just the first 10 characters of each checksum. That
is probably good enough, but you would have to check for uniqueness, since
two different lines could end up with the same shortened checksum.

3. Map the line numbers in the diff output back to the original text
(line N of each checksum file corresponds to line N of its original file).
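The three steps can be sketched in Perl roughly as follows. This is a minimal, untested-in-production sketch, not a finished tool: it assumes a Unix-like system with a diff that produces the default "normal" output format on the PATH, Perl 5.10 or later, and the core Digest::MD5 module. The script name bigdiff.pl and the subroutines checksum_file, restore_diff and main are hypothetical. Because line N of each checksum file corresponds to line N of its original, the line numbers diff reports apply unchanged to the original files, so step 3 only has to read those lines back.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Step 1: write one (possibly truncated) MD5 hex checksum per input line.
# $len is the number of hex digits kept (32 = the full digest).
# $seen (hash ref shared by both files) maps each short checksum to its full
# digest, so two different lines sharing a short checksum are detected.
sub checksum_file {
    my ($in, $out, $len, $seen) = @_;
    open my $ifh, '<', $in  or die "Cannot read $in: $!";
    open my $ofh, '>', $out or die "Cannot write $out: $!";
    while (my $line = <$ifh>) {
        my $full  = md5_hex($line);
        my $short = substr $full, 0, $len;
        if ($len < 32) {
            my $prev = $seen->{$short} //= $full;
            die "Checksum collision on '$short' at line $. of $in; use a longer prefix\n"
                if $prev ne $full;
        }
        print {$ofh} "$short\n";
    }
    close $ofh or die "Cannot close $out: $!";
    close $ifh;
}

# Step 3: copy diff's normal-format output, replacing each "< checksum" or
# "> checksum" line with the original line at that position. Hunks come in
# increasing line order, so each original file is read once, front to back.
sub restore_diff {
    my ($diff_fh, $orig1, $orig2, $out_fh) = @_;
    my %src = ('<' => { name => $orig1 }, '>' => { name => $orig2 });
    for my $s (values %src) {
        open $s->{fh}, '<', $s->{name} or die "Cannot read $s->{name}: $!";
        $s->{line} = 0;    # number of lines read so far
    }
    my %next;              # next line number to show for each side of the hunk
    while (my $d = <$diff_fh>) {
        if ($d =~ /^(\d+)(?:,\d+)?[acd](\d+)(?:,\d+)?$/) {
            @next{'<', '>'} = ($1, $2);    # hunk header such as "5,7c5,8"
            print {$out_fh} $d;
        }
        elsif ($d =~ /^([<>]) /) {
            my ($side, $s) = ($1, $src{$1});
            my $want = $next{$side}++;
            my $text;
            while ($s->{line} < $want) {
                $text = readline $s->{fh};
                die "$s->{name} ended before line $want\n" unless defined $text;
                $s->{line}++;
            }
            print {$out_fh} "$side $text";
        }
        else {
            print {$out_fh} $d;    # "---" separators and "\ No newline" notes
        }
    }
}

# Hypothetical driver: perl bigdiff.pl FILE1 FILE2
sub main {
    my ($f1, $f2) = @_;
    my %seen;
    checksum_file($f1, "$f1.md5", 10, \%seen);
    checksum_file($f2, "$f2.md5", 10, \%seen);
    # Step 2: diff the (much smaller) checksum files.
    open my $diff, '-|', 'diff', "$f1.md5", "$f2.md5" or die "Cannot run diff: $!";
    restore_diff($diff, $f1, $f2, \*STDOUT);
    close $diff;    # diff exits with status 1 when the files differ; not an error here
}

main(@ARGV) if @ARGV == 2;
```

Run it as `perl bigdiff.pl old.txt new.txt`; it leaves old.txt.md5 and new.txt.md5 next to the inputs. The 10-character prefix follows the suggestion in step 2. Note that the uniqueness check keeps one entry per distinct line in memory; passing 32 (the full digest) skips the check at the cost of larger checksum files.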

Peter

On Wed, 2006-01-25 at 10:40 +0200, Meron.Cohen at ecitele.com wrote:
> Hello,
> 
> Problem description:
> 
> Given 2 text files (which are sometimes large, approximately 60MB), I
> need to find all the differences between them.
> I tried to use "diff", but it doesn't work for large files. So I tried
> "bdiff" (Big Diff), and it worked, only it doesn't give the minimum
> set of differences, because it does the diff segment by segment, so I
> can't use it for my purposes.
> 
> Questions:
> 
> I would like to know if there is a Perl module that can overcome the above
> problems. If there is one, could you give me a brief example?
> 
> 
> P.S.
> 
> If you know of a way to give a diff algorithm or program a regular
> expression as a parameter, so that it skips differences matching that
> RE, that would be wonderful.
> If you don't know a good module for this but know a program that would
> do it for me, that would be a temporary solution for me.
> 
> Thanks a lot,
> 
> Meron Cohen
> _______________________________________________
> Perl mailing list
> Perl at perl.org.il
> http://perl.org.il/mailman/listinfo/perl