[Israel.pm] patterns findings

Evgeny evgeny.zislis at gmail.com
Wed Mar 18 05:04:57 PDT 2009


Do you need to match partial lines as well?
Generic string matching algorithms are pretty complex, but if you have
restrictions on the type of tokens then you can simplify the problem.

For example, you could approach this problem by tokenizing patterns.
Your example:
> I want to go home
> 1+3=4
> Hello World
> 1+3=4
> Hello World
> 8+2=10
> Hello World
> Klskd

Can be tokenized as ANBNCNBNCNDNCNE. Then it is much easier to find
repeating patterns in this tokenized string (it is shorter and only
one line long). And if you want {n}+{n}={n} to always be tokenized
with the same token, then it might even apply to your second
question.

A => I want to go home
B => 1+3=4
C => Hello World
D => 8+2=10
E => Klskd
N => new line character
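
To make the mapping above concrete, here is a rough sketch in Python.
The function name tokenize and the generalize_numbers flag are made up
for illustration, and it assumes no more than 26 distinct lines. The N
token is dropped, since every token here is exactly one line anyway.

```python
import re

def tokenize(lines, generalize_numbers=False):
    """Map each distinct line to one letter; return (token_string, legend)."""
    legend = {}   # line text -> token letter
    tokens = []
    for line in lines:
        key = line.strip()
        if generalize_numbers:
            # Replace every run of digits with one placeholder, so that
            # 1+3=4 and 8+2=10 both become {n}+{n}={n} and share a token.
            key = re.sub(r"\d+", "{n}", key)
        if key not in legend:
            # Assumption: at most 26 distinct lines, so A..Z is enough.
            legend[key] = chr(ord("A") + len(legend))
        tokens.append(legend[key])
    return "".join(tokens), legend

lines = ["I want to go home", "1+3=4", "Hello World", "1+3=4",
         "Hello World", "8+2=10", "Hello World", "Klskd"]
token_string, legend = tokenize(lines)
print(token_string)                                   # ABCBCDCE
print(tokenize(lines, generalize_numbers=True)[0])    # ABCBCBCD
```

With generalize_numbers turned on the letters are handed out in order
of first appearance, so Klskd ends up as D rather than E.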

The assumption for this kind of approach is that your tokens are
line-long, and all it does is reduce the amount of data that needs to
be matched. If you don't assume that the tokens are line-long then
you are back to matching characters, which is the generic and more
complex case.
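
Once every line is a single token, one possible way to find the
repeats is a regular expression with a backreference. This is only a
sketch: repeated_runs is a made-up name, and it only reports
back-to-back, non-overlapping repeats, scanning from the left. The
input strings are the example above with the N tokens left out.

```python
import re

def repeated_runs(token_string):
    """Return (unit, count) for each back-to-back repeated run of tokens."""
    runs = []
    # (.+?) captures the shortest candidate unit; \1+ requires that the
    # same unit follows immediately at least one more time.
    for m in re.finditer(r"(.+?)\1+", token_string):
        unit = m.group(1)
        runs.append((unit, len(m.group(0)) // len(unit)))
    return runs

# One letter per line, using the legend A..E from the example.
print(repeated_runs("ABCBCDCE"))   # [('BC', 2)]
# Same input, but with 8+2=10 given token B as well.
print(repeated_runs("ABCBCBCE"))   # [('BC', 3)]
```

BC stands for "1+3=4" followed by "Hello World", so the first call
corresponds to the 2 repeats and the second to the 3 repeats of the
more general pattern.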

On Wed, Mar 18, 2009 at 1:10 PM, Yossi Itzkovich
<Yossi.Itzkovich at ecitele.com> wrote:
> Hi,
> I am looking for a module that will get a file with text lines, and will find repeating patterns.
> Example:  given the following file:
> -----
> I want to go home
> 1+3=4
> Hello World
> 1+3=4
> Hello World
> 8+2=10
> Hello World
> Klskd
> -----
> The script should tell me that the sequence : 1+3=4  and Hello World repeat 2 times.  A better script may tell me even that the more general pattern: {number}+{number}={number} and Hello World  repeat 3 times.
>
> Any suggestion?
>
> Thanks
> Yossi

