[Israel.pm] regexp

Shlomi Fish shlomif at iglu.org.il
Sun Jun 25 01:46:51 PDT 2006


On Sunday 25 June 2006 11:14, Ernst, Yehuda wrote:
> Hello!
>
>
> I have a text like this
>
> "aaa<asd>='asd'/6>bbb<asd>='asd'/3>ccc<asd>='asd'/5>ddd###"
>
> I need to extract the aaa bbb ccc ddd
> between is the same <asd>='asd'/6>
> just the number can be different
>
> i do not know how many <asd>='asd'/6> are there the end is like this ###
>
> any ideas?
>

I don't have a ready-made solution, but given the problem I suggest you
tokenise the string and then work on the resulting array of tokens. You can
do that in one of the following ways:

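1. A chain of substitutions, each of which strips one kind of token off the
   start of the string:

   if ($string =~ s{^$regex}{}) { ... }
   elsif ($string =~ s{^$regex2}{}) { ... }
   .
   .
   .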

2. Alternatively use a lexer module from CPAN:

http://search.cpan.org/dist/HOP-Lexer/
http://cpan.uwinnipeg.ca/dist/Parse-Flex

There are also some others which you can find from a CPAN search. Also see:

http://www.shlomifish.org/Vipe/lecture/Sys-Call-Track/Lex-Yacc/

These lexer modules are basically a higher-level interface on top of the
if ... elsif chain from #1.

3. There are other ways to write #1 above. One of them is using /g and \G.
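
As a rough sketch of #3, here is how it might look for the sample string from
the question. I am assuming that the separators always have the form
<asd>='asd'/N> with N being a number, and that the string always ends with
###. The names tokenise, $separator_re and $end_re are made up for this
illustration and do not come from any module:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample input taken from the question.
my $string = q{aaa<asd>='asd'/6>bbb<asd>='asd'/3>ccc<asd>='asd'/5>ddd###};

# Assumed token patterns, guessed from the sample input.
my $separator_re = qr{<asd>='asd'/\d+>};
my $end_re       = qr{###};

# Split the input into [TYPE, TEXT] pairs. \G anchors each match at the
# current pos() and /c keeps pos() intact when a match fails, so every
# pattern is tried at the same offset.
sub tokenise
{
    my ($input) = @_;
    my @tokens;

    pos($input) = 0;
    while (pos($input) < length($input))
    {
        if ($input =~ m{\G($separator_re)}gc)
        {
            push @tokens, ['SEP', $1];
        }
        elsif ($input =~ m{\G($end_re)}gc)
        {
            push @tokens, ['END', $1];
            last;
        }
        # Plain text: everything up to the next separator or end marker.
        elsif ($input =~ m{\G((?:(?!$separator_re|$end_re).)+)}gcs)
        {
            push @tokens, ['TEXT', $1];
        }
        else
        {
            die "Cannot tokenise at offset " . pos($input) . "\n";
        }
    }

    return \@tokens;
}

# Keep only the text between the separators.
my @texts = map { $_->[1] } grep { $_->[0] eq 'TEXT' } @{ tokenise($string) };
print "@texts\n";    # prints "aaa bbb ccc ddd"
```

The same if ... elsif structure works for #1 if you replace each m{\G...}gc
with s{^...}{} and loop until the string is empty. The \G version has the
advantage of leaving the original string unmodified.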

-------------

Using simple regexps to parse HTML (which seems similar to your problem) is a
very old and frequently asked Perl question, and it often comes up in #perl
on Freenode.

Regards,

	Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish      shlomif at iglu.org.il
Homepage:        http://www.shlomifish.org/

95% of the programmers consider 95% of the code they did not write, in the
bottom 5%.


