[Israel.pm] regexp

Shlomi Fish shlomif at iglu.org.il
Sun Jun 25 01:46:51 PDT 2006

On Sunday 25 June 2006 11:14, Ernst, Yehuda wrote:
> Hello!
> I have a text like this:
> "aaa<asd>='asd'/6>bbb<asd>='asd'/3>ccc<asd>='asd'/5>ddd###"
> I need to extract aaa, bbb, ccc and ddd.
> The separator between them is always <asd>='asd'/6>; only the number can be different.
> I do not know how many separators there are, and the string ends with ###.
> Any ideas?

This is not a ready-made solution, but given the problem, I suggest that you
split the string into tokens and then work on the resulting array of tokens.
You can do it in one of these ways:

1. Strip one token at a time off the front of the string:
   if ($string =~ s{^$regex}{}) { ... }
   elsif ($string =~ s{^$regex2}{}) { ... }

2. Alternatively, use a lexer module from CPAN:


There are others as well, which you can find with a CPAN search. Also see:


These modules are basically an interface on top of this if ... elsif chain.

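3. There are other ways to write #1. One of them uses the /g modifier together
   with the \G anchor, so that each match starts where the previous one ended
   instead of deleting text from the front of the string.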
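A minimal sketch of approaches #1 and #3 applied to the sample string, in Perl. It assumes that the separator always looks like <asd>='asd'/N> with N made of digits, that the text between separators never contains '<' or '#', and that ### marks the end. The sub names tokenize_strip and tokenize_g and the token labels TEXT, SEP and END are made up for illustration and do not come from any CPAN module.

```perl
use strict;
use warnings;

# Sample input from the question. Assumptions (not from the original post):
# the separator is always <asd>='asd'/N> with N made of digits, the text
# between separators contains no '<' or '#', and ### marks the end.
my $input = "aaa<asd>='asd'/6>bbb<asd>='asd'/3>ccc<asd>='asd'/5>ddd###";

# Approach #1: chop one token at a time off the front of the string.
# Works on a copy, because each successful s/// removes text from it.
sub tokenize_strip {
    my ($string) = @_;
    my @tokens;
    while (length $string) {
        if ($string =~ s{^<asd>='asd'/(\d+)>}{}) {
            push @tokens, [ SEP => $1 ];      # separator, keep its number
        }
        elsif ($string =~ s{^###}{}) {
            push @tokens, ['END'];            # end marker, stop here
            last;
        }
        elsif ($string =~ s{^([^<#]+)}{}) {
            push @tokens, [ TEXT => $1 ];     # the text we want to extract
        }
        else {
            die "Unexpected input at: $string\n";
        }
    }
    return @tokens;
}

# Approach #3: same tokens, but match in place with \G and /gc.
# \G anchors each match at pos($string); /c keeps pos() when a match
# fails, so the next alternative is tried at the same offset.
sub tokenize_g {
    my ($string) = @_;
    my @tokens;
    pos($string) = 0;
    while (pos($string) < length $string) {
        if ($string =~ m{\G<asd>='asd'/(\d+)>}gc) {
            push @tokens, [ SEP => $1 ];
        }
        elsif ($string =~ m{\G###}gc) {
            push @tokens, ['END'];
            last;
        }
        elsif ($string =~ m{\G([^<#]+)}gc) {
            push @tokens, [ TEXT => $1 ];
        }
        else {
            die "Unexpected input at offset " . pos($string) . "\n";
        }
    }
    return @tokens;
}

# Keep only the TEXT tokens: the parts between the separators.
my @words = map { $_->[1] } grep { $_->[0] eq 'TEXT' } tokenize_g($input);
print "@words\n";    # prints: aaa bbb ccc ddd
```

Both subs should return the same token list. The s/// version consumes a copy of the string, while the \G version leaves it intact and keeps track of its place with pos().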


Parsing HTML with simple regexps (which seems similar to your problem) is a
very old Perl question, and it comes up often on #perl on Freenode.


	Shlomi Fish

Shlomi Fish      shlomif at iglu.org.il
Homepage:        http://www.shlomifish.org/

95% of programmers consider 95% of the code they did not write to be in the
bottom 5%.

