[Israel.pm] A simpler regex required

Shlomi Fish shlomif at iglu.org.il
Tue Aug 14 14:08:20 PDT 2007


On Tuesday 14 August 2007, Amir E. Aharoni wrote:
> On 14/08/07, Peter Gordon <peter at pg-consultants.com> wrote:
> > Hi.
> >
> > Let's suppose that I have the following lines in an HTML file.
> > I want to substitute the spaces in the date part with non-breaking spaces
> > (&nbsp;)
> >
> > <td  style="text-align: left" bgcolor="#92c1bb">Aug 12 23:59:59 2007
> > GMT</td> <td  style="text-align: left" bgcolor="#92c1bb">Aug 12 23:59:59
> > 2007 GMT</td>
> >
> > I came up with this line - but somehow it isn't aesthetic.
> >
> > s!(<td.*?>)(.*?)(</td>)!my $t1 = $1 ;my $t2 = $2 ; my $t3 = $3 ; $t2 =~
> > s/\s/&nbsp;/g ; "$t1$t2$t3" ;!egs ;
> >
> > Is there a nicer/cleaner way to write it?
>
> It's a clever way, but i am very much into "Perl Best Practices"
> lately, which says "Don't be clever" :)
>
> It's very TMTOWTDI, of course.
>
> I thought of a different regex for this, without the /e . I thought of
> using lookbehind assertions, something like (?<= .*>), but apparently
> variable length lookbehind assertions are not implemented.
>
> I could also recommend HTML::Parser, but if all you need is replacing
> some spaces, then it would be overkill.
>
> So your algorithm is OK, but you don't need the outer s/// at all, and
> if you do use it, then you don't need the outer /g , because the first
> part of the outer s/// is used only for capturing the HTML.
>
> I would write the same algorithm more readably and simply like this:
>
> if ($str =~ m{
>     (<td.*?>)
>     (.*?)
>     (</td>)
> }xms)
> {
>     my ($t1, $t2, $t3) = ($1, $2, $3);
>     $t2 =~ s/\s/&nbsp;/g;
>     $str = "$t1$t2$t3";
> }
> else {
>     print "expected HTML not found\n";
> }

The problem with this code is that it won't work if $str contains more than
one such <td>...</td> instance, or anything besides it. In that case, it will
replace the whole string with just the first (modified) <td>...</td> element
and throw away everything else.

What I would do instead is extract the contents of the right side of the s/// 
expression into a function.
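
A minimal sketch of that approach, assuming the HTML is already in a scalar.
The names nbsp_spaces and $html are illustrative placeholders and do not come
from the original code:

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative): replace every whitespace
# character in the text of a cell with a non-breaking space entity.
sub nbsp_spaces
{
    my ($text) = @_;    # copy the argument before running another regex
    $text =~ s/\s/&nbsp;/g;
    return $text;
}

# Sample input modelled on the two cells from the original question.
my $html = join "\n",
    '<td  style="text-align: left" bgcolor="#92c1bb">Aug 12 23:59:59 2007 GMT</td>',
    '<td  style="text-align: left" bgcolor="#92c1bb">Aug 12 23:59:59 2007 GMT</td>';

# /g handles every <td>...</td> instance, /s lets "." match newlines, and
# /e evaluates the right side as code, which is now a single function call.
$html =~ s{(<td.*?>)(.*?)(</td>)}{$1 . nbsp_spaces($2) . $3}egs;

print "$html\n";
```

The outer s/// still needs /e and /g, but the right side shrinks to one
expression, and anything outside the <td>...</td> pairs is left untouched.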

Regards,

	Shlomi Fish

---------------------------------------------------------------------
Shlomi Fish      shlomif at iglu.org.il
Homepage:        http://www.shlomifish.org/

If it's not in my E-mail it doesn't happen. And if my E-mail is saying
one thing, and everything else says something else - E-mail will conquer.
    -- An Israeli Linuxer


