[Israel.pm] A simpler regex required

Ernst, Yehuda yernst at nds.com
Tue Aug 14 09:59:54 PDT 2007

what is 

-----Original Message-----
From: perl-bounces at perl.org.il [mailto:perl-bounces at perl.org.il]On Behalf Of Amir E. Aharoni
Sent: Tuesday, August 14, 2007 7:54 PM
To: Perl in Israel
Subject: Re: [Israel.pm] A simpler regex required

On 14/08/07, Peter Gordon <peter at pg-consultants.com> wrote:
> Hi.
> Let's suppose that I have the following lines in an HTML file.
> I want to substitute the spaces in the date part with non-breaking spaces (&nbsp;)
> <td  style="text-align: left" bgcolor="#92c1bb">Aug 12 23:59:59 2007 GMT</td>
> <td  style="text-align: left" bgcolor="#92c1bb">Aug 12 23:59:59 2007 GMT</td>
> I came up with this line - but somehow it isn't aesthetic.
> s!(<td.*?>)(.*?)(</td>)!my $t1 = $1 ;my $t2 = $2 ; my $t3 = $3 ; $t2 =~ s/\s/&nbsp;/g ; "$t1$t2$t3" ;!egs ;
> Is there a nicer/cleaner way to write it?

It's a clever way, but I am very much into "Perl Best Practices"
lately, which says "Don't be clever" :)

It's very TMTOWTDI, of course.

I thought of a different regex for this, without the /e. I thought of
using lookbehind assertions, something like (?<= .*>), but apparently
variable-length lookbehind assertions are not implemented.
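
One way to get rid of the /e without any lookbehind is to turn the test around and use a lookahead, which may be of variable length. The following is only a sketch: the helper name nbsp_in_cells is made up for illustration, and it assumes the cell text contains no nested tags and no literal < or > characters.

```perl
use strict;
use warnings;

# Hypothetical helper: replace whitespace inside <td>...</td> text with &nbsp;.
# Assumption: the cell content holds no nested tags or literal < or > characters.
sub nbsp_in_cells {
    my ($html) = @_;
    # A whitespace character qualifies only if nothing but non-bracket
    # characters separates it from the next closing </td>, so spaces
    # inside the opening <td ...> tag itself are left alone.
    $html =~ s{\s(?=[^<>]*</td>)}{&nbsp;}g;
    return $html;
}

my $line = '<td  style="text-align: left" bgcolor="#92c1bb">Aug 12 23:59:59 2007 GMT</td>';
print nbsp_in_cells($line), "\n";
```

If the cells can contain markup of their own, this assumption breaks and the capture-and-rewrite approach is the safer one.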

I could also recommend HTML::Parser, but if all you need is replacing
some spaces, then it would be overkill.

So your algorithm is OK, but you don't need the outer s/// at all, and
if you do use it, then you don't need the outer /g , because the first
part of the outer s/// is used only for capturing the HTML.

I would write the same algorithm more readably and simply like this:

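# same pattern as in the original s///: opening tag, cell text, closing tag
if ($str =~ m{(<td.*?>)(.*?)(</td>)}s) {
    my ($t1, $t2, $t3) = ($1, $2, $3);
    $t2 =~ s/\s/&nbsp;/g;
    $str = "$t1$t2$t3";
}
else {
    print "expected HTML not found\n";
}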