[Israel.pm] A simpler regex required

Yuval Yaari yuval at windax.com
Wed Aug 15 15:13:46 PDT 2007

Peter Gordon wrote:
> Wouldn't it be cute and much more intuitive if we could write this? 
> s!<td.*?>(.*?)</td>!$1 =~ s/\s/&nbsp;/g!eg

Yes. It wouldn't.

These variables are read-only for very good reasons, y'know :-)
Also notice the regex you just wrote doesn't do what you originally 
asked for.
Assuming you wouldn't get errors for modifying a read-only variable, 
you're completely deleting <td> and </td>.
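
To make both points concrete, here is a minimal sketch. The sample string in $row and the variable names are made up for illustration. The first part shows that trying to run s/// on $1 dies at run time; the second shows that even with a writable copy, replacing the whole match drops the tags.

```perl
use strict;
use warnings;

my $row = '<td class="a">one two</td>';   # hypothetical sample input

# 1. $1 is read-only: the inner s/// dies as soon as it tries to modify it.
my $ok = eval {
    if ($row =~ m{<td.*?>(.*?)</td>}) {
        $1 =~ s/\s/&nbsp;/g;
    }
    1;
};
my $error = $ok ? '' : $@;   # "Modification of a read-only value attempted ..."

# 2. Working on a copy of $1 avoids the error, but the whole match
#    (tags included) is replaced by the cell text alone.
(my $stripped = $row) =~ s{<td.*?>(.*?)</td>}{ (my $t = $1) =~ s/\s/&nbsp;/g; $t }eg;

print "error:    $error";
print "stripped: $stripped\n";   # one&nbsp;two
```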

I hope you don't mean you want to change the current behaviour that we all 
know and love to "match-but-do-not-replace-anything-that's-not-grouped" :-)

So basically we:
 - *Have* to tell the engine what we want to "match-but-not-replace", as 
opposed to the current "match-and-replace"
 - *Really* want $1 to be read-only (think of your debugging experience 
when some function somewhere modifies $1 :))

The cleanest solution would probably be (works starting from perl 5.9.5):

$str =~ s{<td.*?>   # variable-length look-behind :)
          \K        # tells Perl to "keep" that <td>
          (.+?)     # text that we want to s///
          (?=</td>) # look-ahead; won't be replaced
         }{htmlify()}gex;

sub htmlify {local $_=$1; s/\s/&nbsp;/g; $_}

I think the substitution should have occurred in a subroutine anyway, 
even if $1 was writable.
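
For anyone who wants to try it, here is a self-contained sketch with made-up sample input. It assumes a variant of htmlify that takes the text as an argument rather than reading $1 directly, and it also shows a fallback for perls without \K: capture both tags and put them back. Neither version is meant as a real HTML parser.

```perl
use strict;
use warnings;

# Variant of htmlify that takes the text as an argument (an assumption
# for this sketch; the version above reads $1 directly).
sub htmlify {
    my ($text) = @_;
    $text =~ s/\s/&nbsp;/g;
    return $text;
}

my $str = '<tr><td class="x">a b</td><td>c  d</td></tr>';   # hypothetical input

# With \K: the opening tag is matched but kept, and the look-ahead
# leaves the closing tag alone, so only the cell text is replaced.
(my $with_k = $str) =~ s{<td.*?> \K (.+?) (?=</td>)}{ htmlify($1) }gex;

# Without \K: capture the tags and put them back by hand.
(my $with_groups = $str) =~ s{(<td.*?>)(.+?)(</td>)}{
    my ($open, $text, $close) = ($1, $2, $3);
    $open . htmlify($text) . $close
}ge;

print "$with_k\n";
print "$with_groups\n";
```

Both print the same line here, with the whitespace inside each cell turned into &nbsp; and the tags untouched.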
Maybe it would be cool if look-ahead had a backslash-thingie notation. 

And you might like this one; I personally hate it:

my $text;
$str =~ s{<td.*?> \K
          (.+?)     # $^N below refers to this group
          (?{ $text = $^N; $text =~ s/\s/&nbsp;/g; })
          (?=</td>)
         }{$text}gx;

If you have any better ideas of how it could/should look (with the 
constraints I've mentioned, or a way to break them without ruining every 
Perl programmer's life by breaking the current behaviour) please share 
them :-)


