[Israel.pm] Regex: Groups inside a ?:-cluster

Eli Billauer eli at billauer.co.il
Thu Jan 1 03:57:18 PST 2009


Gaal Yahas wrote:

> I couldn't find other mention of this, so I'd say this behavior is a
> bit underspecced, but unlikely to change in Perl 5 -- too many things
> would break otherwise.
>
>   
Thanks. I suppose that's the best answer one can get...

In the meantime, I found out that it may not always be such a good idea 
to be a mathematician about regular expressions. Consider, for example, 
this:

$chars = qr/[\-_+%a-z0-9]/; # Some chars we allow
$charsdot = qr/\.|$chars/; # Dot allowed as well

Cute, isn't it? $charsdot is everything $chars is, only with the dot 
allowed as well. Now we can use it in regular expressions, such as

print "Matched\n" if ($x =~ /$charsdot{20000}/);

Well, not such a good idea. On Perl 5.8.8, the match above runs 10 times 
slower (5 whole seconds on a 10 MB random string) than it does when the 
dot is simply added inside the square brackets.
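
For anyone who wants to try the comparison, here is a rough sketch using 
the core Benchmark module. The input string, the {50} repeat count and 
the names $merged, $slow and $fast are made up for illustration (the 
original test used {20000} on a 10 MB random string), so take the 
resulting numbers as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $chars    = qr/[\-_+%a-z0-9]/;      # as in the post
my $charsdot = qr/\.|$chars/;          # alternation wrapped around the class
my $merged   = qr/[.\-_+%a-z0-9]/;     # assumption: dot folded into the brackets

# Made-up input: runs of 49 allowed characters, each ended by a space,
# so neither pattern below can match and every start position gets tried.
my $x = ('a' x 49 . ' ') x 2000;

my $slow = qr/(?:$charsdot){50}/;
my $fast = qr/(?:$merged){50}/;

# Run each match for at least one CPU second and print a comparison table.
cmpthese(-1, {
    alternation => sub { my $hit = $x =~ $slow },
    merged      => sub { my $hit = $x =~ $fast },
});
```

The ratio between the two rows will depend on the Perl version and on 
the input, so it need not reproduce the factor of 10 seen on 5.8.8.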

This shouldn't come as a surprise once we run "print $charsdot;" and see 
that it gives:
(?-xism:\.|(?-xism:[\-_+%a-z0-9]))

Lesson learned: This regular expression is not optimized. Not the 
slightest bit. This isn't a qr// issue, since the same thing happens 
when the contents of $chars are written out explicitly.
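
A small sketch for looking at this yourself: it prints the nested and 
the hand-written forms side by side, and then asks the re pragma to dump 
the compiled program for the alternation to STDERR. The names $nested, 
$explicit and $dump are just for illustration. The flag prefix in the 
printed patterns varies with the Perl version (5.8.8 prints (?-xism:...), 
newer releases print (?^:...)), and newer regex engines may compile the 
alternation differently from 5.8.8.

```perl
use strict;
use warnings;

my $chars    = qr/[\-_+%a-z0-9]/;
my $nested   = qr/\.|$chars/;               # qr object interpolated
my $explicit = qr/\.|(?:[\-_+%a-z0-9])/;    # same structure, written by hand

print "nested:   $nested\n";
print "explicit: $explicit\n";

{
    # Dumps the compiled regex program to STDERR when the pattern
    # below is compiled.
    use re 'debug';
    my $dump = qr/\.|[\-_+%a-z0-9]/;
}
```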

-----------------------------

As for inline comments with /x, I don't think they make the code more 
readable, but that's a matter of taste. I kind of lose the continuity, 
and it's pretty difficult to get really useful comments in there. Worse, 
if a parenthesis is misplaced, the comments persuade the reader that the 
regex does what they say rather than what it actually does, which makes 
the code even more difficult to maintain.

What I liked about qr// is that the regular expression can be broken 
down into its pieces, each with a meaningful name. But as the example 
above shows, that can have a cost.
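
One possible way to keep the named pieces without paying for the 
alternation is to name the contents of the character class as a plain 
string and add the brackets only at the end. This is a sketch, not 
something measured in this thread; $chars_body is a made-up name.

```perl
use strict;
use warnings;

# Contents of the class only, no brackets (single quotes keep the backslash).
my $chars_body = '\-_+%a-z0-9';

my $chars    = qr/[$chars_body]/;     # some chars we allow
my $charsdot = qr/[.$chars_body]/;    # dot allowed as well, still one class

# Shows a single bracketed class inside the flag wrapper.
print "$charsdot\n";
```

The trade-off is that $chars_body is only a fragment and is not a valid 
pattern on its own, so it has to be used inside brackets.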

    Eli

-- 
Web: http://www.billauer.co.il



