[Israel.pm] Regex: Groups inside a ?:-cluster

Jason Elbaum jason.elbaum at gmail.com
Wed Dec 31 14:59:33 PST 2008


> The idea behind this script, is that I want it to scan the string (in
> $s) and give me the digit immediately after an "x" or the character
> after an "x". And there's also a third pattern, which happens to match
> "x0" as well (with the space after it).

As you discovered, this pattern will not do what you appear to expect
it to do because of the way Perl applies regexes.

In a global match (/.../g), Perl first finds the leftmost place in
the string where the regular expression matches. It then moves the
current match position (see pos()) to the end of that match and looks
for the next match from that point on. So once x(\d) has matched
'x0', those characters have been consumed and are no longer available
to be matched by .([012].) on a later iteration.
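
A minimal sketch of that behaviour, using a made-up input string
(your original $s isn't shown here, so 'ax0 b' is only an assumption)
and just two of the three alternatives:

```perl
use strict;
use warnings;

# Hypothetical input; not the $s from the original script.
my $s = 'ax0 b';

my @log;
while ($s =~ /x(\d)|.([012].)/gi) {
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    # After a successful /g match in scalar context, pos($s) equals $+[0].
    my ($start, $end) = ($-[0], $+[0]);
    my $text = substr($s, $start, $end - $start);
    push @log, "'$text' at $start, next search starts at " . pos($s);
}
print "$_\n" for @log;
```

With this input the loop body runs only once: 'x0' is matched at
offset 1, the next search starts at offset 3, and nothing from there
on satisfies either alternative. The 'x0 ' that .([012].) could have
matched is never reported.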

Furthermore, alternatives - a|b|c - are matched in order from left to
right. If a match is found, the remaining alternatives are not tested
at that point in the string. So even though both x(\d) and .([012].)
match the string at 'x0 ', only the former will match and capture a
value.
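
To see the effect of ordering, here is a small sketch that runs the
same two alternatives in both orders against a hypothetical string
'x0 ' (again, an assumed input, not your $s). The show() helper is
only there for printing:

```perl
use strict;
use warnings;

# Hypothetical input where both alternatives can match at offset 0.
my $s = 'x0 ';

# In list context, a successful match returns one slot per capture
# group; groups that did not take part in the match are undef.
my @first_wins = $s =~ /x(\d)|.([012].)/;
my @swapped    = $s =~ /.([012].)|x(\d)/;

# Illustrative helper: render a list with undef made visible.
sub show {
    return '(' . join(', ', map { defined($_) ? "'$_'" : 'undef' } @_) . ')';
}

print 'x(\d) first:     ', show(@first_wins), "\n";
print '.([012].) first: ', show(@swapped),    "\n";
```

With x(\d) first, the captures are ('0', undef); with .([012].)
first, they are ('0 ', undef). In both cases the second alternative
is never tried at that position.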


> Note that the regular expression is a (?: ) cluster which doesn't, in
> itself, put any elements in the resulting list. But each of the three
> possibilities in the cluster has a ()-group, which do.

Note that it is not necessary to enclose a set of alternatives in
parentheses when they make up the entire pattern, as they do here.
The expression would have had the same meaning had it been written
as:

/x(\d)|y(.)|.([012].)/gi
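
A quick way to convince yourself is to run both forms over the same
string and compare the resulting lists. The input below is invented
for illustration, and flatten() is just a helper for comparing lists
that contain undef:

```perl
use strict;
use warnings;

# Hypothetical input; not the $s from the original script.
my $s = 'x0 Y7 a1b';

my @with    = $s =~ /(?:x(\d)|y(.)|.([012].))/gi;
my @without = $s =~ /x(\d)|y(.)|.([012].)/gi;

# Illustrative helper: turn a list into a string, showing undef slots.
sub flatten {
    return join ',', map { defined($_) ? $_ : 'undef' } @_;
}

print "with (?: ):    ", flatten(@with),    "\n";
print "without (?: ): ", flatten(@without), "\n";
print "same result\n" if flatten(@with) eq flatten(@without);
```

Both lists come out as 0,undef,undef,undef,7,undef,undef,undef,1b:
three slots per match, one for each ()-group, with undef for the
groups in the alternatives that did not match.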

Finally, since you've expressed interest in the clarity of regexes, I
highly recommend using the /x flag, allowing whitespace and comments
inside a regex. For example:

my @z = ($s =~ / x(\d)        # Find the digit after an x
               | y(.)         # Or the character after a y
               | .([012].)    # Or any char, then 0-2 and one more char
               /gix           # Globally, case-insensitive, extended
        );


Regards,

Jason Elbaum


