[Israel.pm] segmentation fault in regex

Yossi.Itzkovich at ecitele.com
Sun Mar 12 05:35:05 PST 2006


Before you all try to invent a broken wheel, here is the original text
from perlfaq6:



How do I use a regular expression to strip C style comments
     from a file?

     While this actually can be done, it's much harder than you'd
     think.  For example, this one-liner

         perl -0777 -pe 's{/\*.*?\*/}{}gs' foo.c

     will work in many but not all cases.  You see, it's too
     simple-minded for certain kinds of C programs, in
     particular, those with what appear to be comments in quoted
     strings.  For that, you'd need something like this, created
     by Jeffrey Friedl and later modified by Fred Curtis.

         $/ = undef;
         $_ = <>;

         s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs;

         print;
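To see the difference described above, here is a minimal sketch that runs both regexes on one line of C with a fake comment inside a string literal. The helper names naive_strip and friedl_strip are hypothetical. The Friedl regex is used as quoted, except that the replacement is written as `defined $2 ? $2 : ''` under /e, because a bare $2 is undefined when a comment matches and triggers an "uninitialized value" warning under `use warnings`.

```perl
use strict;
use warnings;

# Naive one-liner regex: removes everything between /* and */, even when
# that text sits inside a string literal.
sub naive_strip {
    my ($src) = @_;
    $src =~ s{/\*.*?\*/}{}gs;
    return $src;
}

# Friedl/Curtis regex from perlfaq6. Group 2 captures strings and other
# non-comment text; when a comment matches, $2 is undefined and the
# /e replacement yields the empty string instead of warning.
sub friedl_strip {
    my ($src) = @_;
    $src =~ s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ''#gse;
    return $src;
}

my $code = qq{char *s = "/* not a comment */"; /* real comment */ int x;\n};
print naive_strip($code);   # char *s = "";  int x;
print friedl_strip($code);  # char *s = "/* not a comment */";  int x;
```

The naive version eats the text inside the string; the Friedl version keeps the string and drops only the real comment.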

     This could, of course, be more legibly written with the "/x"
     modifier, adding whitespace and comments.  Here it is
     expanded, courtesy of Fred Curtis.

         s{
            /\*         ##  Start of /* ... */ comment
            [^*]*\*+    ##  Non-* followed by 1-or-more *'s
            (
              [^/*][^*]*\*+
            )*          ##  0-or-more things which don't start with /
                        ##    but do end with '*'
            /           ##  End of /* ... */ comment

          |         ##     OR  various things which aren't comments:

            (
              "           ##  Start of " ... " string
              (
                \\.           ##  Escaped char
              |               ##    OR
                [^"\\]        ##  Non "\
              )*
              "           ##  End of " ... " string

            |         ##     OR


              '           ##  Start of ' ... ' string
              (
                \\.           ##  Escaped char
              |               ##    OR
                [^'\\]        ##  Non '\
              )*
              '           ##  End of ' ... ' string

            |         ##     OR

              .           ##  Any other char
              [^/"'\\]*   ##  Chars which don't start a comment, string or escape
            )
          }{$2}gxs;

     A slight modification also removes C++ comments:


         s#/\*[^*]*\*+([^/*][^*]*\*+)*/|//[^\n]*|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#$2#gs;
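One possible cause of the segfault (an assumption, I have not reproduced it): in perls before 5.10 the regex engine was recursive, and the string part `("(\\.|[^"\\])*"` repeats a group once per character of a string literal, so a long string can overflow the C stack. Below is a sketch of the same C/C++ regex with the strings written as Friedl's "unrolled loop", where the repeated group runs once per escape sequence instead of once per character, and the inner groups are non-capturing. The name strip_c_cpp_comments is hypothetical; it can be called the same way as the FAQ code, on a file slurped with `local $/;`.

```perl
use strict;
use warnings;

# Strip C (/* ... */) and C++ (// ...) comments, keeping string and
# character literals intact. Strings use the unrolled form
# "[^"\\]*(?:\\.[^"\\]*)*", so the repeated group iterates only on
# backslash escapes, not on every character.
sub strip_c_cpp_comments {
    my ($src) = @_;
    $src =~ s{
        /\* [^*]* \*+ (?: [^/*] [^*]* \*+ )* /   # C comment
      | // [^\n]*                              # C++ comment, up to newline
      | (                                      # group 1: text to keep
            " [^"\\]* (?: \\. [^"\\]* )* "     #   double-quoted string
          | ' [^'\\]* (?: \\. [^'\\]* )* '     #   single-quoted literal
          | . [^/"'\\]*                        #   anything else
        )
    }{ defined $1 ? $1 : '' }gsex;
    return $src;
}

my $in = qq{int a; // line comment\nchar *u = "http://x"; /* block */ char c = '/';\n};
print strip_c_cpp_comments($in);
```

Upgrading to perl 5.10 or later, where the regex engine is no longer recursive, may also make the original regex stop crashing.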




My code was taken from the last line of the quote (the C++ comment variant).

Yossi
