http://qs321.pair.com?node_id=149594


in reply to Extracting C-Style Comments (Revisited Again)

It took longer than I'd like to admit to figure out the problem this time. :)
    | # or double quoted string
    (?: "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* )
This matches a double-quoted string and then whatever code follows it. The trailing [^"'/]* consumes everything up to the next quote or slash, and that run includes the open parenthesis or equals sign that your regex alternative relies on to recognize the start of a JS regular expression. Remove that bit from the double-quoted alternative (and from the single-quoted one as well), and the JS code snippet will be parsed properly.
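To see the effect in isolation, here is a minimal sketch. The input string and the variable names ($js, $with_tail_re, $without_tail_re) are made up for illustration; only the two patterns come from the regex under discussion.

```perl
use strict;
use warnings;

# Hypothetical JavaScript input: a string, then a regex literal.
my $js = q{var s = "x"; var re = /\*/;};

# Double-quoted alternative as quoted above, with the trailing run.
my $with_tail_re = qr{ "[^"\\]* (?:\\.[^"\\]*)*" [^"'/]* }x;

# The same alternative with the trailing run removed.
my $without_tail_re = qr{ "[^"\\]* (?:\\.[^"\\]*)*" }x;

my ($with_tail)    = $js =~ /($with_tail_re)/;
my ($without_tail) = $js =~ /($without_tail_re)/;

# The first match runs on past the "=" in front of the regex literal,
# so the next match attempt starts at the slash with no "=" left.
print "with tail:    [$with_tail]\n";     # ["x"; var re = ]
print "without tail: [$without_tail]\n";  # ["x"]
```

With the tail removed, the match stops at the closing quote, so the "=" is still available when the regex-literal alternative gets its turn.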

Re: Re: Extracting C-Style Comments (Revisited Again)
by Incognito (Pilgrim) on Mar 06, 2002 at 03:56 UTC

    Excellent! Another ++ to you!!! I actually understood your answer for once, which is great... short and to the point... Here's the fully updated regex code for those that are interested...

    #----------------------------------------------------------------------
    # Here is the fundamental code to match JavaScript code.
    # This includes regular expressions and quoted strings.
    #----------------------------------------------------------------------
    my ($regexJSCode) = qr{
        # First, we'll list things we want
        # to match, but not throw away
        (?:
            # Match a regular expression (they start with ( or =).
            # Then they have a slash, and end with a slash.
            # The first slash must not be followed by * and cannot contain
            # newline chars. eg: var "re = /\*/;" or "a = b.match (/x/);"
            [\(=] \s* /
            (?:
                # char class contents
                \[ \^? ]? (?: [^]\\]+ | \\. )* ]
                |
                # escaped and regular chars (\/ and \.)
                (?: [^[\\\/]+ | \\. )*
            )*
            /
            (?:
                [gi]*
                # next characters are not word characters
                (?= [^\w] )
            )
        )
        |
        # or double quoted string
        (?: "[^"\\]* (?:\\.[^"\\]*)*" )+
        |
        # or single quoted constant
        (?: '[^'\\]* (?:\\.[^'\\]*)*' )+
    }x;

    #----------------------------------------------------------------------
    # Here is the fundamental code to match JavaScript comments and comment blocks.
    #----------------------------------------------------------------------
    my ($regexJSComments) = qr{
        # or we'll match a comment. Since it's not inside the
        # capturing parentheses of the substitution below, the
        # comment will disappear when we use capture group 1
        # as the replacement text.
        /   # (all comments start with a slash)
        (?:
            # traditional C comments
            (?: \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* / )
            |
            # or C++ //-style comments
            (?: / [^\n]* )
        )
    }x;

    #----------------------------------------------------------------------
    # Get rid of all comments from the string.
    #----------------------------------------------------------------------
    $strOutput =~ s{ ( $regexJSCode ) | $regexJSComments }{$1}gsx;
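For readers who want to try the keep-or-discard substitution on its own, here is a cut-down sketch. It is not the code from the post: the names $keep, $discard, $src and $out are hypothetical, the "keep" pattern covers quoted strings only (no JS regex literals), and the replacement uses /e with a defined check so that a discarded match does not trigger an uninitialized-value warning.

```perl
use strict;
use warnings;

# Simplified "keep" pattern: double- or single-quoted strings only.
my $keep = qr{ "[^"\\]* (?:\\.[^"\\]*)*" | '[^'\\]* (?:\\.[^'\\]*)*' }x;

# "Discard" pattern: /* ... */ blocks and // line comments.
my $discard = qr{ / (?: \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* / | / [^\n]* ) }x;

# Hypothetical JavaScript input.
my $src = <<'JS';
var url = "http://example.com/"; // trailing comment
/* block
   comment */
var msg = 'not /* a comment */';
JS

# A kept match is captured and written back unchanged. A discarded match
# leaves the capture undefined, so it is replaced by the empty string.
(my $out = $src) =~ s{ ($keep) | $discard }{ defined $1 ? $1 : '' }gex;

print $out;
```

The strings survive intact, including the slashes in the URL and the comment-like text inside the single quotes, because the string alternative gets to claim them before the comment pattern is tried at those positions.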