Yes, the greediness of [^"'/]+ was definitely the problem... The new regular expression to strip C-style comments from a JavaScript file is:
$strOutput =~ s{ # First, we'll list things we want
# to match, but not throw away
(
# Match a regular expression literal (these follow ( or =).
# It starts with a slash and ends with a slash.
# The first slash must not be followed by * and the literal
# cannot contain newline chars. eg: "var re = /\*/;" or "a = b.match (/x/);"
(?:
[\(=] \s*
/ (?! \* )
(?:
# char class contents
\[ \^? ]? (?: [^]\\\n]+ | \\[^\n] )* ]
|
# escaped and regular chars (\/ and \.)
(?: [^[\\\/\n]+ | \\[^\n] )*
)*
/[gi]*
)
| # or double quoted string
(?:
"[^"\\]* (?:\\.[^"\\]*)*" [^"'/]*
)+
| # or single quoted constant
(?:
'[^'\\]* (?:\\.[^'\\]*)*' [^"'/]*
)+
)
|
# or we'll match a comment. Since it's not in the
# $1 parentheses above, the comments will disappear
# when we use $1 as the replacement text.
/ # (all comments start with a slash)
(?:
# traditional C comments
(?:
\* [^*]* \*+
(?: [^/*] [^*]* \*+ )*
/
)
| # or C++ //-style comments
(?:
/ [^\n]*
)
)
}{$1}gsx;
I'll do some further testing, but it looks like this huge regex will do the trick! Thanks and ++ to you chipmunk.
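For anyone following along, here is a cut-down sketch of the same trick: capture the text you want to keep in the first group, match comments outside it, and substitute the first group back. It only protects double-quoted strings (no regex literals, no single quotes), and the name strip_comments is just something I made up for the example, so treat it as an illustration rather than a replacement for the full regex above.

```perl
use strict;
use warnings;

# Hypothetical helper, for illustration only: removes /* */ and //
# comments but leaves double-quoted strings untouched.
sub strip_comments {
    my ($text) = @_;
    $text =~ s{
        (                                     # group 1: text to keep
            " [^"\\]* (?: \\. [^"\\]* )* "    # a double-quoted string
        )
        |
        / (?:                                 # or a comment, outside group 1
            \* [^*]* \*+ (?: [^/*] [^*]* \*+ )* /   # traditional C comment
          |
            / [^\n]*                          # C++ style comment to end of line
        )
    }{ defined $1 ? $1 : '' }gsex;            # /e avoids an undef warning
    return $text;
}

my $js = qq{var s = "http://x"; // trailing\nvar t = 1; /* gone */\n};
print strip_comments($js);
```

The // inside the quoted URL survives because the string alternative is tried first and swallows it whole, while both real comments match only the second alternative and are replaced with nothing.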