Re: removing C style comments from text files
by Abigail-II (Bishop) on Jan 06, 2004 at 09:01 UTC
|
use Regexp::Common;
$string =~ s/$RE{comment}{C}//g;
Note that this (just like the regexes presented in the
rest of this thread) isn't context aware, and happily removes
"comments" from strings.
Abigail | [reply] [d/l] |
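To make the caveat concrete, here is a minimal sketch. It uses a plain non-greedy regex rather than Regexp::Common so that it runs with core Perl only; the helper name strip_naive and the sample input are made up for illustration. Any stripper that does not track string literals shows the same behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: a context-blind comment stripper.
sub strip_naive {
    my ($src) = @_;
    $src =~ s{/\*.*?\*/}{}gs;    # remove everything between /* and */
    return $src;
}

# The first "comment" sits inside a string literal and should survive.
my $code = 'char *s = "not /* a comment */ really"; /* real comment */';
my $out  = strip_naive($code);

print "$out\n";
```

The text inside the string literal is removed along with the real comment, which is the problem described above.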
|
That's kind of a chicken-and-egg problem, isn't it? How can you
successfully remove strings, if you can't detect comments?
Consider:
/* One " two */ a = b + 4; /* three " four */
You have to do it all in one pass. Something like:
s { ( [^"'/]*                                  # Not a string, character or comment.
    | "[^\\"]*(?:\\.[^\\"]*)*"                 # String.
    | '[^\\']*(?:\\.[^\\']*)*'                 # Char.
    | / (?![*])                                # Slash, not a comment.
    )
  |
    ( /[*] [^*]* [*]+ (?: [^/*] [^*]* [*]+ )* / )   # Comment.
  }
  { $2 ? "" : $1 }gsex;
But that isn't foolproof either (consider # define).
Abigail | [reply] [d/l] [select] |
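As a rough sketch of the one-pass idea, the substitution can be wrapped in a sub and tried on the example line. The sub name strip_c_comments and the test inputs are assumptions for illustration. The comment sub-pattern is the commonly used form that tolerates stray * and / characters inside a comment, and defined is used on the second capture to stay quiet under warnings.

```perl
use strict;
use warnings;

# Hypothetical wrapper: keep strings and chars, drop comments, all in one pass.
sub strip_c_comments {
    my ($src) = @_;
    $src =~ s{
        (                                   # group 1: text to keep
            [^"'/]*                         #   nothing that starts a string, char or comment
          | "[^\\"]*(?:\\.[^\\"]*)*"        #   string literal
          | '[^\\']*(?:\\.[^\\']*)*'        #   character literal
          | / (?![*])                       #   slash that does not open a comment
        )
      |
        ( /[*] [^*]* [*]+ (?: [^/*] [^*]* [*]+ )* / )   # group 2: a comment
    }{ defined $2 ? "" : $1 }gsex;
    return $src;
}

my $r1 = strip_c_comments('/* One " two */ a = b + 4; /* three " four */');
my $r2 = strip_c_comments('printf("/* not a comment */"); /* gone */');
my $r3 = strip_c_comments('x /* a * b / c */ y');

print "[$r1]\n[$r2]\n[$r3]\n";
```

The quote characters inside the comments no longer confuse the match, and the comment-like text inside the string literal is left alone. Preprocessor lines are still not handled.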
Re: removing C style comments from text files
by bsb (Priest) on Jan 06, 2004 at 08:19 UTC
|
| [reply] |
Re: removing C style comments from text files
by ysth (Canon) on Jan 06, 2004 at 08:17 UTC
|
If you want it to work across newlines, you need to do a couple things. First, make sure all the lines are in a single string. Second, if you are using . and want it to
match any character including a newline, use the m//s flag.
Without /s, it will match any character except a newline.
The other issue is stopping the part of the pattern that matches the inside of a comment from running on past the end of that comment, through the code that follows, and into the next comment.
The simple m/\/\*.*\*\//s regex will match all of "/* comment 1 */ some = code; /* comment 2 */". You tell * to match as little as possible instead of as much as possible by adding a ?, so it becomes m/\/\*.*?\*\//s.
| [reply] [d/l] [select] |
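A small sketch of both points, the /s flag and greedy versus non-greedy matching, using the sample line from the post. The variable names and the two-line sample are made up for illustration.

```perl
use strict;
use warnings;

my $src = "/* comment 1 */ some = code; /* comment 2 */";

# Greedy: .* runs from the first /* to the LAST */, taking the code with it.
(my $greedy = $src) =~ s/\/\*.*\*\///s;

# Non-greedy: .*? stops at the first */ it can reach.
(my $lazy = $src) =~ s/\/\*.*?\*\///gs;

# A comment that spans a newline, with and without /s.
my $multi = "a /* one\ntwo */ b";
(my $without_s = $multi) =~ s/\/\*.*?\*\///g;    # . cannot cross the newline
(my $with_s    = $multi) =~ s/\/\*.*?\*\///gs;   # . matches the newline too

print "greedy:    [$greedy]\n";
print "lazy:      [$lazy]\n";
print "with /s:   [$with_s]\n";
```

If a sample file has only one comment per line and no multi-line comments, the greedy and non-greedy forms give the same result, so the difference is easy to miss.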
|
So, within ysth's restrictions, if one wants a one-liner:
perl -0777 -pe 's{/\*.*?\*/}{}gs' source.c
my .02
| [reply] [d/l] |
|
I just tried it and it worked! I've never seen a regex contained in braces before...and what's the deal with the empty braces at the end?
thanks!
| [reply] |
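On the braces: s/// accepts other delimiters, and with bracketing ones the pattern and the replacement each get their own pair, so s{PATTERN}{REPLACEMENT}. The empty second pair is an empty replacement, i.e. each match is deleted. Using braces also means the slashes in the pattern need no backslashes. In the one-liner, -0777 makes Perl read the whole file as one string and -p prints the result. A minimal sketch showing that the two spellings are equivalent (the sample text is made up):

```perl
use strict;
use warnings;

my $src = "a /* x */ b /* y\nz */ c";

# Slash delimiters: every / in the pattern has to be escaped.
(my $slashes = $src) =~ s/\/\*.*?\*\///gs;

# Brace delimiters: first pair is the pattern, second (empty) pair the replacement.
(my $braces = $src) =~ s{/\*.*?\*/}{}gs;

print "same result\n" if $slashes eq $braces;
print "[$braces]\n";
```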
|
|
I will work with the m//s flag stuff. My original regex had the *?, but in my sample text file it didn't seem to make a difference, i.e. it wasn't acting greedy either way.
thanks!
| [reply] |
Re: removing C style comments from text files
by CountZero (Bishop) on Jan 06, 2004 at 09:07 UTC
|
Why would one want to remove comments from a source file? The compiler doesn't mind.
CountZero "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
| [reply] |
|
It's an assignment for an intro Perl course.
UPDATE - hey hey hey now. I've always tried to be up front about my needs, and I always write as much code as I can before I ask for help, and I always do my best to learn what I can from all suggested code before I just go and use it.
| [reply] |
|
HOMEWORK!
CountZero
| [reply] |
|
Re: removing C style comments from text files
by BUU (Prior) on Jan 06, 2004 at 08:42 UTC
|
Moving beyond the whole Perl/regex thing, what about just running the C preprocessor on it?
The only downside I see is text that happens to match a #define getting replaced when you don't want it to be, but I would think matching what cpp considers a #define would be vastly simpler than matching comments.
| [reply] |
|
No, you don't want the preprocessor output. Consider the
following classical program:
# include <stdlib.h>
# include <stdio.h>
int main (int argc, char * argv []) {
printf ("Hello, world\n"); /* Print 'Hello, world' */
exit (0);
}
Assume this is in the file hello.c.
$ gcc -E hello.c | wc -l
1566
$ gcc -E hello.c | grep -v '^$' | wc -l
853
Even with blank lines removed, the 7-line hello.c
expands to 853 lines of preprocessor output.
Abigail | [reply] [d/l] [select] |
|
Well yes, but it expands because you #include files. From his description (text files) I assumed that they weren't actual C programs and wouldn't be using the rest of the pre-processor commands. So there wouldn't be any #includes to expand the file size so dramatically. Beyond that I also suggested that it might be easier to match anything the pre-processor would consider a #define/#include, since the rules for that are fairly strict as I recall.
| [reply] |