http://qs321.pair.com?node_id=568532

anjiro has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying to modify the classic perlfaq6 regex for removing C-style comments. For reference, that regex is:
s#/\*[^*]*\*+([^/*][^*]*\*+)*/|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
The regex as is removes comments but preserves newlines, so if you have:
/* I have good commenting style */
i = 1;
/* And I comment every line of code */
i++;
/* Even if it's pointless */
j = i;
j++;
you get:

i = 1;

i++;

j = i;
j++;
when really I want:
i = 1;
i++;
j = i;
j++;
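To see where the blank lines come from, here is a sketch of the same perlfaq6 substitution spread out with /x and commented branch by branch. The only change is that the inner groups are made non-capturing, so the kept text lands in $1 instead of $2; that should not change what it matches. The catch-all branch keeps every newline, including the one that ended a comment line, so each removed comment leaves an empty line behind. The sub name `faq_strip` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The perlfaq6 substitution, same matching behaviour, laid out with /x.
sub faq_strip {
    my ($code) = @_;
    $code =~ s{
        /\*[^*]*\*+(?:[^/*][^*]*\*+)*/    # a C comment: replaced by the empty string
      | (                                 # capture group 1: text that is kept
            "(?:\\.|[^"\\])*"             #   double-quoted string
          | '(?:\\.|[^'\\])*'             #   single-quoted character constant
          | .[^/"'\\]*                    #   anything else, up to the next slash, quote or backslash
        )
    }{defined $1 ? $1 : ""}gsex;
    return $code;
}

my $input = <<'END_C';
/* I have good commenting style */
i = 1;
/* And I comment every line of code */
i++;
/* Even if it's pointless */
j = i;
j++;
END_C

# The comment text vanishes, but the newline after each comment is kept.
print faq_strip($input);
```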
I tried adding a \n to the regex just before the first |, which almost works:
s#/\*[^*]*\*+([^/*][^*]*\*+)*/\n|("(\\.|[^"\\])*"|'(\\.|[^'\\])*'|.[^/"'\\]*)#defined $2 ? $2 : ""#gse;
but when comments are indented it doesn't kill the whitespace at the start of the line:
void function foo(void) {
    /*Indented comment*/
    i = 1;
}
Result:
void function foo(void) {
        i = 1;
}
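A minimal sketch reproducing the indented case suggests why the leading whitespace survives, and why putting \s* in front of the comment alternative does not help either. The catch-all branch `.[^/"'\\]*` runs from wherever it starts up to the next slash, so it has already consumed the newline and the indentation as "keep" text before the engine tries the comment alternative at the slash. The names `$comment` and `$keep` are my own labels for pieces of the FAQ regex, not part of the original.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pieces of the perlfaq6 regex (groups made non-capturing for readability).
my $comment = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};
my $keep    = qr{"(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*'|.[^/"'\\]*}s;

my $code = <<'END_C';
void function foo(void) {
    /*Indented comment*/
    i = 1;
}
END_C

# Variant 1: \n appended to the comment alternative.
(my $with_nl = $code) =~ s{$comment\n|($keep)}{defined $1 ? $1 : ""}ge;

# Variant 2: additionally \s* in front of the comment.
(my $with_ws = $code) =~ s{\s*$comment\n|($keep)}{defined $1 ? $1 : ""}ge;

# Both give the body line doubled indentation: the keep branch matched
# "void function foo(void) {\n    " in one go, so the blanks before the
# comment were already used up when the comment branch got its turn.
print $with_nl, $with_ws;
```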
I've tried adding a \s* at the front of the regex, but that doesn't seem to catch it. I've also tried some more complicated things without luck. Any regex wizard monks out there care to give it a try? Here's a final piece of commented code with the desired result:
/*This is a bogus function*/
void function foo(void) /*My function is the best*/
{
    int i; /*i is an integer*/
    int j; /*j is also an integer*/
    /*Now I'm going to set i to 1*/
    i = 1;
    /*Also j*/
    j = 1;
    /*Here's some incrementing!*/
    i++;
    /*And more!*/
    j++;
/*The end!*/}
Desired result:
void function foo(void)
{
    int i;
    int j;
    i = 1;
    j = 1;
    i++;
    j++;
}
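One possible approach, sketched here under some assumptions (the sample is laid out one statement per line, only /* */ comments are handled, and \h needs Perl 5.10 or later): give whole-line comments their own alternative anchored with ^ under /m so the entire line, newline included, is dropped; let any other comment take the blanks in front of it; and stop the catch-all at blanks, slashes and newlines so the indentation is never swallowed before those alternatives get a chance. The sub name `strip_comments` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Comment body taken from the perlfaq6 regex (its group made non-capturing).
my $comment = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};

sub strip_comments {
    my ($code) = @_;
    $code =~ s{
          ^\h*$comment\h*\n        # comment alone on its line: drop the whole line
        | \h*$comment              # any other comment: drop it and the blanks before it
        | (                        # capture group 1: everything else is kept
              "(?:\\.|[^"\\])*"    # double-quoted string
            | '(?:\\.|[^'\\])*'    # character constant
            | \n                   # a newline on its own, so ^ is retried on the next line
            | .[^/"'\\\s]*         # other text, stopping before slashes, quotes and blanks
          )
    }{defined $1 ? $1 : ""}gmex;
    return $code;
}

my $sample = <<'END_C';
/*This is a bogus function*/
void function foo(void) /*My function is the best*/
{
    int i; /*i is an integer*/
    int j; /*j is also an integer*/
    /*Now I'm going to set i to 1*/
    i = 1;
    /*Also j*/
    j = 1;
    /*Here's some incrementing!*/
    i++;
    /*And more!*/
    j++;
/*The end!*/}
END_C

print strip_comments($sample);
```

Like the FAQ version, this deletes comments outright, so something like int/*x*/i would become inti; replacing a mid-line comment with a single space would be safer if that case matters. C99 // comments are not handled.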

Re: Modify C comment removal code to kill newlines
by marto (Cardinal) on Aug 21, 2006 at 13:21 UTC
Re: Modify C comment removal code to kill newlines
by borisz (Canon) on Aug 21, 2006 at 14:03 UTC
    Hi, here is another solution.
    use Regexp::Common qw/comment/;
    local $/;
    $_ = <DATA>;
    s/$RE{comment}{C}\n*//gm;
    print;
    __DATA__
    /* I have good commenting style */
    i = 1;
    /* And I comment every line of code */
    i++;
    /* Even if it's pointless */
    j = i;
    j++;
    Boris
      Nice try, but on the original example this gives:
      void function foo(void) { int i; int j; i = 1; j = 1; i++; j++; }
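One likely reason the first version runs lines together: `\n*` eats the newline after every comment, including a comment at the end of a statement line, so the following line is pulled up onto it. The sketch uses the perlfaq6 comment pattern as a stand-in for `$RE{comment}{C}` so it runs without Regexp::Common installed; treating the two as equivalent for these inputs is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: stands in for Regexp::Common's $RE{comment}{C} on these inputs.
my $c_comment = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};

my $code = <<'END_C';
    int i; /*i is an integer*/
    int j; /*j is also an integer*/
    i = 1;
END_C

# Same shape as the s/$RE{comment}{C}\n*//gm line in the first reply.
(my $out = $code) =~ s/$c_comment\n*//gm;

# The three statements end up on a single line.
print $out;
```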
        Ok, try this. It looks perfect to me on UNIX.
        use Regexp::Common qw/comment/;
        local $/;
        $_ = <DATA>;
        s/(^[ \t]*$RE{comment}{C}\n|$RE{comment}{C})//gm;
        print;
        __DATA__
        /*This is a bogus function*/
        void function foo(void) /*My function is the best*/
        {
            int i; /*i is an integer*/
            int j; /*j is also an integer*/
            /*Now I'm going to set i to 1*/
            i = 1;
            /*Also j*/
            j = 1;
            /*Here's some incrementing!*/
            i++;
            /*And more!*/
            j++;
        /*The end!*/}

        Output:
        void function foo(void)
        {
            int i;
            int j;
            i = 1;
            j = 1;
            i++;
            j++;
        }
        Boris
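The second version works because alternation is tried left to right at each position: at the start of a line (under /m) the whole-line branch `^[ \t]*...\n` is attempted first and removes the comment together with its indentation and newline; anywhere else only the bare comment is removed. A side effect is that a trailing comment leaves the blank before it behind. The sketch again substitutes the perlfaq6 comment pattern for `$RE{comment}{C}` (an assumption), and the `$trimmed` variant with `[ \t]*` in the second branch is a hypothetical tweak, not part of the posted code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: stands in for Regexp::Common's $RE{comment}{C} on these inputs.
my $c_comment = qr{/\*[^*]*\*+(?:[^/*][^*]*\*+)*/};

my $code = <<'END_C';
    /*Also j*/
    j = 1; /*set j*/
/*The end!*/}
END_C

# As posted: whole-line comments first, then any remaining comment.
(my $as_posted = $code) =~ s/(^[ \t]*$c_comment\n|$c_comment)//gm;

# Hypothetical tweak: also swallow the blanks in front of a trailing comment.
(my $trimmed = $code) =~ s/(^[ \t]*$c_comment\n|[ \t]*$c_comment)//gm;

print $as_posted, $trimmed;
```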
Re: Modify C comment removal code to kill newlines
by Velaki (Chaplain) on Aug 21, 2006 at 13:48 UTC

    Here's a little snippet that produces the output you desire. However, it doesn't take nested or multiline comments into account; you can extend it for those on your own.

    By the way, the last line of your sample code, /*The end!/*}, does not have a valid comment terminator. I think you meant /*The end!*/}.

    #!/usr/bin/perl

    use strict;
    use warnings;

    undef $/;
    my $code = <DATA>;

    # $code =~ s|^\s*/\*.*?\*/||gm;
    # Changed as per anjiro's comments
    $code =~ s#^\s*/\*.*?\*/##gm;
    $code =~ s#(?<=\S)\s/\*.*?\*/\s*?$##gm;
    $code =~ s#\n##ms;
    print $code;

    __DATA__
    /*This is a bogus function*/
    void function foo(void) /*My function is the best*/
    {
        int i; /*i is an integer*/
        int j; /*j is also an integer*/
        /*Now I'm going to set i to 1*/
        i = 1;
        /*Also j*/
        j = 1;
        /*Here's some incrementing!*/
        i++;
        /*And more!*/
        j++;
    /*The end!*/}

    Hope this helped,
    -v.

    "Perl. There is no substitute."

    Update: Changed the regexes to reflect anjiro's comments.
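Velaki's second substitution relies on a lookbehind: `(?<=\S)\s` only matches the single blank that directly follows a non-blank character, so an end-of-line comment separated from the code by exactly one space or tab is removed, while one preceded by two or more blanks is left alone. A small sketch of that behaviour:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $code = "int i; /*one space*/\nint j;  /*two spaces*/\nint k;\t/*tab*/\n";

# Velaki's end-of-line substitution, unchanged.
(my $out = $code) =~ s#(?<=\S)\s/\*.*?\*/\s*?$##gm;

# Lines i and k lose their comments; line j keeps its comment.
print $out;
```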

      It also doesn't kill the end-of-line comments:
      void function foo(void) /*My function is the best*/ { int i; /*i is an integer*/ int j; /*j is also an integer*/ i = 1; j = 1; i++; j++; }
      And you're right about "The end!/*" - I'll fix that now.
Re: Modify C comment removal code to kill newlines (non-nl-ws)
by tye (Sage) on Aug 21, 2006 at 14:02 UTC