PerlMonks  

Bad Regex

by nofernandes (Beadle)
on Jun 23, 2003 at 17:27 UTC ( [id://268245]=perlquestion )

nofernandes has asked for the wisdom of the Perl Monks concerning the following question:

Hello again!!

I have this regex that tries to extract the comment lines from a program:

sub extract_comments {
    my $filename = shift;
    local (*F, $/);
    open F, "< $filename" or warn("can't read $filename: $!"), return;
    @hello1=grep defined, <F> =~ m{
          ( \# .*? \n )                # extract a comment starting in #
                                       #and finishing in an enter
        |
          " (?: [^"\#]* | \#. )* "     # skip over "..."
        |
          ' (?: [^'\#]* | \#. )* '     # skip over '...'
        |
          . [^\#"']*                   # skip over non-comments-or-quotes
    }xgs;
    $tam=@h...;
    print "Tam: $tam\n";
    return @hello;
}

$file="ex1.pl";
@ola=extract_comments($file);
foreach $line (@ola){
    print"$line\n";
}

But the problem is that this doesn't work very well!! Can somebody tell me why?

Another thing: how can I get the line number when the regex has matched?

Thank you all!

Replies are listed 'Best First'.
Re: Bad Regex
by Theo (Priest) on Jun 23, 2003 at 18:45 UTC
    Regex Coach may help you with this and other regex problems.

    -ted-

    Update: I found this link in a post by ralphie.

Re: Bad Regex
by RollyGuy (Chaplain) on Jun 23, 2003 at 18:02 UTC
    You should try Regexp::Common::comment for common regexes that match comments. It will probably be better than rolling your own.

    As for the line count, you can simply keep a counter as you go through the lines of the file. So, set a variable outside of your foreach loop and increment each time to get the new line number.
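A minimal sketch of the counter idea, assuming the file is read one line at a time. The name comment_lines is hypothetical, and the /^\s*#/ test is a deliberately naive stand-in that only sees whole-line comments:

```perl
use strict;
use warnings;

# Hypothetical helper: returns [line_number, text] pairs for lines that
# look like whole-line comments. The /^\s*#/ test is intentionally naive.
sub comment_lines {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "can't read $filename: $!";
    my $line_no = 0;    # counter declared outside the loop
    my @found;
    while (my $line = <$fh>) {
        $line_no++;     # incremented once per line read
        chomp $line;
        push @found, [$line_no, $line] if $line =~ /^\s*#/;
    }
    close $fh;
    return @found;
}
```

It could be called as: printf "%d: %s\n", @$_ for comment_lines("ex1.pl");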

      Eh, no, not really. The OP seems to want to find comments inside a program. Regexp::Common just has regular expressions for comments - for Perl comments, it would find any text starting with a pound sign, and ending with the next newline. It's not a full language parser, something you really need if you want to find comments inside Perl programs. You might be able to get away skipping over anything inside quotes in a C or Java program, but that isn't true for Perl. You really need to parse (or be Damian).

      Abigail
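To illustrate the point about Perl needing a real parser, here is a small sketch showing valid Perl lines where '#' does not start a comment. The sample lines and the naive pattern are assumptions chosen for illustration; they are not taken from Regexp::Common:

```perl
use strict;
use warnings;

# Each string below is valid Perl in which '#' does NOT start a comment.
my @tricky = (
    'my $last = $#array;',          # last index of @array
    '(my $t = $s) =~ s#/+#/#g;',    # '#' used as the s/// delimiter
    'my @w = qw#a b c#;',           # '#' used as the qw delimiter
);

# Naive matcher: a '#' and everything after it on the line.
my $naive = qr/#.*/;

# Collect what the naive pattern would wrongly report as a comment.
my @false_hits = map { /($naive)/ ? $1 : () } @tricky;

print "false positive: $_\n" for @false_hits;
```

All three lines produce a hit, so a pattern of this kind reports code as commentary.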

Re: Bad Regex
by Popcorn Dave (Abbot) on Jun 23, 2003 at 18:58 UTC
    You might also consider using YAPE::Regex::Explain to see exactly what you've written. I just used it today and it's nice to get an explanation of what I've done for a regex as opposed to what I *thought* I did. :)

    There is no emoticon for what I'm feeling now.

Re: Bad Regex
by Not_a_Number (Prior) on Jun 23, 2003 at 19:37 UTC

    ++ to the above suggestions, apart from the irrelevant anonymous reply.

    More generally, however, apart from presentation problems, you should heed the advice constantly repeated on this site by Monks far more experienced than you (or me :-). _Always_:

    use strict;
    use warnings;

    Adding these two lines at the beginning of your code as it stands will cause it to fail, printing out a plethora of errors along the lines of:

    Global symbol "@hello1" requires explicit package name at (prog_name) line 11

    Work through these, one or two at a time, by declaring the variables in question with my.

    A bit painstaking, but less so than posting a question to Perlmonks :-)

    And suddenly you'll find why you should have started out by using strict and warnings; that is, when you get to the end of your sub and see that you try to return an array that doesn't exist!

    (I assume that your line $tam=@h...; is just a copy/paste issue).

    This won't solve your 'bad regex' problem, but it will solve others that you have, as well as others that you could certainly encounter in the future!

    HTH

    dave
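A small sketch of how strict catches the "return an array that doesn't exist" mistake. The sub name broken and the snippet are made up for illustration; they mimic the @hello1 / @hello mismatch, and the snippet is compiled with a string eval so the error can be captured instead of killing the script:

```perl
use strict;
use warnings;

# A snippet with the same kind of name mismatch as the original sub:
# one array is filled, a differently named one is returned.
my $snippet = <<'END';
use strict;
sub broken {
    my @hello1 = (1, 2, 3);
    return @hello;    # typo: @hello was never declared
}
1;
END

# String eval returns undef on a compile error and puts the message in $@.
my $ok    = eval $snippet;
my $error = $@;

print $ok ? "compiled\n" : "strict says: $error";
```

Without strict the typo compiles silently and the sub just returns an empty list.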

      I fail to see the irrelevance. OP asked about capturing the line number, and a counter variable is a tad clunky. Using $. is a very simple way of doing it, and the $.->$_ scheme is an easy, intuitive way to keep track of it all. It is true that the regex m/^#/ may be overly simple (then again, perhaps not...), but that can be fleshed out; all in all this seems a simple, pure-perl way to do the job.
Re: Bad Regex
by graff (Chancellor) on Jun 24, 2003 at 02:32 UTC
    Here's a different sort of attack you could take on this problem, assuming that you are only looking to count comments in perl code -- it involves two steps:
    1. Use the B::Deparse module to "canonicalize" a perl script -- this includes removing all comments
    2. Find the differences between the original and deparsed versions of the script, and isolate the diffs that involve comments in the original
    This could be done with the following unix-style pipeline command (and I think you should be able to see how this could be formalized as a perl script):
    perl -MO=Deparse some_script.pl | diff - some_script.pl | perl -ne 'print if /^> / and /[^\$]#/'
    Note that the first step only works if the script does not contain syntax errors (it has to compile properly in order to be deparsed).

    Of course, this is still not perfect -- the "deparsed" output may produce a lot of minor, irrelevant differences relative to the original script, and some of these might happen to contain "#" characters that are part of the code (not commentary); also, there may still be some challenges involved with trimming the final output down to just the commentary, if that is what you want to do. But this might help to eliminate some of the complexity, and may lead to some useful ideas.

    Of course, another issue is that this does not find the pod-format documentation, if any happens to exist. But to extract that, you just need to run "perldoc some_script.pl".
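The two steps could be formalized roughly as follows. This is only a sketch: comment_candidates and _squash are hypothetical names, and instead of calling diff it swaps in a cruder whitespace-insensitive line lookup, so it will not behave identically to the pipeline. The '#' filter is the one from the one-liner, extended to accept a '#' at the start of a line:

```perl
use strict;
use warnings;

# Hypothetical helper: given the original source and its deparsed form as
# strings, return the original lines that do not appear in the deparsed
# text and that contain a '#' not preceded by '$'.
sub comment_candidates {
    my ($original, $deparsed) = @_;

    # Index the deparsed lines, ignoring whitespace differences.
    my %in_deparsed;
    $in_deparsed{ _squash($_) } = 1 for split /\n/, $deparsed;

    return grep { !$in_deparsed{ _squash($_) } && /(?:^|[^\$])#/ }
           split /\n/, $original;
}

# Remove all whitespace so that reindented lines still compare equal.
sub _squash {
    my ($s) = @_;
    $s =~ s/\s+//g;
    return $s;
}
```

The deparsed text could be captured beforehand with qx{perl -MO=Deparse some_script.pl}. The same caveats apply: Deparse rewrites more than whitespace, so expect false hits.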

Re: Bad Regex
by Anonymous Monk on Jun 23, 2003 at 19:04 UTC
    my %comments;    #comments keyed by line number
    open FH, "yourFile" or die $!;
    while (<FH>) {
        $comments{$.} = $_ if m/^#/
    }
    close FH;

Node Type: perlquestion [id://268245]
Approved by hardburn