PerlMonks  

Matching lines in a file that end in numbers (was: regular expression)

by ashok (Sexton)
on Feb 28, 2001 at 09:45 UTC (id://61292)

ashok has asked for the wisdom of the Perl Monks concerning the following question:

Edited by mirod: changed the title as suggested.

Hi,
I am trying to print the lines of C/C++ source that end with a number. I am receiving C/C++ files in which some lines end with numbers, and I cannot compile that source as it is. For example:

/* file name is endno1.c */
/* should*/ int i; 10002
/*should not*/ int j;
/*should not*/ int k;
/* should*/ int m; 20002
Another file, endno2.c:
int i;
int j;
/* 10th line = 20 */
int k;
/*should*/ int m; 20005
So I am trying to print the lines that end with numbers. I want the solution to be a one-liner, so on a Unix machine I came up with this:
find . -name "*.c" -exec cat {} \; | perl -ne 'print "$1\n" if $_ =~ /(.*)\d+$/'
but it prints this:
/*should*/ int m; 2000
/* should*/ int i; 1000
/* should*/ int m; 2000
That is, it matches the entire line except the last digit. My intention is not to print the numbers at the end at all. So I modified my code like this:
find . -name "*.c" -exec cat {} \; | perl -ne 'print "$1\n" if $_ =~ /(.*)(.*?(?=\d))$/'
But it does not print anything. If I remove the $ at the end, then it also prints this line, which I do not want:
/* 10th line = 2 */
Can you please correct my regular expression? Again, I want it to be a one-liner. My goal is to compile the C/C++ code cleanly without those numbers at the end; I redirect the output to a file and then compile that. Thanks, Ashok

Re: regular expression
by archon (Monk) on Feb 28, 2001 at 10:29 UTC
    Your regular expression saves everything in the first set of parentheses. Your first regex is /(.*)\d+$/. Since .* is greedy, it grabs as much as it can and leaves only the last digit for \d+, so everything up to that last digit ends up in $1.

    I don't understand from your question exactly what you're trying to print. If you don't want to print any of the numbers, use /(.*?)\d+$/. That makes the first part of your regular expression non-greedy, which means it matches as little as possible before stopping.

    If that's not what you're looking for, please clarify.

    HTH HAND
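    A minimal Perl sketch of the difference, using lines adapted from the question (the sub names are made up for illustration): the greedy pattern leaves only the last digit for \d+, the non-greedy pattern anchored with $ hands the whole number to \d+, and dropping the $ lets the match stop at the first digit anywhere on the line.

```perl
use strict;
use warnings;

# Hypothetical helpers: each returns $1 for one pattern, or undef if no match.
sub greedy_prefix {
    my ($line) = @_;
    return $line =~ /(.*)\d+$/ ? $1 : undef;    # .* gives back only one digit
}

sub lazy_prefix {
    my ($line) = @_;
    return $line =~ /(.*?)\d+$/ ? $1 : undef;   # .*? stops before the whole number
}

sub unanchored_prefix {
    my ($line) = @_;
    return $line =~ /(.*?)\d+/ ? $1 : undef;    # no $: stops at the first digit
}

my $code    = '/* should */ int m; 20002';
my $comment = '/* 10th line = 20 */';

printf "%s|\n", greedy_prefix($code);        # /* should */ int m; 2000|
printf "%s|\n", lazy_prefix($code);          # /* should */ int m; |
printf "%s|\n", unanchored_prefix($comment); # /* |
```

    Note that the non-greedy version still keeps the space before the number in $1.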

      Yes, I do not want to print any number that appears after the regular code. Your code is exactly what I was looking for; I totally forgot about ?. After coming home from the office I figured out another way:
      /(.*)([^\d].*)$/
      I found that the C source I am compiling never has any code after the numbers at the end, so the above regex is safe. But I have to admit that your approach is elegant. Thanks again, Ashok
        What I would suggest is:
        print "$1\n" if (/^(.*?)\s*\d+$/);
        This has the effect of also removing the whitespace that would otherwise be left at the end of the line.
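        A small sketch of that filter applied to sample lines from the question (kept in an array here rather than read from files; the helper name is hypothetical): lines without a trailing number are skipped, and the number plus the whitespace in front of it is dropped.

```perl
use strict;
use warnings;

# Return the code part of each line that ends in a number, without the
# number and without the whitespace before it (hypothetical helper name).
sub code_before_number {
    my @kept;
    for my $line (@_) {
        push @kept, $1 if $line =~ /^(.*?)\s*\d+$/;
    }
    return @kept;
}

my @sample = (
    '/* should*/ int i; 10002',
    '/*should not*/ int j;',
    '/* 10th line = 20 */',
    '/* should*/ int m; 20002',
);

print "$_\n" for code_before_number(@sample);
# /* should*/ int i;
# /* should*/ int m;
```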

        I'm not sure whether this will be an issue for your application, but your program might get a bit carried away if you (or someone else) split a long expression across lines, such as:
        x = ReallyLongFunctionNameNumber1() + 130
            + AnotherCrazyLongFunctionNameThatIsOnItsOwnLine();
        I presume you don't want the program to touch that first line, in which case a slightly safer version is to require a semicolon in your regexp, like so:
        print "$1\n" if (/^(.*?;)\s*\d+$/);
        Or to demand a specific number of digits, such as 5:
        print "$1\n" if (/^(.*?)\s*\d{5}$/);
        Although, to be truly "safe", you would want to change the format of your numerical markup slightly, such as turning it into a comment:
        int i; // #20005
        '// #20005' is not very likely to show up anywhere else in your code.

        Though, of course, this depends on the context of your application, and it might be overkill.
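        A rough comparison of the three patterns from this reply, assuming a continuation line such as `x = ReallyLongFunctionNameNumber1() + 130` (the first half of the expression above when it is broken across two lines). The hash keys and the `strip_with` sub are hypothetical names for this sketch.

```perl
use strict;
use warnings;

my %pattern = (
    plain     => qr/^(.*?)\s*\d+$/,     # any trailing number
    semicolon => qr/^(.*?;)\s*\d+$/,    # number must follow a ';'
    five      => qr/^(.*?)\s*\d{5}$/,   # exactly five trailing digits
);

# Return the captured code, or undef when the pattern does not match.
sub strip_with {
    my ($name, $line) = @_;
    return $line =~ $pattern{$name} ? $1 : undef;
}

my @lines = (
    'int m; 20002',
    'x = ReallyLongFunctionNameNumber1() + 130',
);

for my $line (@lines) {
    for my $name (sort keys %pattern) {
        my $result = strip_with($name, $line);
        printf "%-9s %-42s => %s\n", $name, $line,
            defined $result ? $result : '(untouched)';
    }
}
```

        Only the plain pattern strips the `130` from the continuation line; the semicolon and five-digit variants leave it alone while still handling `int m; 20002`.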
Re: regular expression
by ariels (Curate) on Feb 28, 2001 at 13:38 UTC
    Here's another way to do it: perl -ne 'print if s/\d+$//'.

    Why search when you can replace?

    Note that that could leave trailing whitespace; to get rid of that, use s/\s*\d+$// instead.
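    The same idea written out as a small Perl sub, as a sketch (the sub name is hypothetical): s/// returns the number of replacements it made, so it works as the "does this line end in a number?" test, and the whitespace-eating variant is used.

```perl
use strict;
use warnings;

# Keep only the lines where the substitution actually removed something;
# s/// returns the number of replacements, so it doubles as the test.
sub strip_and_keep_numbered {
    my @out;
    for my $line (@_) {
        my $copy = $line;                      # leave the caller's data alone
        push @out, $copy if $copy =~ s/\s*\d+$//;
    }
    return @out;
}

my @sample = ('int i; 10002', 'int j;', '/* 10th line = 20 */', 'int m; 20002');
print "$_\n" for strip_and_keep_numbered(@sample);
# int i;
# int m;
```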

Re: regular expression
by mirod (Canon) on Feb 28, 2001 at 12:38 UTC

    As you want to output the whole line, there is really no need to capture anything with brackets. Just print the entire line if it ends with a number:

    perl -ne 'print if /\d$/'

    print prints $_ by default.
    /\d$/ matches if $_ ends with a digit (in good Perl golf spirit there is no need to use \d+ there: one digit is enough, and we don't want to capture the number since we output the entire line).

    Another, slightly more cryptic, way of writing it would be perl -ne '/\d$/ && print', but it's the same length, so no bonus points in Perl golf.
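    As a sketch, the same test as a grep over a list of lines (the sub name is hypothetical). The sample lines keep their trailing "\n", as perl -ne delivers them; that works because $ also matches just before a final newline.

```perl
use strict;
use warnings;

# Return the lines whose last character (before any trailing newline) is a digit.
sub lines_ending_in_digit {
    return grep { /\d$/ } @_;
}

my @sample = ("int i; 10002\n", "int j;\n", "/* 10th line = 20 */\n", "int m; 20002\n");
print for lines_ending_in_digit(@sample);   # print with no arguments prints $_
# int i; 10002
# int m; 20002
```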

Re: regular expression
by pileswasp (Monk) on Feb 28, 2001 at 15:45 UTC
    If you want to compile the output, then surely you're going to need all of the lines, not just the ones that originally ended with numbers.

    If that's the case I think what you want is:
    find . -name "*.c" -exec perl -ne 's/\d+$//; print' {} \;
    which strips any numbers from the end of the lines and prints them all.
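    A sketch of the same whole-file filter as a Perl sub. Here an in-memory string stands in for one .c file (opened through a scalar reference), the sub name is hypothetical, and the \s* from the whitespace note above is folded in.

```perl
use strict;
use warnings;

# Read a whole "file" (an in-memory string standing in for a .c file)
# and return every line, with any trailing number removed.
sub strip_numbers_from_file {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my $out = '';
    while (my $line = <$fh>) {
        $line =~ s/\s*\d+$//;   # $ matches before "\n", so the newline survives
        $out .= $line;          # lines without a number pass through unchanged
    }
    close $fh;
    return $out;
}

my $source  = "int i; 10002\nint j;\nint k;\nint m; 20002\n";
my $cleaned = strip_numbers_from_file($source);
print $cleaned;
# int i;
# int j;
# int k;
# int m;
```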

Re: regular expression
by McD (Chaplain) on Mar 01, 2001 at 00:00 UTC
    Hmmm. Several other monks, wiser than I, have observed that it's very difficult to tell whether a trailing number is a "good number" to strip (i.e., a line number) or a "bad number" that belongs to the code. For example,

    int i = 100
                + offset;
    Some folks have offered the heuristic of looking for a trailing semicolon, etc.

    Just wondering - if these really are line numbers, can you make use of $.?

    perl -MEnglish -ne 's/$INPUT_LINE_NUMBER$//; print;'

    ...or something like that.
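    A sketch of that idea, assuming the trailing numbers really are the physical line numbers (the sample numbers in the question, such as 10002 and 20002, are not, so this only helps for files where they are). The sub name is hypothetical, and an in-memory string stands in for a file.

```perl
use strict;
use warnings;

# Hypothetical helper: strip a trailing number only when it equals the
# current line number $. of the handle being read.
sub strip_matching_line_numbers {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my @out;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^(.*?)\s*(\d+)$/ && $2 == $.) {
            $line = $1;              # the number matched $., so drop it
        }
        push @out, $line;            # every other line passes through unchanged
    }
    close $fh;
    return @out;
}

my $source = "int i; 1\nint j = 100\nint k; 3\n";
print "$_\n" for strip_matching_line_numbers($source);
# int i;
# int j = 100
# int k;
```

    Because the comparison is against $., a constant like the 100 on line 2 survives, which addresses exactly the "good number" versus "bad number" problem above.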

    Sounds like a thorny problem with a large, not-well-known dataset. The only cure for that I know of is to carefully hand-check your output. Good luck.

    Peace,
    -McD

Re: regular expression
by Anonymous Monk on Feb 28, 2001 at 23:16 UTC
    This is the real treasure of this website. I am really grateful to all of you for your help; I came across some very useful hints. I am seeing this kind of code from an IBM platform, and only some of the source files have this problem of lines ending with line numbers. I am unable to apply this technique to every file because of cases like the following:
    /* Developed in 1997 by some xxxx */
    #define m 100 100020
    #define n 2000 100040
    The logic will eat 1997, 100 and 2000 in the above code. I was unable to come up with a single-line regular expression, but at least it tells me which programs have the problem so I can note them down. I am editing those programs in a text editor, using vertical block mode to delete the numbers before compiling. I thought a single-line regular expression would work, but it looks like I need a script with more conditions. I appreciate your help. Thanks, Ashok
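    One possible extra condition for such a script, as a sketch only: judging purely from the sample lines posted in this thread, the sequence numbers are a whitespace-separated run of five or more digits (10002, 100020, ...), while real constants like 100, 2000 and 1997 are shorter. That width rule is an assumption, not something known about the IBM files, and the sub name is hypothetical; a genuine five-digit constant at the end of a line would still be eaten.

```perl
use strict;
use warnings;

# Assumption (from the sample lines only): sequence numbers are a
# whitespace-separated run of at least five digits at the end of a line,
# while real constants such as 100, 2000 or 1997 are shorter.
sub strip_sequence_number {
    my ($line) = @_;
    (my $copy = $line) =~ s/\s+\d{5,}$//;
    return $copy;
}

my @sample = (
    '/* Developed in 1997 by some xxxx */',
    '#define m 100 100020',
    '#define n 2000 100040',
    'int j;',
);
printf "%s\n", strip_sequence_number($_) for @sample;
# /* Developed in 1997 by some xxxx */
# #define m 100
# #define n 2000
# int j;
```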
