PerlMonks  

Unable to match newline

by robinson (Novice)
on Nov 17, 2015 at 19:09 UTC ( [id://1147947]=perlquestion )

robinson has asked for the wisdom of the Perl Monks concerning the following question:

Hello:

For a long time I was using grep with the option "-P" to turn on Perl pattern matching to detect if a file contained a given text. The pattern could be very simple or very complicated so I needed regular expressions.

Then I ran into some trouble where grep would complain that "PCRE's backtracking limit is exceeded". I could never find a working solution, so I decided that instead of using grep to emulate Perl, I could just use a Perl one-liner and cut out the middleman.

Now I'm running into some problems with Perl because for some reason it refuses to match some files even though everything seems correct, and if I take the same expression and use it with 'grep -P', it has no trouble finding the match.

Here is a portion of my Perl one-liner (on Linux). If there is a match, it should exit with code 0 (trying to emulate grep's behavior):

perl -ne "/(?s)<\?php.+?[\\$]{1}/ and exit 0; exit 1" filename.txt ; echo $?
Here is a partial content of the file "filename.txt":
<?php  
$t60="
Here is the same partial content in hex (because there are some not visible chars there):
0000000 3c 3f 70 68 70 20 20 0d 0a 24 74 36 30 3d 22 0a
0000020
For the life of me I cannot find a way to make Perl match that. If I use the same expression with grep, it has no problem matching it:
grep -Pzo "(?s)<\?php.+?[\\$]{1}([0-9a-zA-Z]+)=['\"]+" filename.txt

What am I missing? I would really appreciate your help as every Regex program and editor and debugger says the regex is correct and like I said, grep has no problem at all matching it.

Replies are listed 'Best First'.
Re: Unable to match newline
by Corion (Patriarch) on Nov 17, 2015 at 19:14 UTC

    Your one-liner reads one line at a time, but your match target spreads across multiple lines. You could instead slurp the whole file into memory, or maybe just read the first two kilobytes of any file and match against that.

      Thanks for your reply. How can I do this in one line? (I mean without having to create a file with the code in it).

      I apologize if this seems like a very basic question but my knowledge of Perl is very low.

        perl -0777 -e 'somecode' will slurp the whole file when you read it with the diamond operator. See perlrun, but single-liner-ing your code isn't always a good thing for readability and maintainability.

Re: Unable to match newline
by linuxer (Curate) on Nov 17, 2015 at 19:42 UTC
    Just some thoughts:

    If you want to compare Perl with grep you need to compare the same behaviour.

    Your grep uses -z ("Treat the input as a set of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline."). So it breaks the "classical" definition of a line.

    I do not see anything similar in your perl command, so it uses the "classical" line definition. At the same time, I think, your regex in the perl command would not match a newline character with the '.', because it does not use the /s modifier.

      Thank you very much for your reply. Isn't "(?s)" the same as "/s"?

      How can I emulate the -z ("a data line ends in 0 byte, not newline") behavior of grep in Perl?

        Regarding the '(?s)': Good point. I must admit, I (obviously) did not notice that. I'm not used to it. You're right.

        Regarding the "emulate the -z": Perl has a similar switch. Check out perldoc perlrun and look for the first command switch description ;-)

        edit: s/command line/command/
