PerlMonks  

Match into list?

by cormanaz (Deacon)
on Sep 03, 2007 at 23:47 UTC

cormanaz has asked for the wisdom of the Perl Monks concerning the following question:

Good Day bros. I was hunting around for some resources to parse Apache log files, and I ran across this script which contains the following statement
my ($host,$date,$url_with_method,$status,$size,$referrer,$agent) = $line =~ m/^(\S+) - - \[(\S+ \+\d{4})\] "(\S+ \S+ [^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/;
in an attempt to split apart a $line from the log file like
76.172.202.159 - - [31/Aug/2007:15:58:15 -0600] "GET / HTTP/1.1" 200 29692 "http://www.paperbackswap.com/forum/view_topic.php?t=70235&ls=" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.12) Gecko/20070508 Firefox/1.5.0.12"
This didn't look right to me, since the $line =~ m/.../ part would evaluate to 1 rather than a list. I tried it in the debugger, and sure enough it didn't work.

This got me thinking about whether there is a way to make it work. It looks like the intent is for the parens in the regexp to capture the material of interest, but this would wind up in the special vars $1, $2, etc., rather than in a list. Does anyone know how this could be made to work?

Replies are listed 'Best First'.
Re: Match into list?
by Sidhekin (Priest) on Sep 04, 2007 at 00:00 UTC

    In list context, the m// operator (without /g modifier) does indeed return a list of all captures. So that's not your problem.
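A minimal sketch of the difference (not from the original reply; the sample string and variable names are made up for illustration). The same match is run once in scalar context and once in list context:

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only to keep the pattern short.
my $text = 'alpha=1';

# Scalar context: the match returns true (1) or false (empty string).
my $matched = $text =~ /^(\w+)=(\d+)$/;

# List context: the same match returns the captures ($1, $2, ...).
my ($key, $value) = $text =~ /^(\w+)=(\d+)$/;

# A failed match in list context returns the empty list.
my @none = $text =~ /^(\d+)=(\w+)$/;

print "scalar: $matched\n";
print "list:   $key, $value\n";
print "failed: ", scalar(@none), " captures\n";
```

The parentheses around the variables on the left-hand side are what impose list context, so the original statement is fine in that respect.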

    One problem with that pattern is that it's insisting on a "+" in the time zone specification. The string you're trying to match has a "-" ... ;-)

    Use [+-] instead of \+, and it'll match that string at least (tested).
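To see the effect in isolation, here is a small sketch that applies only the timestamp portion of the pattern to the bracketed date from the sample line. The anchors and the variable names are assumptions added for this example:

```perl
use strict;
use warnings;

my $stamp = '[31/Aug/2007:15:58:15 -0600]';

# Original sub-pattern: insists on a literal "+" before the offset.
my ($strict) = $stamp =~ /^\[(\S+ \+\d{4})\]$/;

# Corrected sub-pattern: accepts either "+" or "-".
my ($relaxed) = $stamp =~ /^\[(\S+ [+-]\d{4})\]$/;

print defined $strict ? "strict:  $strict\n" : "strict:  no match\n";
print "relaxed: $relaxed\n";
```

With the negative offset, the first match fails and leaves $strict undefined, while the second captures the whole timestamp.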

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

Re: Match into list?
by ysth (Canon) on Sep 03, 2007 at 23:54 UTC
      Taking the debugger out of it, when I run
      #!/usr/bin/perl -w
      use strict;

      my $line = '76.172.202.159 - - [31/Aug/2007:15:58:15 -0600] "GET / HTTP/1.1" 200 29692 "http://www.paperbackswap.com/forum/view_topic.php?t=70235&ls=" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.12) Gecko/20070508 Firefox/1.5.0.12"';
      my ($host,$date,$url_with_method,$status,$size,$referrer,$agent) = $line =~ m/^(\S+) - - \[(\S+ \+\d{4})\] "(\S+ \S+ [^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/;
      print "$host\n$date\n$url_with_method\n$status\n$size\n$referrer\n$agent\n";
      The output is
      Use of uninitialized value in concatenation (.) or string at test.pl line 6.
      Use of uninitialized value in concatenation (.) or string at test.pl line 6.
      Use of uninitialized value in concatenation (.) or string at test.pl line 6.
      Use of uninitialized value in concatenation (.) or string at test.pl line 6.
      Use of uninitialized value in concatenation (.) or string at test.pl line 6.
      Use of uninitialized value in concatenation (.) or string at test.pl line 6.
      Use of uninitialized value in concatenation (.) or string at test.pl line 6.
        Your regex only works with positive tz offsets, not negative :)

        Remember to always check if your match succeeds before trying to use the results; in list context, this looks like:

        #!/usr/bin/perl -w
        use strict;

        my $line = '76.172.202.159 - - [31/Aug/2007:15:58:15 -0600] "GET / HTTP/1.1" 200 29692 "http://www.paperbackswap.com/forum/view_topic.php?t=70235&ls=" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.12) Gecko/20070508 Firefox/1.5.0.12"';
        my ($host,$date,$url_with_method,$status,$size,$referrer,$agent) = $line =~ m/^(\S+) - - \[(\S+ [-+]\d{4})\] "(\S+ \S+ [^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/
            or warn "match failed!";
        print "$host\n$date\n$url_with_method\n$status\n$size\n$referrer\n$agent\n";
        or:
        #!/usr/bin/perl -w
        use strict;

        my $line = '76.172.202.159 - - [31/Aug/2007:15:58:15 -0600] "GET / HTTP/1.1" 200 29692 "http://www.paperbackswap.com/forum/view_topic.php?t=70235&ls=" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.12) Gecko/20070508 Firefox/1.5.0.12"';
        if ( my ($host,$date,$url_with_method,$status,$size,$referrer,$agent) = $line =~ m/^(\S+) - - \[(\S+ [-+]\d{4})\] "(\S+ \S+ [^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/ ) {
            print "$host\n$date\n$url_with_method\n$status\n$size\n$referrer\n$agent\n";
        }
        else {
            warn "match failed!";
        }
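When processing a whole log rather than one line, the same check can be moved into a helper so that unmatched lines are skipped instead of producing undefined values. This is only a sketch under stated assumptions: the parse_line sub, the hash keys, and the second sample line are hypothetical and not from the thread. The pattern is the corrected one from above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference of fields,
# or undef (in scalar context) when the line does not match.
sub parse_line {
    my ($line) = @_;
    my @fields = $line =~ m/^(\S+) - - \[(\S+ [-+]\d{4})\] "(\S+ \S+ [^"]+)" (\d{3}) (\d+|-) "(.*?)" "([^"]+)"$/
        or return;
    my %record;
    @record{qw(host date request status size referrer agent)} = @fields;
    return \%record;
}

my @lines = (
    '76.172.202.159 - - [31/Aug/2007:15:58:15 -0600] "GET / HTTP/1.1" 200 29692 "http://www.paperbackswap.com/forum/view_topic.php?t=70235&ls=" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.0.12) Gecko/20070508 Firefox/1.5.0.12"',
    'this is not a log line',    # made-up input that should fail to match
);

my @records;
for my $line (@lines) {
    my $rec = parse_line($line);
    unless ($rec) {
        warn "match failed: $line\n";
        next;
    }
    push @records, $rec;
    print "$rec->{host} $rec->{status} $rec->{size}\n";
}
```

The list assignment evaluated in scalar context yields the number of captures, so a failed match (zero captures) triggers the early return.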
        You might take a look at the Regexp::Log module.
