PerlMonks  

Re: Re: Re: Multi-Format Log Parser - Version 2.0

by cjensen (Sexton)
on Jan 23, 2002 at 06:08 UTC ([id://140783])


in reply to Re: Re: Multi-Format Log Parser - Version 2.0
in thread Multi-Format Log Parser - Version 2.0

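I implemented a quick debug option that prints non-matching lines to STDERR. In testing I also found a pattern bug: entries with a 304 status log their byte count as "-" rather than a number, so the (\d+) byte-count group never matched them. The following diff adds the option and fixes the patterns: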
26c26
< GetOptions (\%optctl, "type|t=s", "pattern|p=s");
---
> GetOptions (\%optctl, "type|t=s", "pattern|p=s", "debug|d=i");
30,32c30,32
< 'common' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b)] ],
< 'virtual' => [ qr{(\S+) (\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)}, [qw(v h l u t r c b)] ],
< 'combined' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\"}, [qw(h l u t r c b R A)] ],
---
> 'common' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)}, [qw(h l u t r c b)] ],
> 'virtual' => [ qr{(\S+) (\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)}, [qw(v h l u t r c b)] ],
> 'combined' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\"}, [qw(h l u t r c b R A)] ],
35,36c35,36
< 'extended' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b R A P T)] ],
< 'custom' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\" (\d+)}, [qw(h l u t r c b A R T)] ],
---
> 'extended' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b R A P T)] ],
> 'custom' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\" (\d+)}, [qw(h l u t r c b A R T)] ],
102a103,104
> } elsif ($optctl{debug} == 1) {
> print STDERR $_;
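A quick way to see what the byte-count change buys: Apache writes "-" in the bytes field when no body is sent, which is the normal case for a 304 (Not Modified) response, so the old (\d+) group cannot match it. The sketch below runs the 'common' pattern from before and after the diff against a made-up 304 entry (the host, timestamp, and URL are invented).

```perl
use strict;
use warnings;

# Invented 'common' format entry for a 304 response; bytes field is "-".
my $line = '10.0.0.1 - - [23/Jan/2002:06:08:00 -0500] "GET /index.html HTTP/1.0" 304 -';

# 'common' pattern before the diff: bytes must be digits.
my $old = qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)};

# 'common' pattern after the diff: bytes may be digits or dashes.
my $new = qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)};

my $old_ok  = $line =~ $old ? 1 : 0;
# In list context the match returns all captures; index 6 is the bytes field.
my ($bytes) = ($line =~ $new)[6];

printf "old: %s, new bytes field: %s\n",
    ($old_ok ? 'match' : 'no match'), (defined $bytes ? $bytes : 'undef');
```

Note that [\d\-]+ is permissive (it would also accept something like "1-2"); that is harmless for real Apache logs, where the field is always either digits or a single dash.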

With the new patterns, a quick run against 79154 lines of an 'extended' format access log left 8 lines unmatched. All of them failed because of double quotes embedded in the request or the user agent string.
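The test above (count the lines, count the misses, show the misses with -d 1) boils down to a small loop. This is a hypothetical reconstruction rather than the parser's actual code: the sub name count_unmatched and the sample lines are mine, and only the "debug|d=i" option spec and the 'common' pattern come from the diff.

```perl
use strict;
use warnings;
use Getopt::Long;

# Same option spec as the diff; -d 1 turns on debug output.
my %optctl = (debug => 0);
GetOptions(\%optctl, "debug|d=i") or die "bad options\n";

# Hypothetical helper: returns (total lines, unmatched lines) and, when
# $debug == 1, echoes each unmatched line to $err (STDERR in real use).
sub count_unmatched {
    my ($re, $lines, $debug, $err) = @_;
    my ($total, $missed) = (0, 0);
    for my $line (@$lines) {
        $total++;
        next if $line =~ $re;
        $missed++;
        print {$err} $line if $debug == 1;
    }
    return ($total, $missed);
}

# 'common' pattern from the diff.
my $common = qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)};

# Invented sample input: two valid entries and one junk line.
my @sample = (
    qq{10.0.0.1 - - [23/Jan/2002:06:08:00 -0500] "GET / HTTP/1.0" 200 512\n},
    qq{10.0.0.2 - - [23/Jan/2002:06:08:01 -0500] "GET / HTTP/1.0" 304 -\n},
    qq{this line is not a log entry\n},
);

my ($total, $missed) = count_unmatched($common, \@sample, $optctl{debug}, \*STDERR);
print "$missed of $total lines did not match\n";
```

Passing the output handle in, instead of hard-coding STDERR, makes it easy to capture the misses in a file or an in-memory buffer while tuning the patterns.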

Here's a user agent that didn't match...
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; <HTML><A%20HREF="http://www.pghconnect.com/">www.pghconnect.com</a></HTML>)"
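One possible workaround for stray quotes like these (my suggestion, not something the parser currently does) is to stop forbidding " inside the quoted fields and let the surrounding structure decide where each field ends: non-greedy (.*?) for the request and referer, greedy (.*) for the user agent, and ^...$ anchors so the trailing numeric fields pin the end of the line. The sketch compares the 'extended' pattern from the diff with such a lenient variant on an invented line; only the user agent is taken from the post, and the host, timestamp, request, and two trailing numbers are made up.

```perl
use strict;
use warnings;

# Invented 'extended' format entry; only the user agent comes from the post.
my $ua   = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; '
         . '<HTML><A%20HREF="http://www.pghconnect.com/">www.pghconnect.com</a></HTML>)';
my $line = qq{10.0.0.1 - - [23/Jan/2002:06:08:00 -0500] "GET / HTTP/1.0" 200 1234 "-" "$ua" 1234 5};

# 'extended' pattern from the diff: quoted fields may not contain ".
my $strict = qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\" (\d+) (\d+)};

# Lenient variant (assumption): anchored, and each quoted field ends where
# the next expected field begins rather than at the first embedded quote.
my $lenient = qr{^(\S+) (\S+) (\S+) \[([^\]]*)\] "(.*?)" (\d+) ([\d\-]+) "(.*?)" "(.*)" (\d+) (\d+)$};

my $strict_ok = $line =~ $strict ? 1 : 0;
my @f = $line =~ $lenient;    # 11 captures on success, empty list on failure

print 'strict: ', ($strict_ok ? 'match' : 'no match'), "\n";
print "agent:  $f[8]\n" if @f;
```

This is still a heuristic: it can split a field in the wrong place if a request or referer happens to contain the exact delimiter sequence that follows it (for example `" "`).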
