I implemented a quick debug option that prints non-matching lines to STDERR. In testing I found a pattern bug: 304 entries log "-" instead of a byte count, so the (\d+) byte-count field rejected them. Both changes are in the following diff:
26c26
< GetOptions (\%optctl, "type|t=s", "pattern|p=s");
---
> GetOptions (\%optctl, "type|t=s", "pattern|p=s", "debug|d=i");
30,32c30,32
< 'common' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b)] ],
< 'virtual' => [ qr{(\S+) (\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)}, [qw(v h l u t r c b)] ],
< 'combined' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\"}, [qw(h l u t r c b R A)] ],
---
> 'common' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)}, [qw(h l u t r c b)] ],
> 'virtual' => [ qr{(\S+) (\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)}, [qw(v h l u t r c b)] ],
> 'combined' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\"}, [qw(h l u t r c b R A)] ],
35,36c35,36
< 'extended' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b R A P T)] ],
< 'custom' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+) \"([^\"]*)\" \"([^\"]*)\" (\d+)}, [qw(h l u t r c b A R T)] ],
---
> 'extended' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\" (\d+) (\d+)}, [qw(h l u t r c b R A P T)] ],
> 'custom' => [ qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+) \"([^\"]*)\" \"([^\"]*)\" (\d+)}, [qw(h l u t r c b A R T)] ],
102a103,104
> } elsif ($optctl{debug} == 1) {
> print STDERR $_;
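To make the byte-count fix concrete, here is a minimal sketch that runs the old and new 'common' patterns against a 304 entry. The log line is made up for illustration (it is not from my logs); the two patterns are copied from the diff above.

```perl
use strict;
use warnings;

# Hypothetical 'common' format entry for a 304 response.
# Note the "-" where the byte count would normally be.
my $line = '192.0.2.1 - - [10/Oct/2002:13:55:36 -0700] "GET /index.html HTTP/1.0" 304 -';

# Old pattern: byte count must be digits only.
my $old = qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) (\d+)};
# New pattern: byte count may be digits or "-".
my $new = qr{(\S+) (\S+) (\S+) \[([^\]]*)\] \"([^\"]*)\" (\d+) ([\d\-]+)};

my $old_matched = $line =~ $old ? 1 : 0;
my @fields      = $line =~ $new;

print "old pattern: ", ($old_matched ? "match" : "no match"), "\n";
print "new pattern: status=$fields[5] bytes=$fields[6]\n";
```

The old pattern reports no match, while the new one captures 304 as the status and "-" as the byte count. Anything downstream that sums byte counts would need to treat "-" as zero.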
With the new patterns, a quick run against 79154 lines of an 'extended' format access log left 8 lines unmatched. All 8 failed because of embedded double quotes in the request or user agent string.
Here's a user agent that didn't match...
"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; <HTML><A%20HREF="http://www.pghconnect.com/">www.pghconnect.com</a></HTML>)"
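A short sketch of why this one fails: the quoted-field sub-pattern \"([^\"]*)\" stops at the first embedded quote, so the capture ends early and the rest of the pattern no longer lines up. The snippet below applies only that sub-pattern to the user agent field in isolation, which is a simplification of what the full pattern does.

```perl
use strict;
use warnings;

# The user agent field as logged, including the surrounding quotes.
my $agent = '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Q312461; '
          . '<HTML><A%20HREF="http://www.pghconnect.com/">www.pghconnect.com</a></HTML>)"';

# The same sub-pattern the script uses for every quoted field.
my ($captured) = $agent =~ m{\"([^\"]*)\"};

# Compare the capture with the field contents minus the outer quotes.
my $complete = $captured eq substr($agent, 1, -1) ? "yes" : "no";

print "captured: $captured\n";
print "complete: $complete\n";
```

The capture ends at HREF=, right before the embedded quote. In the full 'extended' pattern the next thing expected after the closing quote is a space and a number, so the whole line is rejected.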