G'day lulz,
Welcome to the Monastery.
Reading an entire logfile into memory prior to processing would be very much the exception;
the norm would be to process the file a line at a time.
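As a minimal sketch of line-at-a-time processing: the loop below reads one record per iteration, so memory use stays flat however large the file grows. The sample text and the in-memory filehandle are my own stand-ins so the snippet runs anywhere; for a real log you'd open the file's path instead.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample data held in a string (an assumption for this sketch).
# For a real logfile, use:  open my $fh, '<', $path or die ...
my $sample = join '',
    "first entry\n",
    "second entry\n",
    "third entry\n";

open my $fh, '<', \$sample or die "Can't open in-memory file: $!";

my $count = 0;
while (my $line = <$fh>) {    # one line per iteration; nothing is slurped
    chomp $line;
    ++$count;
    print "$count: $line\n";
}
close $fh;
```

Contrast this with `my @lines = <$fh>;`, which pulls every line into memory before any processing starts.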
The format of each log entry is defined in the Apache configuration file (httpd.conf or whatever you've called it). From my httpd.conf, here are the lines that describe the access_log:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
...
CustomLog "/private/var/log/apache2/access_log" common
See the documentation in Apache Module mod_log_config for a description of the %X codes and other related information.
With that information to hand, it's fairly easy to construct a regex to parse the log records.
Here's a script to do that.
The three DATA lines are taken verbatim from my access_log file.
#!/usr/bin/env perl
use strict;
use warnings;
# LogFormat "%h %l %u %t \"%r\" %>s %b" common
my $re = qr{
^
( \S+ ) # capture remote host (%h)
\s+
( \S+ ) # capture remote logname (%l)
\s+
( \S+ ) # capture remote user (%u)
\s+
\[
( [^\]]+ ) # capture request time (%t) without brackets
\]
\s+
"
( (?: [^"\\]++ | \\. )*+ ) # capture first line of request (%r)
"
\s+
( \d+ ) # capture final status (%>s)
\s+
( \d+ ) # capture response size in bytes (%b)
$
}x;
my $format = join '',
"Host: %s\n",
"Logname: %s\n",
"User: %s\n",
"Time: %s\n",
"Request: %s\n",
"Status: %d\n",
"Size: %d\n\n";
printf $format, /$re/ while <DATA>;
__DATA__
127.0.0.1 - - [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509
127.0.0.1 - ken [22/Apr/2015:13:35:21 +1000] "GET /bin/admin.pl HTTP/1.1" 500 656
127.0.0.1 - - [24/Apr/2015:04:51:49 +1000] "GET / HTTP/1.1" 200 45
Output:
Host: 127.0.0.1
Logname: -
User: -
Time: 22/Apr/2015:13:35:04 +1000
Request: GET /bin/admin.pl HTTP/1.1
Status: 401
Size: 509

Host: 127.0.0.1
Logname: -
User: ken
Time: 22/Apr/2015:13:35:21 +1000
Request: GET /bin/admin.pl HTTP/1.1
Status: 500
Size: 656

Host: 127.0.0.1
Logname: -
User: -
Time: 24/Apr/2015:04:51:49 +1000
Request: GET / HTTP/1.1
Status: 200
Size: 45
Be aware that your configuration may use other logfiles with different LogFormat directives;
however, you should be able to construct a suitable regex using the script above as a template.
And, of course, you'll probably want to do something more useful than just printing the data.
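For instance, here's a rough sketch that tallies hits per status code and sums the bytes sent, rather than printing each record. It's a variation on the regex above, not a drop-in replacement: the named captures, the variable names, and the last two sample lines are my own additions for illustration. It also accepts "-" in the size field, which %b logs when no bytes are sent, and counts any line that doesn't match instead of passing it on.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# LogFormat "%h %l %u %t \"%r\" %>s %b" common
# Named captures (Perl 5.10+); the size field may be "-" (no bytes sent).
my $re = qr{
    ^
    (?<host>    \S+ ) \s+
    (?<logname> \S+ ) \s+
    (?<user>    \S+ ) \s+
    \[ (?<time> [^\]]+ ) \] \s+
    " (?<request> (?: [^"\\]++ | \\. )*+ ) " \s+
    (?<status>  \d+ ) \s+
    (?<size>    \d+ | - )
    $
}x;

# First three lines are the sample records from the script above;
# the last two are hypothetical, added to exercise "-" and a bad line.
my $log = <<'END_LOG';
127.0.0.1 - - [22/Apr/2015:13:35:04 +1000] "GET /bin/admin.pl HTTP/1.1" 401 509
127.0.0.1 - ken [22/Apr/2015:13:35:21 +1000] "GET /bin/admin.pl HTTP/1.1" 500 656
127.0.0.1 - - [24/Apr/2015:04:51:49 +1000] "GET / HTTP/1.1" 200 45
127.0.0.1 - - [24/Apr/2015:04:52:10 +1000] "GET /missing HTTP/1.1" 304 -
this line is not a log record
END_LOG

open my $fh, '<', \$log or die "Can't open in-memory file: $!";

my %hits_by_status;
my $total_bytes = 0;
my $skipped     = 0;

while (my $line = <$fh>) {
    if ($line =~ $re) {
        ++$hits_by_status{ $+{status} };
        $total_bytes += $+{size} eq '-' ? 0 : $+{size};
    }
    else {
        ++$skipped;    # count lines the regex can't parse
    }
}
close $fh;

printf "Status %s: %d\n", $_, $hits_by_status{$_}
    for sort keys %hits_by_status;
print "Total bytes: $total_bytes\n";
print "Skipped lines: $skipped\n";
```

Checking the match before using the captures also avoids the "Missing argument" warnings that printf emits when a line fails to match.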