Your lines have four non-space items followed by a series of this=that pairs where that could contain spaces. I would first split on whitespace using the third argument to limit the split to five fields. I would then use a global regex match to pull out the thises and thats from the fifth field as key/value pairs to populate a hash. The regex uses a look-ahead to avoid consuming the next pair. I use Data::Dumper here to show what has been parsed from the file.
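
To see the two steps in isolation, here is a minimal sketch on a single line. The sample line is shortened and made up for illustration (it is not one of the lines in the file), and the pattern is the same idea written compactly without the x flag.

```perl
use strict;
use warnings;

# Hypothetical sample line in the same shape as the log data.
my $line = q{2007-11-16 16:04:33 Local1.Alert 128.29.29.40 }
         . q{id=firewall fw=WS2000-Store 29 }
         . q{msg=dropping packet from EXT n/w agent=Firewall};

# A limit of 5 makes split stop after four separators, so the fifth
# field keeps the rest of the line with its internal spaces intact.
my ( $date, $time, $type, $ip, $rest ) = split m{\s+}, $line, 5;

# In list context a /g match returns every captured key and value in
# order, so the list can be assigned straight to a hash. The look-ahead
# ends each value just before the next "key=" or at end of string.
my %pairs = $rest =~ m{\s*(\S+)=\s*(\S.*?)(?=\s*\S+=|\z)}g;

print qq{rest: $rest\n};
print qq{$_ => $pairs{$_}\n} for sort keys %pairs;
```

Because the value is matched non-greedily, it grows one character at a time until the look-ahead sees another key= pair, which is why "WS2000-Store 29" survives as one value. The complete script, reading all six lines, follows.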

use strict;
use warnings;

use Data::Dumper;

# Pull out one this=that pair per match.
my $rxExtractFields = qr
    {(?x)
        \s*                  # skip any leading whitespace
        (\S+)                # key: a run of non-space characters
        =
        \s*                  # allow spaces after the equals sign
        (\S.*?)              # value: as little as possible
        (?= \s*\S+= | \z )   # stop before the next pair or at the end
    };

# Read the sample data from an in-memory file.
open my $inFH, q{<}, \ <<'END_OF_FILE' or die qq{open: $!\n};
2007-11-16 16:04:33 Local1.Alert 128.29.29.40 id=firewall time="2007-11-16 16:04:08" fw=WS2000-Store 29 pri=1 proto=6(tcp) src=128.29.29.200 dst=128.29.100.102 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4412 from EXT n/w agent=Firewall
2007-11-16 16:05:05 Local1.Alert 128.24.24.40 id=firewall time="2007-11-16 16:03:25" fw=WS2000-Store 24 pri=1 proto=6(tcp) src=128.24.24.200 dst=128.24.100.101 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4344 from EXT n/w agent=Firewall
2007-11-16 16:05:34 Local1.Alert 128.29.29.40 id=firewall time="2007-11-16 16:05:09" fw=WS2000-Store 29 pri=1 proto=6(tcp) src=128.29.29.200 dst=128.29.100.102 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4412 from EXT n/w agent=Firewall
2007-11-16 16:05:39 Local1.Alert 128.2.2.40 id=firewall time="2007-11-16 16:03:36" fw=WS2000-Store 02 pri=1 proto=6(tcp) src=128.2.2.200 dst=128.2.100.106 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w agent=Firewall
2007-11-16 16:05:40 Local1.Alert 128.2.2.40 id=firewall time="2007-11-16 16:03:36" fw=WS2000-Store 02 pri=1 proto=6(tcp) src=128.2.2.200 dst=128.2.100.106 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w agent=Firewall
2007-11-16 16:05:40 Local1.Alert 128.2.2.40 id=firewall time="2007-11-16 16:03:37" fw=WS2000-Store 02 pri=1 proto=6(tcp) src=128.2.2.200 dst=128.2.100.106 mid= 1013 mtp= 2 msg=TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w agent=Firewall
END_OF_FILE

my @parsedData = ();
while ( <$inFH> )
{
    chomp;

    # Limit of 5: four leading items, then everything else in one field.
    my ( $date, $time, $type, $ip, $restOfLine ) =
        split m{\s+}, $_, 5;

    # A global match in list context yields key, value, key, value ...
    my %pairs = $restOfLine =~ m{$rxExtractFields}g;

    push @parsedData,
        {
            field1 => $date,
            field2 => $time,
            field3 => $type,
            field4 => $ip,
            %pairs,
        };
}

close $inFH
    or die qq{close: $!\n};

print Data::Dumper->Dumpxs( [ \ @parsedData], [ q{*parsedData} ] );

Here's the output.

@parsedData = (
                {
                  'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4412 from EXT n/w',
                  'proto' => '6(tcp)',
                  'time' => '"2007-11-16 16:04:08"',
                  'src' => '128.29.29.200',
                  'field4' => '128.29.29.40',
                  'field2' => '16:04:33',
                  'field3' => 'Local1.Alert',
                  'mtp' => '2',
                  'mid' => '1013',
                  'fw' => 'WS2000-Store 29',
                  'field1' => '2007-11-16',
                  'agent' => 'Firewall',
                  'pri' => '1',
                  'id' => 'firewall',
                  'dst' => '128.29.100.102'
                },
                {
                  'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4344 from EXT n/w',
                  'proto' => '6(tcp)',
                  'time' => '"2007-11-16 16:03:25"',
                  'src' => '128.24.24.200',
                  'field4' => '128.24.24.40',
                  'field2' => '16:05:05',
                  'field3' => 'Local1.Alert',
                  'mtp' => '2',
                  'fw' => 'WS2000-Store 24',
                  'mid' => '1013',
                  'field1' => '2007-11-16',
                  'agent' => 'Firewall',
                  'id' => 'firewall',
                  'pri' => '1',
                  'dst' => '128.24.100.101'
                },
                {
                  'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4412 from EXT n/w',
                  'proto' => '6(tcp)',
                  'time' => '"2007-11-16 16:05:09"',
                  'src' => '128.29.29.200',
                  'field4' => '128.29.29.40',
                  'field2' => '16:05:34',
                  'field3' => 'Local1.Alert',
                  'mtp' => '2',
                  'fw' => 'WS2000-Store 29',
                  'mid' => '1013',
                  'field1' => '2007-11-16',
                  'agent' => 'Firewall',
                  'id' => 'firewall',
                  'pri' => '1',
                  'dst' => '128.29.100.102'
                },
                {
                  'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w',
                  'proto' => '6(tcp)',
                  'time' => '"2007-11-16 16:03:36"',
                  'src' => '128.2.2.200',
                  'field4' => '128.2.2.40',
                  'field2' => '16:05:39',
                  'field3' => 'Local1.Alert',
                  'mtp' => '2',
                  'fw' => 'WS2000-Store 02',
                  'mid' => '1013',
                  'field1' => '2007-11-16',
                  'agent' => 'Firewall',
                  'id' => 'firewall',
                  'pri' => '1',
                  'dst' => '128.2.100.106'
                },
                {
                  'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w',
                  'proto' => '6(tcp)',
                  'time' => '"2007-11-16 16:03:36"',
                  'src' => '128.2.2.200',
                  'field4' => '128.2.2.40',
                  'field2' => '16:05:40',
                  'field3' => 'Local1.Alert',
                  'mtp' => '2',
                  'fw' => 'WS2000-Store 02',
                  'mid' => '1013',
                  'field1' => '2007-11-16',
                  'agent' => 'Firewall',
                  'id' => 'firewall',
                  'pri' => '1',
                  'dst' => '128.2.100.106'
                },
                {
                  'msg' => 'TCP connection request received is invalid, dropping packet Src 23 Dst 4631 from EXT n/w',
                  'proto' => '6(tcp)',
                  'time' => '"2007-11-16 16:03:37"',
                  'src' => '128.2.2.200',
                  'field4' => '128.2.2.40',
                  'field2' => '16:05:40',
                  'field3' => 'Local1.Alert',
                  'mtp' => '2',
                  'fw' => 'WS2000-Store 02',
                  'mid' => '1013',
                  'field1' => '2007-11-16',
                  'agent' => 'Firewall',
                  'id' => 'firewall',
                  'pri' => '1',
                  'dst' => '128.2.100.106'
                }
              );
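
Note that the time values in the output keep their double quotes, since the regex captures them as part of the value. As a rough sketch of working with the resulting structure, the following strips those quotes and counts records per source address. The three records are trimmed-down copies of entries from the output above, and %countBySrc is a name chosen only for this example.

```perl
use strict;
use warnings;

# Three records shaped like the Dumper output, cut down to a few keys.
my @parsedData = (
    { src => '128.29.29.200', time => '"2007-11-16 16:04:08"' },
    { src => '128.2.2.200',   time => '"2007-11-16 16:03:36"' },
    { src => '128.29.29.200', time => '"2007-11-16 16:05:09"' },
);

my %countBySrc;    # hypothetical tally, keyed by source address
for my $record ( @parsedData )
{
    # Remove one leading and one trailing double quote, if present.
    $record->{time} =~ s{\A"|"\z}{}g;

    $countBySrc{ $record->{src} } ++;
}

print qq{$_->{src} at $_->{time}\n} for @parsedData;
print qq{$_: $countBySrc{$_}\n} for sort keys %countBySrc;
```

Stripping the quotes after parsing keeps the extraction regex simple; handling them inside the pattern is possible but would need an extra alternation for quoted values.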

I hope this is of interest.

Cheers,

JohnGG


In reply to Re: Parsing a log file by johngg
in thread Parsing a log file by TStanley
