Cisco Log Files: broken REGEX


	PerlMonks

by blue_cowdawg (Monsignor)

on Aug 21, 2003 at 23:23 UTC

blue_cowdawg has asked for the wisdom of the Perl Monks concerning the following question:

Given the following sample log file line:

Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet
Aug 21 19:00:36 [1.1.1.3.200.125] 410382: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.165.4.150 -> 211.95.79.233 (8/0), 1 packet

I am trying to capture the information in these lines to check for possible virus infestations. I tried using the regex

 m@^([A-Z][a-z]+\s+\d+\s+\d+\:\d+\:\d+)\s+([\.\d]+)\s+(\d+)\:\s+([A-Z][a-z]+\s+\d+\s+\d+\:\d+\:\d+)\s+([A-Z]{3})\:\s+\%SEC\-6\-[A-Z]+\:\s+list\s+\d+([a-z]+)\s+([a-z]+)\s+(\d+\.\d+\.\d+\.\d+)\s+\-\>\s+(\d+\.\d+\.\d+\.\d+)\s+\(\d+\/\d+\)\,\s+(\d)\s+packet$@

I know I am going brain dead right now, but can anybody spot anything glaringly obvious with this that is wrong?

Peter @ Berghold . Net

Sieze the cow! Bite the day!

Nobody expects the Perl inquisition!

Test the code? We don't need to test no stinkin' code!
All code posted here is as is where is unless otherwise stated.

Brewer of Belgian style Ales

Replies are listed 'Best First'.
Re: Cisco Log Files: broken REGEX by chromatic (Archbishop) on Aug 21, 2003 at 23:41 UTC
What's glaringly obvious is that it could be more maintainable. Breaking it into separate parts could help.

    my $timestamp = qr/[A-Z][a-z]+\s+\d+\s+\d+\:\d+\:\d+/;
    my $address   = qr/[\.\d]+/;
    my $id        = qr/\d+/;
    my $timezone  = qr/[A-Z]+:/;
    # and so on
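The fragments from this reply could be assembled along the following lines. This is only a sketch: the four `qr//` fragments are taken from the reply, while `$header`, `$line` and `@header_fields` are names assumed for the illustration, and only the header part of the log line is matched.

```perl
use strict;
use warnings;

# Fragments as given in the reply.
my $timestamp = qr/[A-Z][a-z]+\s+\d+\s+\d+\:\d+\:\d+/;
my $address   = qr/[\.\d]+/;
my $id        = qr/\d+/;
my $timezone  = qr/[A-Z]+:/;

# Hypothetical composition covering only the start of the line.
my $header = qr/^($timestamp)\s+\[($address)\]\s+($id):\s+($timestamp)\s+($timezone)/;

my $line = 'Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: '
         . '%SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet';

# In list context the match returns the captured fields.
my @header_fields = $line =~ $header;
print join('|', @header_fields), "\n" if @header_fields;
```

Because each fragment is a compiled `qr//` object, it can be reused (here `$timestamp` appears twice) without retyping the pattern.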
Re: Cisco Log Files: broken REGEX by chunlou (Curate) on Aug 22, 2003 at 00:03 UTC
It doesn't hurt to split up your regex for readability.

    $_='Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet';

    /
        ([A-Z][a-z]+\s+\d+\s+\d+\:\d+\:\d+)     # Aug 21 19:00:36
        \s+
        (\[\d+\.\d+\.\d+\.\d+\.\d+\.\d+\])      # [1.1.1.3.200.125]
        \s+
        (\d+:)                                  # 410381:
        \s+
        ([A-Z][a-z]+\s+\d+\s+\d+\:\d+\:\d+)     # Aug 21 23:00:35
        \s+
        ([A-Z]{3}:)                             # UTC:
        \s+
        (\%SEC-\d-\w+?:)                        # %SEC-6-IPACCESSLOGP:
        \s+
        (list\s\d+\s.*?)                        # list 101 denied tcp
        \s+
        (\d+\.\d+\.\d+\.\d+\(\d+\))             # 10.161.24.153(3988)
        \s+->\s
        (\d+\.\d+\.\d+\.\d+\(\d+\))             # 10.158.24.10(135)
        \s*,\s+
        (.*)                                    # 1 packet
    /x;

    print "$1\n$2\n$3\n$4\n$5\n$6\n$7\n$8\n$9\n$10";

    __END__
    Aug 21 19:00:36
    [1.1.1.3.200.125]
    410381:
    Aug 21 23:00:35
    UTC:
    %SEC-6-IPACCESSLOGP:
    list 101 denied tcp
    10.161.24.153(3988)
    10.158.24.10(135)
    1 packet
Re: Cisco Log Files: broken REGEX by eric256 (Parson) on Aug 22, 2003 at 00:10 UTC
    use strict;

    my $data = "Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet";

    my $timestamp = qr/[A-Z][a-z]+ \d\d \d\d:\d\d:\d\d/;
    my $address   = qr/[\.\d]+/;
    my $id        = qr/\d+/;
    my $timezone  = qr/[A-Z]+/;

    #print $data;
    $data =~ /($timestamp) \[($address)\] ($id): ($timestamp) ($timezone): (.*?): (.*?) (tcp|icmp|udp) ($address\(.*?\)) -> ($address\(.*?\)), (.*)/;

    print "time: $1\n",
          "address: $2\n",
          "id: $3\n",
          "time2: $4\n",
          "time zone: $5\n",
          "error: $6\n",
          "msg: $7\n",
          "protocol: $8\n",
          "address1: $9\n",
          "address2: $10\n",
          "last: $11\n";
    1;

Ick. Double Ick. And fragile.

___________
Eric Hodges
Re: Cisco Log Files: broken REGEX (two solutions) by BrowserUk (Patriarch) on Aug 22, 2003 at 01:12 UTC
Not only does using /x make things a lot more readable, it also helps with debugging. By commenting out everything except the first element in the final regex, it allowed me to adjust that until it worked for all (both:) test lines. Then I uncommented the next element and adjusted that, and so on until the whole thing matched.

Using named sub-elements allows you to re-use those bits where necessary and would simplify adding in predefined elements like a better IP definition from Regexp::Common or a datetime from somewhere.

    #! perl -slw
    use strict;

    my $re_datetime = qr[ [A-Z] [a-z]{2} \s \d{2} \s \d{2} : \d{2} : \d{2} ]x;  # Aug 21 19:00:36
    my $re_MIB      = qr/ \[ \d (?: \. \d+ )+ \] /x;        # [1.1.1.3.200.125]
    my $re_msgid    = qr[ \d{6} : ]x;                       # 410381:
    my $re_TZ       = qr[ [A-Z]{3} : ]x;                    # UTC:
    my $re_type     = qr[ %SEC-6- [A-Z]+ : ]x;              # %SEC-6-IPACCESSLOGP:
    my $re_listid   = qr[ list \s (\d+) ]x;                 # list 101
    my $re_action   = qr[ [a-z]+ ]x;                        # denied
    my $re_protocol = qr[ [a-z]+ ]x;                        # tcp
    my $re_ip       = qr[ \d+ (?: \. \d+ ){3} ]x;           # 10.161.24.153
    my $re_port     = qr[ \( (\d+ (?: / \d+ )? ) \) ]x;     # (3988) or (8/0)
    my $re_packets  = qr[ , \s+ ( \d+ ) \s+ packet ]x;      # , 1 packet

    my $re_log = qr[
        ^
        ( $re_datetime ) \s+
        ( $re_MIB ) \s+
        ( $re_msgid ) \s+
        ( $re_datetime) \s+
        ( $re_TZ ) \s+
        $re_type \s+
        $re_listid \s+
        ( $re_action ) \s+
        ( $re_protocol ) \s+
        ( $re_ip ) \s* $re_port? \s+
        -> \s+
        ( $re_ip ) \s* $re_port?
        $re_packets \s*
        $
    ]x;

    while( <DATA> ) {
        print join'|', $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13
            if $_ =~ m[$re_log];
    }

    =pod output

    P:\test>285616
    Aug 21 19:00:36|[1.1.1.3.200.125]|410381:|Aug 21 23:00:35|UTC:|101|denied|tcp|10.161.24.153|3988|10.158.24.10|135|1
    Use of uninitialized value in join or string at P:\test\285616.pl8 line 37, <DATA> line 2.
    Aug 21 19:00:36|[1.1.1.3.200.125]|410382:|Aug 21 23:00:35|UTC:|101|denied|icmp|10.165.4.150||211.95.79.233|8/0|1

    =cut

    __DATA__
    Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet
    Aug 21 19:00:36 [1.1.1.3.200.125] 410382: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.165.4.150 -> 211.95.79.233 (8/0), 1 packet

Note that the second line produces an "uninitialised value" warning. This is because that line has no port number after the first IP number, so the capture for it ($10) is left undefined, which is a pain.

The best way I know of to avoid all the conditionals and stuff required to deal with regexes that contain optional captures is to capture to named variables using the `(?{ })` extended regex feature.

    #! perl -slw
    use strict;
    use re 'eval';

    # Aug 21 19:00:36
    my $re_datetime = qr[ [A-Z] [a-z]{2} \s \d{2} \s \d{2} : \d{2} : \d{2} ]x;
    my $re_MIB      = qr/ \[ \d (?: \. \d+ )+ \] /x;        # [1.1.1.3.200.125]
    my $re_msgid    = qr[ \d{6} : ]x;                       # 410381:
    my $re_TZ       = qr[ [A-Z]{3} : ]x;                    # UTC:
    my $re_type     = qr[ %SEC-6- [A-Z]+ : ]x;              # %SEC-6-IPACCESSLOGP:
    my $re_listid   = qr[ list \s (\d+) ]x;                 # list 101
    my $re_action   = qr[ [a-z]+ ]x;                        # denied
    my $re_protocol = qr[ [a-z]+ ]x;                        # tcp
    my $re_ip       = qr[ \d+ (?: \. \d+ ){3} ]x;           # 10.161.24.153
    my $re_port     = qr[ \( (\d+ (?: / \d+ )? ) \) ]x;     # (3988) or (8/0)
    my $re_packets  = qr[ , \s+ ( \d+ ) \s+ packet ]x;      # , 1 packet

    my $re_log = qr[
        ^
        ( $re_datetime ) \s+    (?{ $first_date  = $^N||'' })
        ( $re_MIB ) \s+         (?{ $MIB         = $^N||'' })
        ( $re_msgid ) \s+       (?{ $msgID       = $^N||'' })
        ( $re_datetime) \s+     (?{ $second_date = $^N||'' })
        ( $re_TZ ) \s+          (?{ $TZ          = $^N||'' })
        $re_type \s+
        $re_listid \s+          (?{ $listID      = $^N||'' })
        ( $re_action ) \s+      (?{ $action      = $^N||'' })
        ( $re_protocol ) \s+    (?{ $protocol    = $^N||'' })
        ( $re_ip ) \s*          (?{ $ip1         = $^N||'' })
        $re_port? \s+           (?{ $port        = $^N||'' })
        -> \s+
        ( $re_ip ) \s*          (?{ $ip2         = $^N||'' })
        $re_port?               (?{ $port2       = $^N||'' })
        $re_packets \s*         (?{ $packets     = $^N||'' })
        $
    ]x;

    while( <DATA> ) {
        our( $first_date, $MIB, $msgID, $second_date, $TZ, $listID,
             $action, $protocol, $ip1, $port, $ip2, $port2, $packets );

        print join'|', $first_date, $MIB, $msgID, $second_date, $TZ, $listID,
                       $action, $protocol, $ip1, $port, $ip2, $port2, $packets
            if $_ =~ m[$re_log];
    }

    =pod output

    P:\test>285616
    Aug 21 19:00:36|[1.1.1.3.200.125]|410381:|Aug 21 23:00:35|UTC:|101|denied|tcp|10.161.24.153|3988|10.158.24.10|135|1
    Aug 21 19:00:36|[1.1.1.3.200.125]|410382:|Aug 21 23:00:35|UTC:|101|denied|icmp|10.165.4.150|10.165.4.150|211.95.79.233|8/0|1

    =cut

    __DATA__
    Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet
    Aug 21 19:00:36 [1.1.1.3.200.125] 410382: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.165.4.150 -> 211.95.79.233 (8/0), 1 packet

Which I like because it avoids the capture variable shuffling, and if you start using this approach consistently, it becomes pretty much second nature to build regexes this way.

The downsides are the "experimental" status of the "zero-width evaluation assertion" (Phew! What a handle:) and the need to `use re 'eval';`, both of which are frowned upon in some circles.

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
If I understand your problem, I can solve it! Of course, the same can be said for you.
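For readers on Perl 5.10 or later, named capture groups are another way to avoid capture-number bookkeeping, without `(?{ })` or `use re 'eval'`. Note that this swaps in a different technique from the one BrowserUk describes. It is a minimal sketch: the group names (`protocol`, `src`, `sport`, `dst`, `dport`, `packets`), the `$re_flow` pattern and the `%field` hash are all chosen for illustration, and only the tail of the log line is matched.

```perl
use strict;
use warnings;
use 5.010;    # named captures, %+ and // need Perl 5.10 or later

my $re_ip   = qr/ \d+ (?: \. \d+ ){3} /x;
my $re_port = qr/ \d+ (?: \/ \d+ )? /x;

# Hypothetical pattern for the "protocol source -> destination, N packet" tail.
my $re_flow = qr/
    (?<protocol> [a-z]+ ) \s+
    (?<src> $re_ip ) \s* (?: \( (?<sport> $re_port ) \) )? \s+
    -> \s+
    (?<dst> $re_ip ) \s* (?: \( (?<dport> $re_port ) \) )?
    , \s+ (?<packets> \d+ ) \s+ packet
/x;

my $line = 'Aug 21 19:00:36 [1.1.1.3.200.125] 410382: Aug 21 23:00:35 UTC: '
         . '%SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.165.4.150 -> 211.95.79.233 (8/0), 1 packet';

my @names = qw(protocol src sport dst dport packets);
my %field;
if ( $line =~ $re_flow ) {
    # A group that took no part in the match has no defined value in %+,
    # so default it to an empty string instead of tripping a warning.
    %field = map { $_ => $+{$_} // '' } @names;
    say join '|', @field{@names};
}
```

With this shape, an absent source port simply yields an empty `sport` field, and the remaining fields keep their names regardless of which optional groups matched.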
Summary: parsing CISCO ACL logs (was Re: Cisco Log Files: broken REGEX) by blue_cowdawg (Monsignor) on Aug 22, 2003 at 00:54 UTC
First off, at the risk of sounding like one of them talking heads at an Academy Awards ceremony, I just want to thank everybody for their assistance with this thing. I was going nuts with it.

Secondly: I always preach to folks that I teach Perl to that one of the first rules of dealing with data is to make sure you understand the data before you try to parse it. I should have listened to my own sermons, as I belatedly noticed that there were two different line formats depending on whether it was a TCP denial or an ICMP denial.

Thirdly, chunlou, enlil, chromatic and eric256 all suggested that I make my code more readable by using the qr construction. I heeded that advice and it contributed greatly to solving this, both because the code was more readable and because I ended up not re-typing the same regexes and fat-fingering them.

First record type

For the tcp deny the record looked like (just to review):

    Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet

and so to look for it I set up the following:

    my $dtg=qr([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+);
    my $thingy=qr([\.\d]+);
    my $tz=qr([A-Z]{3});
    my $ipaddr=qr(\d+\.\d+\.\d+\.\d+);
    my $timestamp = qr/[A-Z][a-z]+ \d\d \d\d:\d\d:\d\d/;
    my $address = qr/[\.\d]+/;
    my $id = qr/\d+/;
    my $timezone = qr/[A-Z]+/;
    my $fragger = qr/(\%SEC-6-IPACCESSLOGP|\%SEC-6-IPACCESSLOGDP)/;

    my $tcp_deny=qr/^($dtg)\s\[$thingy\]\s\d+:\s($dtg)\s$tz:\s$fragger\:\slist\s(\d+)\sdenied\s(tcp|udp|icmp)\s($ipaddr)\(\d+\)\s\-\>\s($ipaddr)\(\d+\),\s(\d+)\spacket/;

and I actually look for the packet thusly:

    if ( $line =~ m@$tcp_deny@ ) {
        ... more stuff below

Second line format

The second record type looked like:

    Aug 21 19:00:36 [1.1.1.3.200.125] 410382: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.165.4.150 -> 211.95.79.233 (8/0), 1 packet

which used:

    my $icmp_deny=qr/^($dtg)\s\[$thingy\]\s\d+:\s($dtg)\s$tz:\s$fragger\:\slist\s(\d+)\sdenied\s(tcp|udp|icmp)\s($ipaddr)\s\-\>\s($ipaddr)\s\(\d+\/\d+\),\s(\d+)\spacket/;

Why bother?

That, my fellow monks, is a tale to tell under Cool Uses for Perl once the script is all done and nice and tidy. It's a mess right now. Just a hint though: it has to do with all these virus attacks going on and how to find the infected machines...
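Since the two record types differ only around the port fields, one possible next step is to try each pattern in turn and tally denied packets per source address. The sketch below is not the script the post alludes to: `$tcp_deny` and `$icmp_deny` here are simplified stand-ins that match only from `denied` onwards, and the `%denied_from` tally and `@lines` sample data are assumptions made for the illustration.

```perl
use strict;
use warnings;

my $ipaddr = qr/\d+\.\d+\.\d+\.\d+/;

# Simplified stand-ins for the two record formats (tail of the line only).
my $tcp_deny  = qr/denied\s(tcp|udp)\s($ipaddr)\(\d+\)\s->\s($ipaddr)\(\d+\),\s(\d+)\spacket/;
my $icmp_deny = qr/denied\s(icmp)\s($ipaddr)\s->\s($ipaddr)\s\(\d+\/\d+\),\s(\d+)\spacket/;

my @lines = (
    'Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet',
    'Aug 21 19:00:36 [1.1.1.3.200.125] 410382: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.165.4.150 -> 211.95.79.233 (8/0), 1 packet',
);

my %denied_from;    # source address => number of denied packets
for my $line (@lines) {
    for my $format ($tcp_deny, $icmp_deny) {
        # Try the next format when this one does not match.
        my ($protocol, $source, $destination, $packets) = $line =~ $format
            or next;
        $denied_from{$source} += $packets;
        last;
    }
}

printf "%-15s %d\n", $_, $denied_from{$_} for sort keys %denied_from;
```

A source address with an unusually high count would be a candidate for closer inspection; what threshold counts as suspicious is left open here.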
Re: Cisco Log Files: broken REGEX by RMGir (Prior) on Aug 21, 2003 at 23:27 UTC
All I can see in a quick scan is you don't seem to be accounting for the square brackets around the MIB-like thing after the timestamp. Am I nuts?

-- Mike
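The point about the square brackets can be checked in isolation. A minimal sketch, in which `$prefix`, `$without_brackets` and `$with_brackets` are names made up for the demonstration and only the start of the original regex is reproduced:

```perl
use strict;
use warnings;

my $prefix = 'Aug 21 19:00:36 [1.1.1.3.200.125] 410381:';

# Start of the original regex: nothing accounts for the literal [ and ].
my $without_brackets = qr/^([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+)\s+([.\d]+)\s+(\d+):/;

# Same start with the brackets matched explicitly.
my $with_brackets = qr/^([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+)\s+\[([.\d]+)\]\s+(\d+):/;

printf "without brackets: %s\n", ($prefix =~ $without_brackets ? 'match' : 'no match');
printf "with brackets:    %s\n", ($prefix =~ $with_brackets    ? 'match' : 'no match');
```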
Re: Re: Cisco Log Files: broken REGEX by blue_cowdawg (Monsignor) on Aug 21, 2003 at 23:35 UTC
Good news: You were right and I missed the square braces.
Bad news: It is still broke after I fixed it.

Here's the new regex:

    m@^([A-Z][a-z]+\s+\d+\s+\d+\:\d+\:\d+)\s+\[([\.\d]+)\]\s+(\d+)\:\s+([A-Z][a-z]+\s+\d+\s+\d+\:\d+\:\d+)\s+([A-Z]{3})\:\s+\%SEC\-6\-[A-Z]+\:\s+list\s+\d+([a-z]+)\s+([a-z]+)\s+(\d+\.\d+\.\d+\.\d+)\s+\-\>\s+(\d+\.\d+\.\d+\.\d+)\s+\(\d+\/\d+\)\,\s+(\d)\s+packet$@
Re: Re: Re: Cisco Log Files: broken REGEX by RMGir (Prior) on Aug 21, 2003 at 23:42 UTC
Hehe, I figured that out myself once I whipped up a test bench. I still don't have it working, but I do have a few suggestions.

- Don't escape everything in sight, you'll go nuts. `:` and `,` don't need `\`, really.
- m@@x is your friend.
- Could you detect what you need to extract without matching the whole line?
- Note that ICMP and TCP have different "port" parts, so making a general regex is gonna bite.

Anyhow, here's my test bench, with my latest non-working version of the regex:

Read more... (2 kB)

-- Mike
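The suggestion to extract only what is needed, rather than matching the whole line, might look like the following. It is a sketch under assumptions: the `$denied` pattern and the `parse_denied` helper are hypothetical, the pattern keys on the word `denied`, and the source port is treated as optional so that both sample lines match.

```perl
use strict;
use warnings;

my $ip = qr/\d+(?:\.\d+){3}/;

# Hypothetical partial pattern: protocol, source IP, destination IP.
my $denied = qr/denied \s+ ([a-z]+) \s+ ($ip) (?: \( \d+ \) )? \s+ -> \s+ ($ip)/x;

# Returns (protocol, source, destination), or an empty list when nothing matches.
sub parse_denied {
    my ($line) = @_;
    return $line =~ $denied;
}

my @samples = (
    'Aug 21 19:00:36 [1.1.1.3.200.125] 410381: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGP: list 101 denied tcp 10.161.24.153(3988) -> 10.158.24.10(135), 1 packet',
    'Aug 21 19:00:36 [1.1.1.3.200.125] 410382: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.165.4.150 -> 211.95.79.233 (8/0), 1 packet',
);

for my $line (@samples) {
    my ($protocol, $source, $destination) = parse_denied($line)
        or next;
    print "$protocol: $source -> $destination\n";
}
```

The trade-off is that an unanchored partial pattern validates less of the line, so it will also accept records whose header is malformed.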
Re: Cisco Log Files: broken REGEX by Abigail-II (Bishop) on Aug 22, 2003 at 08:10 UTC
One technique I use in debugging long regexes like yours is to build the regex step by step, and test the regex after each step. So, in your example, start with the regex that matches the date, run that against your data and see whether it matches. If that's OK, extend the regex with the time, run it again against the data, then add the thing between brackets, etc., etc.

Abigail
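One way to mechanise this step-by-step approach is to keep the pieces in a list and test ever-longer prefixes of it, reporting the first piece at which the match breaks. The sketch below is an illustration, not Abigail's code: the `@pieces` split of the regex from the follow-up post (bracket fix already applied, redundant backslashes dropped) and the `first_failing_piece` helper are made up for the example.

```perl
use strict;
use warnings;

my $line = 'Aug 21 19:00:36 [1.1.1.3.200.125] 410382: Aug 21 23:00:35 UTC: %SEC-6-IPACCESSLOGDP: list 101 denied icmp 10.165.4.150 -> 211.95.79.233 (8/0), 1 packet';

# The regex under test, cut into pieces (hypothetical split).
my @pieces = (
    '^([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+)',      # first timestamp
    '\s+\[([.\d]+)\]',                         # bracketed address
    '\s+(\d+):',                               # message id
    '\s+([A-Z][a-z]+\s+\d+\s+\d+:\d+:\d+)',    # second timestamp
    '\s+([A-Z]{3}):',                          # time zone
    '\s+%SEC-6-[A-Z]+:',                       # message type
    '\s+list\s+\d+([a-z]+)',                   # list number and action
    '\s+([a-z]+)',                             # protocol
);

# Returns the index of the first piece that stops the growing prefix
# from matching, or -1 when every prefix matches.
sub first_failing_piece {
    my ($text, @parts) = @_;
    my $pattern = '';
    for my $i (0 .. $#parts) {
        $pattern .= $parts[$i];
        return $i unless $text =~ /$pattern/;
    }
    return -1;
}

my $failed = first_failing_piece($line, @pieces);
if ($failed < 0) {
    print "all pieces match\n";
}
else {
    print "first failing piece: $pieces[$failed]\n";
}
```

Run against the sample line, the check stops at the `list` piece, where `\d+` is followed directly by `([a-z]+)` with nothing to consume the space before `denied`.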
