PerlMonks  

Re: Regex Help - Large regex example, and larger Parse::RecDescent attempt

by imp (Priest)
on Dec 22, 2006 at 06:39 UTC ( [id://591270] )


in reply to Regex Help pulling Data from a string

The appropriate solution to this problem depends on how precise the pattern matching needs to be. How much post-extraction processing you are willing to do matters as well: do you need '58bn5904', or are you content with 'd:\data\58bn5904.dat'?
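To illustrate the looser end of that trade-off, here is a minimal sketch in which a short regex grabs the full paths and the trimming is left to post-processing with File::Basename. The helper name extract_loose is hypothetical, and the sketch assumes every line of interest has the same "Sent file ... successfully" shape as the sample line; it is not a drop-in solution.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Assumption: the paths are Windows-style, so split on backslashes
# even when this script runs on another platform.
fileparse_set_fstype('MSWin32');

# Hypothetical helper: match loosely, then post-process the captured paths.
# Returns the comma-separated record, or nothing if the line does not match.
sub extract_loose {
    my ($line) = @_;
    my ($log_path, $date, $file_path, $bytes) = $line =~ m{
        ^(\S+)                      # full logfile path
        \s+ \[\d+\] \s+
        (\w{3} \s \S+ \s [\d:]+)    # date and time, loosely
        .*? Sent \s file \s
        (\S+)                       # full path of the sent file
        .*? (\d+ \s bytes)          # byte count text
    }x or return;

    # Strip directory and extension from both paths
    my ($log_base)  = fileparse($log_path,  qr/\.[^.]*/);
    my ($file_base) = fileparse($file_path, qr/\.[^.]*/);
    return join ',', $log_base, $date, $file_base, $bytes;
}

my $line = 'e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)';
print extract_loose($line), "\n";
```

The regex here tolerates more variation in the line than a strict pattern would, at the cost of accepting some malformed lines; whether that is acceptable depends on how much you trust the log format.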

To give you an idea of how ugly the regex could become:

    use strict;
    use warnings;

    # Example line:
    # e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)
    # Desired:
    # beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes

    my $re_date = qr<
        (?:Sun|Mon|Tue|Wed|Thu|Fri|Sat)                     # Day of week
        \s
        \d{1,2}                                             # Day of month
        (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) # Month
        \d{2}                                               # Two digit year
        \s
        \d{2}:\d{2}:\d{2}                                   # Time
    >x;

    my $pattern = qr<
        e:\\logfiles\\(.*?)\.log    # Capture(1) logfile basename, without .log
        \s
        \[\d+\]                     # Bracketed number
        \s
        ($re_date)                  # Capture(2) date
        \s - \s
        \(\d+\)                     # number in parens
        \s
        Sent \s file \s
        d:\\data\\(.*?)\.dat        # Capture(3) file basename
        \s successfully \s
        \(
            [0-9.]+ \s [A-Z]b /sec
            [ ] - [ ]
            (\d+ \s bytes)          # Capture(4) bytes text
        \)
    >x;

    while (my $line = <DATA>) {
        if ($line =~ /$pattern/) {
            my ($logfile, $date, $file_basename, $bytes) = ($1, $2, $3, $4);
            printf "(%s) (%s) (%s) (%s)\n",
                $logfile, $date, $file_basename, $bytes;
        }
    }
    __DATA__
    e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)

I have been meaning to learn Parse::RecDescent for ages, so tonight I took some time to try to solve your problem with it. It is likely the wrong tool for this job, and definitely a poor implementation; I would welcome any feedback from people with stronger parse-fu.

    use strict;
    use warnings;
    use Parse::RecDescent;
    use Data::Dumper;

    $::RD_HINT = 5;

    my $grammar = <<'GRAMMAR';
    {
        use strict;
        use warnings;
    }

    logfile : 'e:\\logfiles\\' /[-A-Za-z0-9_.]+/
        { $item[2] }

    date : m{
            (?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)
            \s
            \d\d
            (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)
            \d\d
        }x

    time : /\d{2}:\d{2}:\d{2}/

    sentfile: <skip:''> 'd:\\data\\' /[-A-Za-z0-9_]+/ '.dat'
        { $item[3] }

    rate : /\d+\.\d [A-Za-z]+\/sec/

    bytecount : /\d+ bytes/

    parse : logfile /\[\d+\]/ date time /- \(\d+\) Sent file / sentfile
            <skip:'[- \t()]*'>
            ( /successfully/ rate ) bytecount
        { [ @item{qw(logfile date time sentfile bytecount)}] }
    GRAMMAR

    # Expect: beardstownbase,Thu 22Jun06 08:07:19,58bn5904,859216 bytes
    my $parser = Parse::RecDescent->new($grammar);

    while (my $line = <DATA>) {
        last unless $line =~ /\S/;
        my @fields = $parser->parse($line);
        if (@fields) {
            print Dumper \@fields;
        }
    }
    __DATA__
    e:\logfiles\beardstownbase.log [3] Thu 22Jun06 08:07:19 - (006415) Sent file d:\data\58bn5904.dat successfully (25.0 Kb/sec - 859216 bytes)

Output:

    $VAR1 = [
              [
                'beardstownbase.log',
                'Thu 22Jun06',
                '08:07:19',
                '58bn5904',
                '859216 bytes'
              ]
            ];
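
The parsed fields are still a step away from the desired comma-separated line: the logfile keeps its '.log' extension, and the date and time are separate items. A minimal sketch of that last step follows; the helper name format_record is hypothetical, and it assumes each record is an array ref in the order logfile, date, time, sentfile, bytecount, as in the Dumper output above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one parsed record into the desired CSV line.
sub format_record {
    my ($rec) = @_;
    my ($logfile, $date, $time, $sentfile, $bytecount) = @$rec;
    (my $log_base = $logfile) =~ s/\.log\z//i;    # drop the .log extension
    return join ',', $log_base, "$date $time", $sentfile, $bytecount;
}

# Same shape as the Dumper output: a list holding one array ref per line
my @fields = (
    [ 'beardstownbase.log', 'Thu 22Jun06', '08:07:19', '58bn5904', '859216 bytes' ],
);
print format_record($_), "\n" for @fields;
```

Doing this cleanup outside the grammar keeps the grammar itself simple; it could equally be folded into the action of the parse rule.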
