PerlMonks  

Re^4: fast greedy regex

by js1 (Monk)
on Jun 08, 2004 at 20:49 UTC ([id://362548])


in reply to Re^3: fast greedy regex
in thread fast greedy regex

Many thanks for all the interest and help here. All the replies were useful.

I really liked these constructs:

```perl
s/\".*\"//;
my @F = split(/\s+/);
```

and

```perl
my @F = /(\" .*? \" | \S+)/gx;
```
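A quick way to see how the two constructs differ is to run both on one line. This is a sketch that uses a made-up sample line (hypothetical, built to match the field layout the parser in this post expects). The first construct throws the quoted User-Agent away. The second keeps it as a single field, because the `" .*? "` alternative is tried first wherever a quote starts.

```perl
use strict;
use warnings;

# Hypothetical sample line in the layout the parser in this post expects.
my $line = '2004-03-07 17:02:00 15 10.0.0.1 200 TCP_HIT 1234 456 GET http '
         . 'www.example.com /index.html - DIRECT 10.0.0.2 text/html '
         . '"Mozilla/4.0 (compatible; MSIE 6.0)" PROXIED none - 10.0.0.3 SG-HTTP-Service';

# Construct 1: delete the quoted field, then split on whitespace.
# The User-Agent is discarded. (\" and " mean the same thing in a regex.)
(my $copy = $line) =~ s/".*"//;
my @without_ua = split /\s+/, $copy;

# Construct 2: tokenise, treating a quoted string as one field.
# Under /x the literal spaces in the pattern are ignored.
my @with_ua = $line =~ /(" .*? " | \S+)/gx;

print scalar(@without_ua), " fields without the User-Agent\n";
print scalar(@with_ua), " fields with it; User-Agent = $with_ua[16]\n";
```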

But I found the quickest solution was this:

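```perl
#!/usr/bin/perl
use strict;
use warnings;

while (<>) {
    # Skip lines that don't contain a quoted field with something before it.
    my $first = index($_, '"');
    next if $first < 1 || rindex($_, '"') == $first;

    # 4-argument substr: cut everything before the space that precedes the
    # first quote out of $_ and keep it in $front.
    my $front = substr($_, 0, $first - 1, "");
    # $_ now starts with ' "'. Take the fields after the closing quote,
    # skipping the quote and the space that follows it.
    my $back = substr($_, rindex($_, '"') + 2);
    # The quoted User-Agent, quotes included (position 1 up to the last quote).
    my $user_agent = substr($_, 1, rindex($_, '"'));

    # Date and time are captured together in $1, followed by 14 more fields.
    $front =~ /^([^#\s]+\s\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)$/
        or next;    # don't print stale $1..$15 left over from an earlier line
    print <<END_FORMAT;
date time = $1
time taken = $2
c-ip = $3
sc-status = $4
ns-action = $5
sc-bytes = $6
cs-bytes = $7
cs-method = $8
cs-uri-scheme = $9
cs-host = $10
cs-uri-stem = $11
cs-username = $12
s-hierarchy = $13
s-supplier-name = $14
cs(Content-Type) = $15
cs(User-Agent) = $user_agent
END_FORMAT

    # The five fields after the User-Agent; \s*$ absorbs the trailing newline.
    $back =~ /^(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s*$/ or next;
    print <<END_FORMAT2;
sc-filter-result = $1
sc-filter-category = $2
x-virus-id = $3
s-ip = $4
s-sitename = $5
END_FORMAT2
}
```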
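The substr/index/rindex trick works because `index` finds the first double quote and `rindex` finds the last. The User-Agent can therefore contain spaces, or even quotes, without confusing the split. Here is a sketch of the same idea that avoids the 4-argument substr and uses `split ' '` in place of the 15-group regex. The helper name `split_at_quotes` and the sample line are hypothetical, and this variant has not been timed against the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: split a log line into the text before the first
# double quote, the quoted User-Agent (quotes kept), and the text after
# the last double quote. Returns an empty list if there is no quoted field.
sub split_at_quotes {
    my ($line) = @_;
    my $open  = index($line, '"');
    my $close = rindex($line, '"');
    return if $open < 0 || $close == $open;
    my $front = substr($line, 0, $open);
    my $ua    = substr($line, $open, $close - $open + 1);
    my $back  = substr($line, $close + 1);
    return ($front, $ua, $back);
}

# Hypothetical sample line in the layout the parser in this post expects.
my $line = '2004-03-07 17:02:00 15 10.0.0.1 200 TCP_HIT 1234 456 GET http '
         . 'www.example.com /index.html - DIRECT 10.0.0.2 text/html '
         . '"Mozilla/4.0 (compatible; MSIE 6.0)" PROXIED none - 10.0.0.3 SG-HTTP-Service';

my ($front, $ua, $back) = split_at_quotes($line) or die "no quoted field\n";

# split ' ' is awk-style: leading whitespace is ignored and runs of
# whitespace count as one separator.
my @front = split ' ', $front;   # date, time and the 14 fields after them
my @back  = split ' ', $back;    # the 5 fields after the User-Agent

print "front: ", scalar(@front), " fields\n";
print "user-agent: $ua\n";
print "back: ", scalar(@back), " fields\n";
```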

This processed the following gzipped log:

```
bash-2.05b$ ls -l SG*
-rwxr-xr-x 1 js js 106236830 Mar 7 17:02 SG_CSGL02_main_470302220000.log.gz
```

in 1 minute 32 seconds on a 2.6 GHz AMD processor (500 MB RAM):

```
bash-2.05b$ time gzip -dc SG* | ./test.pl >/dev/null
6.96user 0.63system 1:32.95elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (94major+33minor)pagefaults 0swaps
```
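The timing above covers the whole pipeline, including gzip decompression. To compare only the parsing step, the core Benchmark module's `cmpthese` can run each approach on an in-memory line. This is a sketch that assumes a hypothetical sample line (swap in a real one from your log). The labels are arbitrary, and no results are claimed here because they depend on the Perl version and hardware. Each sub assigns to an array before returning. That forces list context, so the `//g` match never runs in scalar context, where it would step through the line one token per call via `pos`.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample line; replace it with a real line from your log.
my $line = '2004-03-07 17:02:00 15 10.0.0.1 200 TCP_HIT 1234 456 GET http '
         . 'www.example.com /index.html - DIRECT 10.0.0.2 text/html '
         . '"Mozilla/4.0 (compatible; MSIE 6.0)" PROXIED none - 10.0.0.3 SG-HTTP-Service';

# Each sub parses $line one way and returns the resulting fields.
my %approach = (
    # Delete the quoted field, then split on whitespace (User-Agent is lost).
    strip_split => sub {
        (my $copy = $line) =~ s/".*"//;
        my @f = split /\s+/, $copy;
        return @f;
    },
    # Tokenise, keeping the quoted User-Agent as one field.
    tokenise => sub {
        my @f = $line =~ /(" .*? " | \S+)/gx;
        return @f;
    },
    # Cut at the first and last quote with index/rindex, then split each side.
    index_rindex => sub {
        my $open  = index($line, '"');
        my $close = rindex($line, '"');
        my @f = ( split(' ', substr($line, 0, $open)),
                  substr($line, $open, $close - $open + 1),
                  split(' ', substr($line, $close + 1)) );
        return @f;
    },
);

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, \%approach);
```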
