http://qs321.pair.com?node_id=684068


in reply to Matching lines in 2+ GB logfiles.

Has anyone here who claims that Perl can't outrun grep actually run the script that I posted here, the one dws wrote? This dws guy is on to something. I was finally able to modify it to work like grep, and it's about 14 times faster than grep (13 seconds versus 189 seconds in the run below). I am working with a 484 MB maillog.

This could be more elegant, but this is my rookie solution:

while ( $window =~ m/([a-zA-Z]{3}\s{1,2}\d{1,2}.*\n)/oigc ) {
    $line = $1;
    if ( $1 =~ /$re/ ) {
        &$callback($line);
    }
}
ls -ltrh /var/log/syslog-ng/server2/ | grep maillog.2
-rw-r----- 1 root logs 484M Mar 11 11:13 maillog.2
-rw-r----- 1 root logs 230M Apr 1 04:10 maillog.2.gz

[dmathis@aus02syslog ~]$ date; ./jujuspeed; date
Thu May 1 19:27:57 CDT 2008
Feb 28 09:53:49 exmx2 sendmail[XXXXX]: 8791: to=<hidden@hotmail.com>, delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=X3604, relay=mx1.hotmail.com. [X5.5X.2X5.X], dsn=2.0.0, stat=Sent ( <X4X0399.120421402XXXX.JavaMail.root@hidden.com> Queued mail for delivery)
Thu May 1 19:28:10 CDT 2008
Time taken: 13 Seconds

[dmathis@aus02syslog ~]$ date; egrep -i 'hidden@hotmail.com' /var/log/syslog-ng/server2/maillog.2; date
Thu May 1 19:28:48 CDT 2008
Feb 28 09:53:49 exmx2 sendmail[XXXXX]: 8791: to=<hidden@hotmail.com>, delay=00:00:01, xdelay=00:00:01, mailer=esmtp, pri=X3604, relay=mx1.hotmail.com. [X5.5X.2X5.X], dsn=2.0.0, stat=Sent ( <X4X0399.120421402XXXX.JavaMail.root@hidden.com> Queued mail for delivery)
Thu May 1 19:31:57 CDT 2008
Time Taken: 189 Seconds
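
For anyone who wants to try the idea without the full script, here is a minimal self-contained sketch of block-wise scanning. It is not the dws script and not the exact loop above: the name grep_blocks, the 1 MB default block size, and the choice to split on plain newlines (rather than on the syslog date prefix) are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the original script: read $path in
# large blocks and pass every line that matches $re to $callback.
sub grep_blocks {
    my ( $path, $re, $callback, $block_size ) = @_;
    $block_size ||= 1024 * 1024;    # assumed default: 1 MB per read

    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;

    my $window = '';
    while ( read( $fh, my $chunk, $block_size ) ) {
        $window .= $chunk;

        # Walk every complete (newline-terminated) line in the window.
        # /g continues from the previous match; /c keeps pos() on failure.
        while ( $window =~ m/\G([^\n]*\n)/gc ) {
            my $line = $1;
            $callback->($line) if $line =~ $re;
        }

        # Keep the unfinished tail so that a line split across two reads
        # is seen whole on the next pass.
        $window = substr( $window, pos($window) || 0 );
    }
    close $fh;

    # A last line with no trailing newline is still checked.
    $callback->($window) if length($window) && $window =~ $re;
    return;
}
```

A call such as grep_blocks( 'maillog.2', qr/hidden\@hotmail\.com/i, sub { print $_[0] } ) would print the matching lines, much like egrep -i. Whether this beats grep on a given machine depends on the block size, the pattern, and the locale grep runs under, so it is worth timing both.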

Thanks for all of the help on here. I have learned a lot :)

Replies are listed 'Best First'.
Re^2: Matching lines in 2+ GB logfiles.
by alexm (Chaplain) on May 02, 2008 at 11:16 UTC
    while ( $window =~ m/([a-zA-Z]{3}\s{1,2}\d{1,2}.*\n)/oigc ) { $line = $1; if ( $1 =~ /$re/ ) { &$callback($line); } }
    This is very close to what mscharrer suggested before.
      Indeed!

      After all this is over, all that will really have mattered is how we treated each other.