Re^2: Way to grep binary scalar without unpacking

by diotalevi (Canon)
on Oct 04, 2007 at 20:21 UTC (#642759)


in reply to Re: Way to grep binary scalar without unpacking
in thread Way to grep binary scalar without unpacking

"You believe?" You oughta know that the regexp engine uses the same code as index() for cases like this. Also, there's more ops to dispatch for the index() way. It isn't obvious at all which is faster.

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊
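As a small illustration of the two approaches being compared, here is a minimal sketch that collects the offsets of one byte with index() and with a regex. The sample string is hypothetical (it is not the data from the thread). The sketch uses `$-[0]` for the regex variant so that both loops report the start offset of each hit; pos() would instead report the offset just past the match.

```perl
use strict;
use warnings;

# Hypothetical sample data: a short binary string, not the poster's data.
my $buffer = "\x00\xaa\x01\xaa\xaa\x02";

# Offsets via index(): each hit is the start offset of the byte.
my @by_index;
my $p = -1;
push @by_index, $p while ($p = index($buffer, "\xaa", $p + 1)) != -1;

# Offsets via a regex in scalar //g context.
# $-[0] is the start of the last successful match.
my @by_regex;
push @by_regex, $-[0] while $buffer =~ /\xaa/g;

print "index: @by_index\n";   # index: 1 3 4
print "regex: @by_regex\n";   # regex: 1 3 4
```

This only shows that the two loops find the same positions. It says nothing about which is faster; that needs a benchmark on representative data.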

Re^3: Way to grep binary scalar without unpacking
by ikegami (Pope) on Oct 04, 2007 at 20:40 UTC
    I never said it was obvious. I never said I was guessing. I believe it's faster because I did some benchmarks to test this, but that was some time ago. I could redo them, but so can the OP, and he has the benefit of having representative data.
      I ran another test, this time including a wrapped memchr, as
      I described in another reply. Even with the array handling included,
      memchr came out about a factor of 3 faster than index/regex, and that is
      on my system, which is less sensitive to memory alignment than a Core2:
      1769416 in
                  Rate by_regex by_index by_memchr
      by_regex  51.2/s       --      -3%      -68%
      by_index  52.9/s       3%       --      -67%
      by_memchr  162/s     217%     207%        --
      The code used here was:
      use strict;
      use warnings;
      use Benchmark qw( cmpthese );

      my $fn = '/boot/vmlinux-2.6.18.8-96-default.gz';
      open my $fh, '<', $fn or die $!;
      read $fh, my $buffer, 2_000_000 or die $!;
      print length $buffer, " in\n";
      close $fh;

      my $subs = {
          by_index => sub {
              my ($p0, @offs) = (-1, ());
              push @offs, $p0 while ($p0 = index $buffer, "\xaa", $p0+1) != -1;
              push @offs, $p0 while ($p0 = index $buffer, "\xbb", $p0+1) != -1;
              push @offs, $p0 while ($p0 = index $buffer, "\xcc", $p0+1) != -1;
              return 0 + @offs
          },
          by_regex => sub {
              my @offs = ();
              push @offs, pos($buffer) while $buffer =~ /\xaa/g;
              push @offs, pos($buffer) while $buffer =~ /\xbb/g;
              push @offs, pos($buffer) while $buffer =~ /\xcc/g;
              return 0 + @offs
          },
          by_memchr => sub {
              my @offs = ();
              my_memchr( \@offs, $buffer, "\xaa" );
              my_memchr( \@offs, $buffer, "\xbb" );
              my_memchr( \@offs, $buffer, "\xcc" );
              return 0 + @offs
          }
      };

      cmpthese -3, $subs;

      use Inline C => qq[
      /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
      void my_memchr(SV* rvav, SV* sv, SV *ch)
      {
          STRLEN srclen;
          char byte = *SvPV(ch, PL_na);
          char *svc = SvPV(sv, srclen);
          char *p = svc, *end = svc + srclen;
          AV *av = (AV*)SvRV(rvav); // if(SvTYPE(SvRV(rvav)) == SVt_PVAV)

          while((p=memchr(p, (int)byte, end-p)) !=0 && p<end) {
              av_push(av, newSViv(p-svc));
              ++p;
          }
      }
      /* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - */
      ];
      Regards
      mwa
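Since each benchmarked sub returns only the number of offsets it collected, one cheap way to check that the variants agree is to compare that count against a plain byte count. The following is a minimal sketch under that assumption; the sample string is hypothetical, and the tr/// cross-check is an addition for illustration rather than something used in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the kernel image.
my $buffer = "\xaa\x00\xbb\xaa\xcc\xcc\xcc";

# tr/// with an empty replacement list and no /d only counts
# the matching bytes; it does not modify $buffer.
my $expected = ($buffer =~ tr/\xaa\xbb\xcc//);

# Same index() loop as in the benchmark, once per byte value.
my @offs;
for my $byte ("\xaa", "\xbb", "\xcc") {
    my $p = -1;
    push @offs, $p while ($p = index($buffer, $byte, $p + 1)) != -1;
}

my $found = scalar @offs;
print "tr count: $expected, index count: $found\n";   # tr count: 6, index count: 6
```

If the counts differ for real data, the benchmark numbers are comparing subs that do different work.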
      ikegami: I could redo them, but so can the OP, and he has the benefit of having representative data.

      I ran a short test on Linux 2.6.18 in a VM hosted on XP (Athlon/64 3400+).
      (I just searched for some hex codes within the kernel image.)
      use strict;
      use warnings;
      use Benchmark qw( cmpthese );

      my $fn = '/boot/vmlinux-2.6.18.8-96-default.gz';
      open my $fh, '<', $fn or die $!;
      read $fh, my $buffer, 2_000_000 or die $!;
      print length $buffer, " in\n";
      close $fh;

      my $subs = {
          by_index => sub {
              my ($p0, @offs) = (-1, ());
              push @offs, $p0 while ($p0 = index $buffer, "\xaa", $p0+1) != -1;
              push @offs, $p0 while ($p0 = index $buffer, "\xbb", $p0+1) != -1;
              push @offs, $p0 while ($p0 = index $buffer, "\xcc", $p0+1) != -1;
              return 0 + @offs
          },
          by_regex => sub {
              my @offs = ();
              push @offs, pos($buffer) while $buffer =~ /\xaa/g;
              push @offs, pos($buffer) while $buffer =~ /\xbb/g;
              push @offs, pos($buffer) while $buffer =~ /\xcc/g;
              return 0 + @offs
          }
      };

      cmpthese( -3, $subs );
      The results turned out somewhat interesting (corrected; machine under no load):
      1769416 in
                 Rate by_regex by_index
      by_regex 51.0/s       --      -5%
      by_index 53.4/s       5%       --
      This is all new to me. Thanks to all involved ;-)

      Regards
      mwa
