PerlMonks  

Re: Way to grep binary scalar without unpacking

by ikegami (Patriarch)
on Oct 04, 2007 at 18:37 UTC (id://642729)


in reply to Way to grep binary scalar without unpacking

$stuff =~ /\x7F/

or

index($stuff, "\x7F") >= 0

I think the latter is faster.

These are better than searching the unpacked (hex) string, because they won't falsely match the byte 07 followed by F3 (the hex string "07f3" contains "7f").

Re^2: Way to grep binary scalar without unpacking
by diotalevi (Canon) on Oct 04, 2007 at 20:21 UTC

    "You believe?" You oughta know that the regexp engine uses the same code as index() for cases like this. Also, there are more ops to dispatch for the index() way. It isn't obvious at all which is faster.

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

      I never said it was obvious. I never said I was guessing. I believe it's faster because I did some benchmarks to test this, but that was some time ago. I could redo them, but so can the OP, and he has the benefit of having representative data.
        I did another test, including a wrapped memchr as I mentioned in another reply. Including the array handling, I found a factor of 3 (relative to index/regex), even on my system, which (compared to a Core2) is not memory-alignment critical:
        1769416 in
                     Rate  by_regex  by_index by_memchr
        by_regex   51.2/s        --       -3%      -68%
        by_index   52.9/s        3%        --      -67%
        by_memchr   162/s      217%      207%        --
        The code used here was:
        use strict;
        use warnings;
        use Benchmark qw( cmpthese );
        my $fn = '/boot/vmlinux-2.6.18.8-96-default.gz';
        open my $fh, '<:raw', $fn or die $!;    # raw bytes, no I/O layers
        read $fh, my $buffer, 2_000_000 or die $!;
        print length $buffer, " in\n";
        close $fh;
        my $subs = {
            by_index => sub {
                my ($p0, @offs) = (-1, ());
                # each loop exits with $p0 == -1, so the next search restarts at 0
                push @offs, $p0 while ($p0 = index($buffer, "\xaa", $p0 + 1)) != -1;
                push @offs, $p0 while ($p0 = index($buffer, "\xbb", $p0 + 1)) != -1;
                push @offs, $p0 while ($p0 = index($buffer, "\xcc", $p0 + 1)) != -1;
                return 0 + @offs;
            },
            by_regex => sub {
                my @offs = ();
                # pos() is the offset just past each match; the count is the same
                push @offs, pos($buffer) while $buffer =~ /\xaa/g;
                push @offs, pos($buffer) while $buffer =~ /\xbb/g;
                push @offs, pos($buffer) while $buffer =~ /\xcc/g;
                return 0 + @offs;
            },
            by_memchr => sub {
                my @offs = ();
                my_memchr( \@offs, $buffer, "\xaa" );
                my_memchr( \@offs, $buffer, "\xbb" );
                my_memchr( \@offs, $buffer, "\xcc" );
                return 0 + @offs;
            },
        };
        cmpthese -3, $subs;
        use Inline C => qq[
        /* push the offset of every occurrence of the first byte of ch in sv
           onto the array referenced by rvav */
        void my_memchr(SV* rvav, SV* sv, SV *ch) {
            STRLEN srclen;
            char byte = *SvPV(ch, PL_na);
            char *svc = SvPV(sv, srclen);
            char *p = svc, *end = svc + srclen;
            AV *av = (AV*)SvRV(rvav);   /* not checked: SvTYPE(SvRV(rvav)) == SVt_PVAV */
            while ((p = memchr(p, (int)byte, end - p)) != 0 && p < end) {
                av_push(av, newSViv(p - svc));
                ++p;                     /* resume the search one byte past the hit */
            }
        }
        ];
        Regards
        mwa
        ikegami wrote: "I could redo them, but so can the OP, and he has the benefit of having representative data."

        I did a short test on Linux 2.6.18 running in a VM under Windows XP (Athlon 64 3400+).
        (I just searched for a few hex byte values in the kernel image.)
        use strict;
        use warnings;
        use Benchmark qw( cmpthese );
        my $fn = '/boot/vmlinux-2.6.18.8-96-default.gz';
        open my $fh, '<:raw', $fn or die $!;    # raw bytes, no I/O layers
        read $fh, my $buffer, 2_000_000 or die $!;
        print length $buffer, " in\n";
        close $fh;
        my $subs = {
            by_index => sub {
                my ($p0, @offs) = (-1, ());
                # each loop exits with $p0 == -1, so the next search restarts at 0
                push @offs, $p0 while ($p0 = index($buffer, "\xaa", $p0 + 1)) != -1;
                push @offs, $p0 while ($p0 = index($buffer, "\xbb", $p0 + 1)) != -1;
                push @offs, $p0 while ($p0 = index($buffer, "\xcc", $p0 + 1)) != -1;
                return 0 + @offs;
            },
            by_regex => sub {
                my @offs = ();
                # pos() is the offset just past each match; the count is the same
                push @offs, pos($buffer) while $buffer =~ /\xaa/g;
                push @offs, pos($buffer) while $buffer =~ /\xbb/g;
                push @offs, pos($buffer) while $buffer =~ /\xcc/g;
                return 0 + @offs;
            },
        };
        cmpthese( -3, $subs );
        The result was somewhat interesting (corrected figures, machine under no load):
        1769416 in
                     Rate  by_regex  by_index
        by_regex   51.0/s        --       -5%
        by_index   53.4/s        5%        --
        Very new to me. Thanks to all involved ;-)

        Regards
        mwa
