Way to grep binary scalar without unpacking

by Eradicatore (Monk)
on Oct 04, 2007 at 18:28 UTC

Eradicatore has asked for the wisdom of the Perl Monks concerning the following question:

Does someone know if you can use regex to grep through a scalar value you just read out from a file in binmode?

I want to read, say, 1000 bytes of a file and then scan those 1000 bytes for a string. But I do NOT want to unpack it first, because I'm hoping that avoiding the unpack may save some time.

Or am I fooling myself into thinking that dealing with the scalar in binary form will be any faster than unpacking and then using normal regex?

If I do unpack, what's the best way to do that?

open BIN_IN, "<test.elf";
binmode BIN_IN;
read(BIN_IN, $stuff, 1000, 0);
if ($stuff =~ /7F/) { print "got it!\n"; }
@tmp = unpack("(H2)*", $stuff);
foreach $h (@tmp) { print "$h "; }
close BIN_IN;

Justin Eltoft

"If at all god's gaze upon us falls, its with a mischievous grin, look at him" -- Dave Matthews

Replies are listed 'Best First'.
Re: Way to grep binary scalar without unpacking
by ikegami (Patriarch) on Oct 04, 2007 at 18:37 UTC
    $stuff =~ /\x7F/

    or

    index($stuff, "\x7F") >= 0

    I think the latter is faster.

    These are better than checking the unpacked string because they won't match 07 followed by F3 (whose hex dump "07F3" contains "7F" across the byte boundary).
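A minimal sketch of the false positive described above, assuming a made-up two-byte buffer (the sample data and the variable names other than `$stuff` are illustrative, not from the original post). Searching the raw bytes finds nothing, while searching the hex dump matches across the byte boundary:

```perl
use strict;
use warnings;

# Hypothetical buffer: byte 0x07 followed by byte 0xF3. It contains no 0x7F byte.
my $stuff = "\x07\xF3";

# Searching the raw bytes directly: neither form reports a match.
my $regex_hit = $stuff =~ /\x7F/ ? 1 : 0;
my $index_hit = index($stuff, "\x7F") >= 0 ? 1 : 0;

# Searching the hex dump instead: "07f3" contains "7f" spanning two bytes.
my $hex     = unpack "H*", $stuff;
my $hex_hit = $hex =~ /7f/i ? 1 : 0;

print "regex on bytes: $regex_hit\n";   # 0
print "index on bytes: $index_hit\n";   # 0
print "regex on hex:   $hex_hit\n";     # 1 (a false positive)
```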

      "You believe?" You oughta know that the regexp engine uses the same code as index() for cases like this. Also, there are more ops to dispatch for the index() way. It isn't obvious at all which is faster.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        I never said it was obvious. I never said I was guessing. I believe it's faster because I did some benchmarks to test this, but that was some time ago. I could redo them, but so can the OP, and he has the benefit of having representative data.
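For anyone who wants to rerun the comparison on their own data, here is a rough sketch using the core Benchmark module. The 1000-byte test buffer and the sub names `by_regex` and `by_index` are assumptions for illustration; results will vary with the Perl version, the platform, and where (or whether) the byte occurs, so substitute representative data before drawing conclusions.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1000 bytes with a single 0x7F near the end.
my $stuff = ("\x00" x 990) . "\x7F" . ("\x00" x 9);

# The two candidates from the reply above, each returning 1 or 0.
sub by_regex { return $stuff =~ /\x7F/ ? 1 : 0 }
sub by_index { return index($stuff, "\x7F") >= 0 ? 1 : 0 }

# A negative count asks Benchmark to run each sub for at least
# that many CPU seconds, then print a comparison table.
cmpthese(-1, {
    regex => \&by_regex,
    index => \&by_index,
});
```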
Re: Way to grep binary scalar without unpacking
by mwah (Hermit) on Oct 04, 2007 at 19:32 UTC
    Eradicatore: I want to read, say, 1000 bytes of a file
    and then scan those 1000 bytes for a string.
    But I do NOT want to first unpack it because
    I'm hoping that avoiding the unpack may save some time.


    OK, ikegami has already answered that. I'd like to add that
    index() would be the fastest pure-Perl solution. If it's a
    very big binary chunk, write a small Inline::C wrapper
    around C's standard-library memchr() function (declared in
    <string.h>). Depending on the architecture, this might be
    up to three times faster than index() on large chunks.

    If I do unpack, what's the best way to do that?

    Your solution would be OK. You might consider a
    pseudo-Schwartzian transform to map the indices into your
    target array @tmp, something like:

    open my $fh, '<', 'test.elf' or die "can't do anything: $!";
    binmode $fh;
    read $fh, my $stuff, 1000 or die "read error: $!";
    close $fh;
    print length $stuff, " bytes in\n";

    my $offs = 0;
    my @tmp = map  $_->[1],
              grep $_->[0] eq '7f',
              map  [$_, $offs++],
              unpack "(H2)*", $stuff;

    # prints "7f" offsets in binary file
    print join':', @tmp;

    Regards

    mwa
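As an alternative to unpacking, the same offsets can be collected by calling index() repeatedly on the raw bytes. This is only a sketch: the helper name `byte_offsets` and the sample buffer are hypothetical and not part of either reply above.

```perl
use strict;
use warnings;

# Return every offset at which $needle occurs in $buf, using index() only.
sub byte_offsets {
    my ($buf, $needle) = @_;
    my @offsets;
    my $pos = index($buf, $needle);
    while ($pos >= 0) {
        push @offsets, $pos;
        # Resume the search one byte past the previous hit.
        $pos = index($buf, $needle, $pos + 1);
    }
    return @offsets;
}

# Hypothetical sample: an ELF-like start with a second 0x7F later on.
my $stuff = "\x7F" . "ELF" . "\x01\x02\x7F\x00";
print join(':', byte_offsets($stuff, "\x7F")), "\n";   # prints 0:6
```

Because the search works on bytes rather than hex digits, it cannot report a match that straddles two bytes.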
