PerlMonks  

Re: Pattern matching in binary mode (I/O)

by tye (Sage)
on Mar 19, 2004 at 17:15 UTC


in reply to Pattern matching in binary mode

Related to another thread, I need to do simple pattern matching and replacing on a file in binary mode (because it's a binary file!).

The main difference will not be in the replacing, but in the reading and writing. Most substituting can be done by reading one line, substituting, writing, and repeating. This lets very large text files be processed quickly (no huge buffer to hold the entire file contents, and no problem if the file is too big to fit in memory).

For a binary file, you could get a similar process quite easily with $/ = \4096;, which would cause <IN> to read a 4096-byte chunk each time. Unfortunately, '77777' could end up split across two chunks, for example with its first two characters at the end of one buffer and its last three at the beginning of the next, so s/77777/.../ would fail to substitute that occurrence.
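To make the boundary problem concrete, here is a small sketch that runs the same substitution over fixed-size records of an in-memory string. The helper name subst_by_records, the sample data, and the tiny record sizes are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: substitute record by record, the way a read loop
# with $/ set to a reference to an integer sees the data.
sub subst_by_records {
    my( $data, $size ) = @_;
    open( my $in, '<', \$data ) or die "Cannot open in-memory file: $!";
    binmode($in);
    local $/ = \$size;              # fixed-size records instead of lines
    my $out = '';
    while( my $chunk = <$in> ) {
        $chunk =~ s/77777/XXXXX/g;
        $out .= $chunk;
    }
    close($in);
    return $out;
}

my $data = "abcdef77777ghijk";              # '77777' sits at offsets 6..10
print subst_by_records( $data, 8 ),  "\n";  # abcdef77777ghijk (split as '77' + '777', missed)
print subst_by_records( $data, 16 ), "\n";  # abcdefXXXXXghijk (all in one record)
```

With 8-byte records the match straddles the boundary and is silently skipped; nothing warns you that it happened.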

If your binary files are small enough to fit into memory (preferably fit into physical memory but fitting into virtual memory may still be 'fast enough'), then you can just slurp the whole file into a single scalar quite easily (using a 'slurp' module or setting $/ to undef, etc.).
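A minimal sketch of the slurp approach, assuming the file fits comfortably in memory. The helper name slurp_subst and its arguments are my own for illustration, not from any module:

```perl
use strict;
use warnings;

# Hypothetical helper: read a whole file, substitute, write the result out.
sub slurp_subst {
    my( $inpath, $outpath, $regex, $repl ) = @_;
    open( my $in, '<:raw', $inpath ) or die "Cannot read $inpath: $!";
    my $data = do { local $/; <$in> };   # undef $/ means "read everything"
    close($in);
    $data =~ s/$regex/$repl/g;
    open( my $out, '>:raw', $outpath ) or die "Cannot write $outpath: $!";
    print $out $data;
    close($out) or die "Cannot close $outpath: $!";
    return length($data);                # size of the rewritten data
}
```

It would be called as something like slurp_subst( 'in.bin', 'out.bin', qr/77777/, '88888' ), where both file names are placeholders. Note that the replacement is inserted as a literal string, so $1 and friends in it are not expanded.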

If your binary files are too big, then things get trickier. Probably the most general solution is to use a sliding window. Pick a length ($maxlen below) that you are pretty sure is longer than any substring that you'll run into that matches your pattern:

    sub binSubst {
        my( $infile, $outfile, $regex, $repl, $maxlen, $bufsize )= @_;
        binmode($infile);
        binmode($outfile);
        $bufsize ||= 16*1024;
        my $buf= '';
        # Read the next chunk, appending to any left-over bytes:
        while( sysread( $infile, $buf, $bufsize, length($buf) ) ) {
            # Replace by hand so we know where the last match ended:
            my( $out, $pos )= ( '', 0 );
            while( $buf =~ /$regex/g ) {
                $out .= substr( $buf, $pos, $-[0]-$pos ) . $repl;
                $pos= $+[0];
            }
            # How much to write out, unless...
            my $end= length($buf)-$maxlen;
            # ... we matched after that point and so
            # should write up to the end of the last match:
            $end= $pos   if  $end < $pos;
            # Write out what we can, removing it from the buffer:
            print $outfile $out, substr( $buf, $pos, $end-$pos );
            substr( $buf, 0, $end, '' );
        }
        # Write out any left overs:
        print $outfile $buf;
    }
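To watch the window at work without touching any files, here is a sketch of the same idea applied to an in-memory string fed through in small pieces. The name window_subst, the sample data, and the tiny chunk sizes are made up for illustration, and doing the replacement by hand with a m//g loop is just one way to keep track of where the last match ended:

```perl
use strict;
use warnings;

# Hypothetical in-memory variant of the sliding window: feed $data through
# in $chunk-byte pieces, holding back up to $maxlen unmatched bytes each time.
sub window_subst {
    my( $data, $regex, $repl, $maxlen, $chunk ) = @_;
    my( $buf, $out ) = ( '', '' );
    for( my $off = 0; $off < length($data); $off += $chunk ) {
        $buf .= substr( $data, $off, $chunk );
        my $pos = 0;                        # end of the last match in $buf
        while( $buf =~ /$regex/g ) {
            $out .= substr( $buf, $pos, $-[0] - $pos ) . $repl;
            $pos = $+[0];
        }
        # Flush everything except the last $maxlen bytes, but never
        # stop short of what has already been matched:
        my $end = length($buf) - $maxlen;
        $end = $pos if $end < $pos;
        $out .= substr( $buf, $pos, $end - $pos );
        substr( $buf, 0, $end, '' );
    }
    return $out . $buf;                     # left-overs
}

# '77777' straddles the 8-byte chunk boundary, yet is still replaced:
print window_subst( "abcdef77777ghijk", qr/77777/, 'XXXXX', 5, 8 ), "\n";
# prints abcdefXXXXXghijk
```

The held-back tail only helps if it is long enough: it has to cover at least one byte less than the longest possible match, which is why the length should be chosen generously.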

- tye        

Re^2: Pattern matching in binary mode (I/O)
by kschwab (Vicar) on Oct 20, 2021 at 21:34 UTC

    Appreciated this comment, as it's one of the few useful things Google returned when searching for "perl sliding window string replace". I used it to make something similar that uses substr() instead of a regex, and bumps the buffer up if the search string is larger than the window size.

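        # note: only lightly tested, ymmv
        use IO::Handle;   # for ->print and ->close on Perl older than 5.14

        sub sliding_replace {
            my( $srcfile, $dstfile, $search, $replace ) = @_;
            if( ! -e $srcfile ) {
                die("File [$srcfile] does not exist\n");
            }
            die("Search string must not be empty\n") unless length($search);
            open( my $src, '<:raw', $srcfile ) or die("Cannot read [$srcfile]: $!\n");
            open( my $dst, '>:raw', $dstfile ) or die("Cannot write [$dstfile]: $!\n");
            my $winsize = 4096;
            # Bump the window up if the search string is larger than it:
            $winsize = length($search) if length($search) > $winsize;
            my $buf = '';
            while(1) {
                my $bytecount = sysread( $src, $buf, $winsize*2, length($buf) );
                die("Read error on [$srcfile]: $!\n") unless defined $bytecount;
                while(1) {
                    my $index = index( $buf, $search );
                    if( $index >= 0 ) {   # -1 means not found; 0 is a valid hit
                        substr( $buf, $index, length($search), $replace );
                        my $len = $index + length($replace);
                        $dst->print( substr( $buf, 0, $len, '' ) );
                    } else {
                        # No hit: hold back the last length($search)-1 bytes,
                        # which could be the start of a match that continues
                        # in the next read, and write out the rest.
                        my $len = length($buf) - ( length($search) - 1 );
                        $dst->print( substr( $buf, 0, $len, '' ) ) if $len > 0;
                        last;
                    }
                }
                last if $bytecount == 0;
            }
            # print any leftovers
            $dst->print($buf);
            $src->close();
            $dst->close();
        }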
