I think TedPride has the right idea. Setting $INPUT_RECORD_SEPARATOR to undef (undef $/) causes the while (<IN>) statement to slurp the entire three-plus-gigabyte file at once. From the docs:
"Entirely undefining $/ makes the next line input operation slurp in the remainder of the file as one scalar value"
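To make the quoted behavior concrete, here is a minimal sketch of slurping a whole file by locally undefining $/. The subroutine name slurp and the file path are my own illustration, not anything from the thread; local restores $/ when the sub returns, so line-oriented reads elsewhere are unaffected.

```perl
use strict;
use warnings;

# Hypothetical helper: read an entire file into one scalar.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;               # undef $/ => next read goes to EOF
    my $contents = <$fh>;   # one read returns the whole file
    close $fh;
    return $contents;
}
```

This is exactly why slurping a 3+ GB file is a problem: the whole thing lands in one scalar in memory.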
Set $/ to some reasonable size instead, one that satisfies your memory limitations. Then you can match blocks of the file in much the same way TedPride describes. Something like this:
$/ = \2048;                  # read 2K blocks instead of lines
my $blocksz = 2048;
open IN,  '<',  'tmp'      or die "Cannot open tmp: $!";
open OUT, '>>', $ARGV[1]   or die "Cannot open $ARGV[1]: $!"; # appending is more efficient when parsing more than one block

# Extract the necessary data bits
my $block01 = <IN>;
my $final;
while (my $block02 = <IN>) {
    # Join two blocks so a pattern spanning the boundary can still match
    my $block = $block01 . $block02;
    $block =~ s/11110100.{8}(.{1520})11110100.{8}(.{464}).{1056}/$1$2/g;
    # $final = pack("B*", $block);                # this was wrong
    $final = pack("B*", substr($block, 0, $blocksz)); # convert the lower block back to binary
    print OUT $final;
    $block01 = substr($block, -$blocksz);         # move the upper block down
}
$final = pack("B*", $block01);  # pack the final carried block
print OUT $final;
close OUT;
close IN;
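The trick the listing depends on is that assigning a reference to an integer to $/ switches readline from line mode to fixed-size record mode. A minimal self-contained sketch (subroutine name and file path are mine, chosen for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: read a file as fixed-size records.
sub read_blocks {
    my ($path, $size) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/ = \$size;      # each <> now returns at most $size bytes
    my @blocks;
    while (my $chunk = <$fh>) {
        push @blocks, $chunk;
    }
    close $fh;
    return @blocks;
}
```

The last record may be shorter than $size when the file length is not an exact multiple, which is why the main loop above has to flush the carried block after the while.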
Update: corrected a couple of lines in the code.
PJ
use strict; use warnings; use diagnostics;