PerlMonks  

Re^5: Out of memory problems

by periapt (Hermit)
on Oct 22, 2004 at 13:59 UTC


in reply to Re^4: Out of memory problems
in thread Out of memory problems

Hmmm.

Looking more closely at your regex, it seems you are replacing a sequence of 3072 characters with a sequence of 1984 characters. Thus, if there is one replacement in $block, the statement $final = pack("B*", substr($block,0,$blocksz)); will include 1088 unchecked characters from $block02. That would explain why it appears to be adding what's left at the end of the boundary. You may have to keep track of the number of substitutions performed and then calculate how many characters to include in the pack statement. Maybe something like ...
    my $nrrepl = $block =~ s/11110100.{8}(.{1520})11110100.{8}(.{464}).{1056}/$1$2/g;
    my $outblocksz = $blocksz - ($nrrepl * 1088);
    $final = pack("B*", substr($block,0,$outblocksz));  # this should work
You might then have to make sure that $outblocksz is a multiple of 8. It probably will be, given the patterns you are working on: 1088 is itself a multiple of 8, so $outblocksz is byte aligned whenever $blocksz is.
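To check the arithmetic, here is a minimal sketch on made-up data. The frame contents and the 1088 filler bits are assumptions for illustration only; just the pattern and the pack statement come from the code above.

```perl
use strict;
use warnings;

# Hypothetical test data: one 3072-bit frame that matches the pattern,
# followed by 1088 filler bits standing in for the start of the next block.
my $marker = '11110100';
my $frame  = $marker . ('0' x 8) . ('1' x 1520)
           . $marker . ('0' x 8) . ('0' x 464) . ('1' x 1056);
my $blocksz = length $frame;              # 3072 in this example
my $block   = $frame . ('0' x 1088);

# s///g returns the number of substitutions it performed
my $nrrepl = $block =~ s/11110100.{8}(.{1520})11110100.{8}(.{464}).{1056}/$1$2/g;

# each substitution shortens the string by 3072 - 1984 = 1088 characters
my $outblocksz = $blocksz - ($nrrepl * 1088);
die "output size $outblocksz is not a multiple of 8\n" if $outblocksz % 8;

my $final = pack("B*", substr($block, 0, $outblocksz));
printf "%d replacement(s), %d bits kept, %d bytes packed\n",
    $nrrepl, $outblocksz, length $final;
```

With one match the script keeps 1984 bits, which pack into 248 bytes, and none of the filler bits reach $final.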

There are a couple of implicit assumptions in the code that we might examine. Is the data you are working with byte aligned and evenly sized? That is, does the data consist of 32-bit integers, or does it vary: say, a 4-byte integer followed by a 7-byte string, and so on? Since you are packing with 'B*', you could be introducing additional bits at the byte (8-bit) boundary, because pack pads a bit string whose length is not a multiple of 8 with zero bits. If the data is evenly spaced, you could set BLOCKSZ to the size of your regex match (3072 bits); that might keep everything aligned properly.
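A small sketch of the padding effect mentioned above, using an arbitrary 12-bit string (the variable names here are only for illustration):

```perl
use strict;
use warnings;

my $bits   = '1' x 12;             # 12 bits: not a multiple of 8
my $packed = pack("B*", $bits);    # the last byte is filled out with zero bits
my $round  = unpack("B*", $packed);

print length($packed), " bytes\n"; # 2 bytes
print "$round\n";                  # 1111111111110000
```

Four zero bits appear that were never in the input. If this happens at the end of every block, each block's output is shifted relative to the data that follows it.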

Another possibility is that when you change a sequence across the boundary between blocks 01 and 02, you introduce a new occurrence of the sequence in block 02. Your sequence is rather long and involved, though, so I assumed that wouldn't happen, but you should consider it as a fringe case.
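To see how a replacement can leave a fresh match behind, here is a toy sketch. The short pattern is a stand-in I made up; it is not your real pattern, it only has the same "drop the marker, keep what follows" shape.

```perl
use strict;
use warnings;

# Toy stand-in pattern (an assumption for illustration):
# drop the two-character marker "ab" and keep the two characters after it.
my $data = 'abababcd';

# A single s///g pass does not rescan the text it has just inserted.
(my $once = $data) =~ s/ab(.)(.)/$1$2/g;

# Repeating the pass until nothing matches exposes the leftover occurrence.
my $again = $once;
1 while $again =~ s/ab(.)(.)/$1$2/g;

print "single pass: $once\n";   # abcd
print "repeated:    $again\n";  # cd
```

Whether a second pass is actually wanted depends on your data format. The point is only that a block which looked clean before the substitution may match afterwards, so it could be worth testing the output against the pattern once to see if this ever occurs.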


PJ
use strict; use warnings; use diagnostics;

Replies are listed 'Best First'.
Re^6: Out of memory problems
by tperdue (Sexton) on Oct 23, 2004 at 14:38 UTC
    I did try this modification to the code with little luck. I'm getting the correct amount of data out, but only the first 4 chunks are correct. I did notice, after printing out $outblocksz, that a few values were not multiples of 8, which shouldn't happen. There is no corruption in the data, as I've run a much slower piece of code on a smaller sample and it produced the correct data. I'll post that piece of code Monday since it's on my machine at work. Any ideas until then?
Re^6: Out of memory problems
by tperdue (Sexton) on Oct 26, 2004 at 12:28 UTC
    open IN, "$ARGV[0]";
    binmode IN;
    @file = <IN>;
    close IN;
    foreach $tmp1 (@file) {
        $array = unpack("B*", $tmp1);
        $final .= $array;
        undef $array;
    }
    undef @file;
    undef $tmp1;
    $final =~ s/11110100.{8}(.{1520})11110100.{8}(.{464}).{1056}/$1$2/g;  #EXTRACT USABLE DATA
    print "Finished extracting.\n\n";
    $finalbinary = pack("B*", $final);  #CONVERT BACK TO BINARY
    print OUT "$finalbinary";  # OUT is not opened in this posted excerpt
    close OUT;
