http://qs321.pair.com?node_id=186838

TheFifthDeuce has asked for the wisdom of the Perl Monks concerning the following question:

Hello folks. I've got a problem that I can't solve. I am working on a pretty cool encryption/decryption system. OK, I am reading a text file which consists of just 0's and 1's... no newlines or whitespace. The file can range in size and get to be VERY large. For this example, the file I am reading is 2,387,250 bytes in size. I need to get every byte of the file, so here are 3 different methods I tried using, and each one eats up a LOT of RAM:
sub test_loop_1{
    ######## RAM used: 190 MB
    my(@all, $elements);
    @all = ();
    open(FILE, $file) or die;
    while(<FILE>){
        push @all, /\d/og;
    }
    close(FILE);
    $elements = @all;
    print $elements;    # Just for confirmation - prints 2387250
}

sub test_loop_2{
    ######## RAM used: 185 MB
    my(@all, @all2, $all, $elements);
    open(FILE, $file) or die;
    @all = <FILE>;
    close(FILE);
    $all = join('', @all);
    @all2 = split('', $all);
    $elements = 0;
    foreach(@all2){
        $elements++;
    }
    print $elements;    # Just for confirmation - prints 2387250
}

sub test_loop_3{
    ######## RAM used: 120 MB
    my(@all, @all2, $all, $elements);
    open(FILE, $file) or die;
    @all = <FILE>;
    close(FILE);
    $all = join('', @all);
    @all2 = ();
    for(my $i = 0; $i < length($all); $i++){
        push @all2, substr($all, $i, 1);
    }
    $elements = @all2;
    print $elements;    # Just for confirmation - prints 2387250
}
Is there any way around this hogging of RAM, or being that the file is just so large in size, am I gonna have to deal with it?

Thanks for any advice,
David
http://www.trixmaster.com

Re: Eating RAM problem
by particle (Vicar) on Aug 01, 2002 at 17:02 UTC
    if your file contains only binary data, why is it a text file? you can vastly compress it by using one bit per bit instead of one byte per bit. use vec and binmode. for a more friendly interface, you can tie your bit vector to an array with Tie::VecArray
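    For instance, a minimal sketch of that packing step (the file names here are placeholders, not from this thread):

    my $bits = '';
    my $pos  = 0;
    open my $in, '<', '01file.txt' or die $!;          # hypothetical '0'/'1' text file
    while ( read $in, my $buf, 8192 ) {
        vec($bits, $pos++, 1) = $_ for split //, $buf; # one bit per input character
    }
    close $in;

    open my $out, '>', 'packed.bin' or die $!;         # hypothetical packed output
    binmode $out;                                      # raw bytes, no newline translation
    print $out $bits;
    close $out;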

    ~Particle *accelerates*

      Thanks, but that is not an option. It cannot be compressed any further. I am working on an encryption system where each ASCII char is assigned a certain number of bits, so for example, if the text to be encrypted is 1000 bytes, then after encryption that text will be converted to 36000 bytes consisting of just 0's and 1's.

        Bytes always consist of 0's and 1's. ;-)

        Frankly speaking, I am not sure I understand you here. Let me rephrase: for each block of n bytes, you are going to replace it with a block of m bytes, where m > n. Your input data and output data are files consisting of 0's and 1's. I don't understand why, but I accept that. Is that correct?

        If yes, you can perform any mathematical operations with any of the following three representations of the data:

        • A @list of bits, e.g. @list = (0, 1, 0, 0, 0, 0, 0, 1); # 'A'; this is what you are using.
        • A $bitstring, e.g. $bitstring = '01000001'; # 'A'.
        • Binary data, e.g. $data = 'A'; (that is, read directly from the file using e.g. $data = <file>). Obviously, this representation uses the least amount of space. This is not really a compression (for my definition of compression), it is just the 'natural' representation of the data. On the contrary, the other two representations are (probably unnecessary) expansions.

        These three representations are equivalent; you just need to use different syntax to access them. For example, to access the third bit in the data, you would use

        • $third_bit = $list[2];
        • $third_bit = substr($bitstring, 2, 1);
        • $third_bit = vec($data, 5, 1); (this one is a bit more tricky, see the documentation for vec)

        To access whole bytes or blocks of bytes, you would use splice, substr and substr, respectively. All the operations you will need to perform can be expressed in all three data representations -- but the last one will only use about 2 MB of memory... Plus, for the last one, you can use perl's binary or, and etc., whereas for the @list and $bitstring, you'll have to emulate the bitwise operations (using the above-mentioned substr, vec etc.)
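        For instance (a sketch with made-up data, not from this thread), the compact representation lets you apply bitwise operators to whole strings at once:

        my $data  = 'AB';                # 2 raw bytes instead of 16 '0'/'1' characters
        my $key   = "\x0f" x length($data);
        my $xored = $data ^ $key;        # bitwise XOR over the entire string in one go
        printf "%08b\n", ord substr($xored, 0, 1);   # 01001110, i.e. 'A' ^ 0x0F
        print vec($data, 5, 1), "\n";    # the third bit of 'A', as above -- prints 0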

        You said: I am working on an encryption system where each ASCII char is assigned a certain number of bits, so for example, if the text to be encrypted is 1000 bytes, then after encryption that text will be converted to 36000 bytes consisting of just 0's and 1's.

        Is it the case that the encryption system requires access to the entire data stream in order to work at all? If encrypting, say, 10 sets of 100 bytes (producing 10 sets of 3600 bytes) works as well as cranking a lump of 1000 bytes into 36000, then you should just read, process and output a small portion of data at a time, rather than trying to hold an entire file -- with massive amounts of wasted bits -- in memory at one time.

        Apart from that -- I'm sorry but... -- if memory consumption is an issue, and forcing some particular method of bit padding is a requirement, I'd use C rather than Perl.

        update: Maybe what you want is sysread, to bring a stated number of bytes into an input scalar variable; e.g.:

        while ( ( my $n_bytes_read = sysread(FILE, $inpbuf, 32) ) > 0 ) {
            if ( $n_bytes_read < 32 ) {
                # must be the last chunk
                # ... maybe this needs special treatment
            }
            process_input_bytes( $inpbuf );
        }
Re: Eating RAM problem
by Abigail-II (Bishop) on Aug 01, 2002 at 17:54 UTC
    Your problem isn't so much the file size, your problem is that you want to make an array element for every single character. This is Perl, not C, so this is going to be costly - you'll get the overhead of a "Perl value" for each character.

    Do you really need that? Can't you use substr? Do you have to have all the characters of the file at the same time? Isn't the encryption/decryption algorithm made such that it encrypts/decrypts blocks of some decent size?
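    One possible shape for that (a sketch, not code from the thread; the 32-character block size is taken from the discussion below): slurp the file once into a single scalar and walk it with substr, instead of building a multi-million-element array:

    local $/;                     # slurp mode: read the whole file in one go
    open(FILE, $file) or die;
    my $all = <FILE>;
    close(FILE);
    for ( my $off = 0; $off < length($all); $off += 32 ) {
        my $block = substr($all, $off, 32);   # one 32-char block, no huge array
        # ... encrypt/decrypt $block here
    }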

    Abigail

      Well yes, the algorithm does work on blocks. I was using an analogy of every byte of a 2 MB file, because that would still be the realistic equivalent of 32-byte blocks of a 64 MB file. I guess I could put a max restriction on the data that can be entered to encrypt. lol

      The point is that I HAVE to have each chunk of 32 chars from the file to work with... whether I am using an array or not. How can I do something like this using substr as you suggest, or, for that matter, ANY way without draining RAM! lol
      sub get_data{
          my(@chunks_of_32);
          @chunks_of_32 = ();
          open(FILE, $file) or die;
          while(<FILE>){
              push @chunks_of_32, /\d{32}/og;
          }
          close(FILE);
      }
      Thanks
        Eh, why don't you just read in 32 characters, process them, write the output and then read in the next 32 characters?

        If you don't need the entire file at once, don't read it all at once.
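        Something like this, for instance (a sketch; process_block() and the output file name are placeholders, not from the thread):

        open my $in,  '<', $file     or die $!;
        open my $out, '>', 'out.txt' or die $!;
        while ( read($in, my $block, 32) ) {
            print $out process_block($block);   # only one 32-char block in RAM at a time
        }
        close $in;
        close $out;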

        Abigail

Re: Eating RAM problem
by chromatic (Archbishop) on Aug 01, 2002 at 17:43 UTC
    sub by_string {
        my $file = shift;
        local *IN;
        local $/;                # undef $/ => slurp the whole file at once
        open( IN, $file ) or die "Cannot open '$file': $!";
        return <IN>;
    }

    Access each element with substr. Memory savings? Several bytes per character, because Perl doesn't have to create a new SV for each character.

      Chromatic, I need to get each element into an array. If the file size is 2 million bytes, then the array should have 2 million elements. How can I do that without draining RAM? Using your sub I get:
      @blah = by_string($file);
      $i = 0;
      foreach (@blah) {
          $i++;
      }
      print $i;   # Prints 1... not what I am looking for
      Thanks

        Then use length $blah[0] instead. You can do anything with strings that you can do with an array of 0's and 1's. The syntax is just a little different.
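        For instance (a sketch, assuming $blah[0] holds the slurped string):

        my $data  = $blah[0];
        my $count = length $data;           # 2387250 "elements"
        my $bit5  = substr($data, 4, 1);    # read, like $all[4] in the array version
        substr($data, 4, 1) = '1';          # write, like $all[4] = '1'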

        Ron Steinke rsteinke@w-link.net
        Why do you need to get each element into an array?
        Could you use Tie::VecArray; ?
Re: Eating RAM problem
by Cine (Friar) on Aug 01, 2002 at 18:13 UTC
    sub test_loop_4 {
        ####### Uses a lot less RAM, but still a lot,
        ####### because there are 2 mil+ elems in @all2...
        ####### A wild guess would be about 20-25 * filesize in RAM usage
        open(FILE, $file) or die $!;
        my $buf = '';
        my @all2 = ();
        while (read FILE, $buf, 1) {
            push @all2, $buf;
        }
    }


    T I M T O W T D I
      Thanks Cine, but your example still uses 120 MB of RAM. With everybody's input, I now realize WHY RAM is being eaten alive. lol I gotta work on a buffer scheme or multiple reads/writes from the file. Anybody comes up with anything, please post!

      Thanks,
      David
        It is quite difficult to come up with a caching scheme for a usage pattern that is unknown ;)
        I suggest you make a new question where you state what you need.

        T I M T O W T D I