TheFifthDeuce has asked for the wisdom of the Perl Monks concerning the following question:
Hello folks. I have a problem that I can't solve. I am working on a pretty cool encryption/decryption system. I am reading a text file which consists of just 0's and 1's -- no newlines or whitespace. The file can range in size and get to be VERY large. For this example, the file I am reading is 2,387,250 bytes in size. I need to get every byte of the file, so here are 3 different methods I tried, and each one eats up a LOT of RAM:
sub test_loop_1{
    ######## RAM used: 190 MB
    my(@all, $elements);
    @all = ();
    open(FILE, $file) or die;
    while(<FILE>){
        push @all, /\d/og;
    }
    close(FILE);
    $elements = @all;
    print $elements; # Just for confirmation - prints 2387250
}
sub test_loop_2{
    ######## RAM used: 185 MB
    my(@all, @all2, $all, $elements);
    open(FILE, $file) or die;
    @all = <FILE>;
    close(FILE);
    $all = join('', @all);
    @all2 = split('', $all);
    $elements = 0;
    foreach(@all2){
        $elements ++;
    }
    print $elements; # Just for confirmation - prints 2387250
}
sub test_loop_3{
    ######## RAM used: 120 MB
    my(@all, @all2, $all, $elements);
    open(FILE, $file) or die;
    @all = <FILE>;
    close(FILE);
    $all = join('', @all);
    @all2 = ();
    for(my $i = 0; $i < length($all); $i ++){
        push @all2, substr($all, $i, 1);
    }
    $elements = @all2;
    print $elements; # Just for confirmation - prints 2387250
}
Is there any way around this hogging of RAM, or being that the file is just so large in size, am I gonna have to deal with it?
Thanks for any advice,
David
http://www.trixmaster.com
Re: Eating RAM problem
by particle (Vicar) on Aug 01, 2002 at 17:02 UTC
if your file contains only binary data, why is it a text file? you can vastly compress it by using one bit per bit instead of one byte per bit. use vec and binmode. for a more friendly interface, you can tie your bit vector to an array with Tie::VecArray
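For instance, here is a minimal sketch of the vec() idea: packing a string of ASCII '0'/'1' characters into a real bit vector, one bit per bit. The 16-digit sample string is made up for illustration, not taken from David's system.

```perl
use strict;
use warnings;

# Pack ASCII '0'/'1' text into a bit vector: one bit per bit instead
# of one byte per bit. The sample string is illustrative only.
my $text = '01000001' . '01000010';    # 16 digit characters

my $vector = '';
for my $i ( 0 .. length($text) - 1 ) {
    vec( $vector, $i, 1 ) = substr( $text, $i, 1 );
}

print length($text), " bytes as text, ", length($vector), " bytes as bits\n";
# 16 bytes as text, 2 bytes as bits

# Reading a bit back out uses the same index:
print vec( $vector, 1, 1 ), "\n";    # 1
```

Written out through a filehandle with binmode set, the packed vector takes one eighth the space of the digit text.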
~Particle *accelerates*
Thanks, but that is not an option. It cannot be compressed any further. I am working on an encryption system where each ASCII char is assigned a certain number of bits, so for example, if the text to be encrypted is 1000 bytes, then after encryption that text will be converted to 36000 bytes consisting of just 0's and 1's.
Bytes always consist of 0's and 1's. ;-)
Frankly speaking, I am not sure I understand you here. Let me rephrase: For each block of n bytes, you are going to replace it by a number of m bytes; m>n. Your input data and output data are files consisting of 0's and 1's. I don't understand why, but I accept that. Is that correct?
If yes, you can perform any mathematical operations with any of the following three representations of the data:
- A @list of bits, e.g. @list = (0, 1, 0, 0, 0, 0, 0, 1); # 'A'; this is what you are using.
- A $bitstring, e.g. $bitstring = '01000001'; # 'A'.
- Binary data, e.g. $data = 'A'; (that is, read directly from the file using e.g. $data = <file>). Obviously, this representation uses the least amount of space. This is not really a compression (for my definition of compression), it is just the 'natural' representation of the data. On the contrary, the other two representations are (probably unnecessary) expansions.
These three representations are equivalent; you just need to use different syntax to access them. For example, to access the third bit in the data, you would use
- $third_bit = $list[2];
- $third_bit = substr($bitstring, 2, 1);
- $third_bit = vec($data, 5, 1); (this one is a bit more tricky, see the documentation for vec)
To access whole bytes or blocks of bytes, you would use splice, substr and substr, respectively. All the operations you will need to perform can be expressed in all three data representations -- but the last one will only use 2M of memory... Plus, for the last one, you can use perl's binary or, and etc, whereas for the @list and $bitstring, you'll have to emulate the mathematical functions (using the abovementioned substr, vec etc.)
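A short runnable sketch of those three representations side by side, all holding the single character 'A':

```perl
use strict;
use warnings;

# Three equivalent ways to hold the bits of 'A' (binary 01000001).
my @list      = ( 0, 1, 0, 0, 0, 0, 0, 1 );    # one array element per bit
my $bitstring = '01000001';                    # one byte per bit
my $data      = 'A';                           # one bit per bit

# The third bit (index 2, counting from the most significant bit):
my $from_list   = $list[2];
my $from_string = substr( $bitstring, 2, 1 );
my $from_vec    = vec( $data, 5, 1 );          # vec() numbers bits from the low end

print "$from_list $from_string $from_vec\n";   # 0 0 0
```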
You said:

I am working on an encryption system where each ASCII char is assigned a certain number of bits, so for example, if the text to be encrypted is 1000 bytes, then after encryption that text will be converted to 36000 bytes consisting of just 0's and 1's.

Is it the case that the encryption system requires access to the entire data stream in order to work at all? If encrypting, say, 10 sets of 100 bytes (producing 10 sets of 3600 bytes) works as well as cranking a lump of 1000 bytes into 36000, then you should just read, process and output a small portion of data at a time, rather than trying to hold an entire file -- with massive amounts of wasted bits -- in memory at one time.

Apart from that -- I'm sorry but... -- if memory consumption is an issue, and forcing some particular method of bit padding is a requirement, I'd use C rather than Perl.

update: Maybe what you want is sysread, to bring a stated number of bytes into an input scalar variable; e.g.:
while ( ( $n_bytes_read = sysread( FILE, $inpbuf, 32 ) ) > 0 )
{
    if ( $n_bytes_read < 32 ) { # must be the last chunk
        # ... maybe this needs special treatment
    }
    process_input_bytes( $inpbuf );
}
Re: Eating RAM problem
by Abigail-II (Bishop) on Aug 01, 2002 at 17:54 UTC
Your problem isn't so much the file size, your problem is
that you want to make an array element for every single
character. This is Perl, not C, so this is going to be
costly - you'll get the overhead of a "Perl value" for each
character.
Do you really need that? Can't you use substr? Do you have
to have all the characters of the file at the same time?
Isn't the encryption/decryption algorithm made such that
it encrypts/decrypts blocks of some decent size?
Abigail
Well yes, the algorithm does work on blocks. I was using an analogy for every byte of a 2 MB file, because that would still be the realistic equivalent of 32-byte blocks of a 64 MB file. I guess I could put a max restriction on the data that can be entered to encrypt. lol
The point is that I HAVE to have each chunk of 32 chars from the file to work with... whether I am using an array or not. How can I do something like this using substr as you suggest, or for that matter, ANY way without draining RAM! lol
sub get_data{
    my(@chunks_of_32);
    @chunks_of_32 = ();
    open(FILE, $file) or die;
    while(<FILE>){
        push @chunks_of_32, /\d{32}/og;
    }
    close(FILE);
}
Thanks
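A hedged sketch of the substr() approach Abigail suggests: slurp the digit string once (about 2 MB for this file) and slice 32-character blocks out of it, with no per-character array elements. process_block() and the tiny generated input file are made-up placeholders for the real encryption step and the real data; a trailing remainder shorter than 32 is simply ignored here.

```perl
use strict;
use warnings;

# Make a small stand-in input file (the real one would be the 2 MB file).
open( OUT, '>bits.txt' ) or die $!;
print OUT '01' x 33;    # 66 digits
close(OUT);

sub process_block { my ($block) = @_; }    # hypothetical placeholder

open( FILE, 'bits.txt' ) or die $!;
my $all = do { local $/; <FILE> };         # slurp: one scalar, no array
close(FILE);

my $blocks = 0;
for ( my $pos = 0; $pos + 32 <= length($all); $pos += 32 ) {
    process_block( substr( $all, $pos, 32 ) );
    $blocks++;
}
print "$blocks blocks of 32\n";            # 2 blocks of 32
```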
Re: Eating RAM problem
by chromatic (Archbishop) on Aug 01, 2002 at 17:43 UTC
sub by_string
{
    my $file = shift;
    local( *IN, $/ );    # localize the handle and unset $/ for slurp mode
    open( IN, $file ) or die "Cannot open '$file': $!";
    return <IN>;
}
Access each element with substr. Memory savings? Several bytes per character, because Perl doesn't have to create a new SV for each character.
Chromatic, I need to get each element into an array. If the file size is 2 million bytes, then the array should have 2 million elements. How can I do that without draining RAM? Using your sub I get:
@blah = by_string($file);
$i = 0;
foreach(@blah){
    $i ++;
}
print $i; # Prints 1... not what I am looking for
Thanks
Why do you need to get each element into an array?
Could you use Tie::VecArray; ?
Re: Eating RAM problem
by Cine (Friar) on Aug 01, 2002 at 18:13 UTC
sub test_loop_4 {
    ####### Uses a lot less RAM, but still a lot, because there are 2mil elems in @all2...
    ####### A wild guess would be about 20-25*filesize in RAM usage
    open(FILE, $file) or die $!;
    my $buf = '';
    my @all2 = ();
    while(read FILE,$buf,1) {
        push @all2,$buf;
    }
}
TIMTOWTDI
Thanks Cine, but your example still uses 120 MB of RAM. With everybody's input, I now realize WHY RAM is being eaten alive. lol I gotta work on a buffer scheme or multiple reads/writes from the file. If anybody comes up with anything, please post!
Thanks,
David
It is quite difficult to come up with a caching scheme for a usage pattern that is unknown ;)
I suggest you make a new question where you state what you need.
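For what it's worth, here is one plausible buffering scheme -- a sketch under assumptions, since the real access pattern is unknown: read the file in fixed-size buffers, peel complete 32-character blocks off the front, and carry any remainder into the next read, so memory use stays flat no matter how large the file is. process_block() and the generated input file are hypothetical stand-ins.

```perl
use strict;
use warnings;

# Stand-in input: 70 digits (the real file would be the 2 MB one).
open( OUT, '>bits.txt' ) or die $!;
print OUT '0' x 70;
close(OUT);

sub process_block { my ($block) = @_; }    # hypothetical placeholder

open( FILE, 'bits.txt' ) or die $!;
my $carry  = '';
my $blocks = 0;
while ( read( FILE, my $buf, 65536 ) ) {   # memory is capped by the buffer size
    $carry .= $buf;
    while ( length($carry) >= 32 ) {
        # 4-argument substr removes the block from the front of the buffer
        process_block( substr( $carry, 0, 32, '' ) );
        $blocks++;
    }
}
close(FILE);
print "$blocks blocks, ", length($carry), " leftover digits\n";   # 2 blocks, 6 leftover digits
```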
TIMTOWTDI