PerlMonks  

Archive::Extract - Out of Memory

by cmilfo (Hermit)
on Nov 14, 2011 at 20:25 UTC ( [id://938018]=perlquestion )

cmilfo has asked for the wisdom of the Perl Monks concerning the following question:

Before I open a ticket for Archive::Extract, I was wondering if anyone could double-check me.

When using Archive::Extract to extract large gzipped files (.gz), I'm receiving an "Out of Memory" error. (Note: I'm setting $Archive::Extract::PREFER_BIN to 1 to prefer using system binaries.)

Example code:

    use Archive::Extract;
    $Archive::Extract::PREFER_BIN = 1;
    my $ae = Archive::Extract->new( archive => 'big.txt.gz' );
    $ae->extract( to => 'big.txt');

Looking at the module, it looks as if it runs gzip with '-c' (write to STDOUT) and captures the output in a buffer. The buffer is then written to a filehandle. Here's where I need the double-check.

    sub _gunzip_bin {
        my $self = shift;

        ### check for /bin/gzip -- we need it ###
        unless( $self->bin_gzip ) {
            $self->_error(loc("No '%1' program found", '/bin/gzip'));
            return METHOD_NA;
        }

        my $fh = FileHandle->new('>'. $self->_gunzip_to) or
            return $self->_error(loc("Could not open '%1' for writing: %2",
                                $self->_gunzip_to, $! ));

        my $cmd = [ $self->bin_gzip, '-cdf', $self->archive ];

        my $buffer;
        unless( scalar run( command => $cmd,
                            verbose => $DEBUG,
                            buffer  => \$buffer )
        ) {
            return $self->_error(loc("Unable to gunzip '%1': %2",
                                $self->archive, $buffer));
        }

        ### no buffers available?
        if( !IPC::Cmd->can_capture_buffer and !$buffer ) {
            $self->_error( $self->_no_buffer_content( $self->archive ) );
        }

        $self->_print($fh, $buffer) if defined $buffer;

        close $fh;

        ### set what files where extract, and where they went ###
        $self->files( [$self->_gunzip_to] );
        $self->extract_path( File::Spec->rel2abs(cwd()) );

        return 1;
    }

As far as I can tell, it's not happening with the other methods of compression. It looks like gzipped files are the only ones handled this way.
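For comparison, here is a minimal sketch of decompressing in fixed-size chunks, so the uncompressed data never sits in memory all at once. It uses the core IO::Uncompress::Gunzip module rather than the gzip binary, which is an assumption made for illustration and not how Archive::Extract does it. The helper name gunzip_streaming and the 64 KB chunk size are hypothetical.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper (not part of Archive::Extract): decompress $archive
# into $target one chunk at a time, so memory use is bounded by the chunk size.
sub gunzip_streaming {
    my ($archive, $target) = @_;

    my $in = IO::Uncompress::Gunzip->new($archive)
        or die "Cannot open '$archive': $GunzipError\n";
    open my $out, '>', $target
        or die "Cannot open '$target' for writing: $!\n";
    binmode $out;

    my $chunk;
    while (1) {
        # read() returns the number of uncompressed bytes, 0 at EOF, < 0 on error
        my $status = $in->read($chunk, 64 * 1024);
        die "Error reading '$archive': $GunzipError\n" if $status < 0;
        last if $status == 0;
        print {$out} $chunk or die "Write to '$target' failed: $!\n";
    }

    $in->close;
    close $out or die "Cannot close '$target': $!\n";
    return 1;
}
```

It would be called as gunzip_streaming('big.txt.gz', 'big.txt'). The point is only that the output is written as it is produced, instead of being collected in a single scalar first.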

Thank you!
Casey

Replies are listed 'Best First'.
Re: Archive::Extract - Out of Memory
by Anonymous Monk on Nov 14, 2011 at 23:11 UTC

    Note: I'm setting $Archive::Extract::PREFER_BIN to 1 to prefer using system binaries.

    Try setting $Archive::Extract::DEBUG = 99; to see whether the binaries are actually used, rather than merely preferred.

    But yes, sending gzip's output to stdout and capturing it does seem like a bug, given the memory usage.

Re: Archive::Extract - Out of Memory
by remiah (Hermit) on Nov 15, 2011 at 00:08 UTC
    On my FreeBSD 8.2 with perl 5.12.2, the process was terminated with the message "killed :9". The top command showed memory growing and growing.
    # create test 48kb gz
    perl -e 'print ("xxxxx" x 100000000) '|gzip > test.gz

    # perl
    use Archive::Extract;
    $Archive::Extract::PREFER_BIN = 1;
    $ae = Archive::Extract->new( archive => 'test.gz' );
    $ae->extract( to => 'test.txt');
    print "end\n";

    I guess you already know this, but when PREFER_BIN is not set, Archive::Extract says "You do not have 'Compress::Zlib' installed - Please install it as soon as possible. at tmp.pl line 10". With Compress::Zlib installed, it goes to "_gunzip_cz" instead of "_gunzip_bin", and _gunzip_cz works fine.
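    A quick way to tell which of the two cases applies on a given machine is to check whether Compress::Zlib can be loaded at all. This is only a sketch of that check, not the logic Archive::Extract itself uses to pick a method; have_module is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the named module can be loaded, else 0.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? 1 : 0;
}

print have_module('Compress::Zlib')
    ? "Compress::Zlib is available\n"
    : "Compress::Zlib is missing\n";
```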

    As you pointed out, without Compress::Zlib it calls _gunzip_bin, which executes "/usr/bin/gzip -cdf /tmp/test.gz" through IPC::Cmd's run function and puts the stdout into $buffer.

    I am not sure why the results are captured through STDOUT. I wonder whether a redirection like

    my $cmd = [ $self->bin_gzip, '-cdf', $self->archive ,'>' ,$self->_gunzip_to ];

    would cause trouble in terms of portability?
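    One way to avoid both the in-memory buffer and a shell-level '>' is to read gzip's STDOUT through a pipe and copy it to the target in chunks. The following is a minimal sketch under some assumptions: a gzip binary is available, and the perl and platform support list-form pipe open (this may not hold on older Windows perls). gunzip_via_pipe is a made-up name, not something that exists in Archive::Extract.

```perl
use strict;
use warnings;

# Hypothetical helper: run "$gzip -cdf $archive" and stream its output
# into $target without holding the whole result in memory.
sub gunzip_via_pipe {
    my ($gzip, $archive, $target) = @_;

    # List-form pipe open: no shell is involved, so no quoting or '>' handling.
    open my $pipe, '-|', $gzip, '-cdf', $archive
        or die "Cannot run '$gzip': $!\n";
    binmode $pipe;

    open my $out, '>', $target
        or die "Cannot open '$target' for writing: $!\n";
    binmode $out;

    my $chunk;
    while (1) {
        my $n = read($pipe, $chunk, 64 * 1024);
        die "Error reading from '$gzip': $!\n" unless defined $n;
        last if $n == 0;
        print {$out} $chunk or die "Write to '$target' failed: $!\n";
    }

    close $out or die "Cannot close '$target': $!\n";
    # Closing a pipe waits for the child; it fails if gzip exited non-zero.
    close $pipe or die "'$gzip' failed (exit status " . ($? >> 8) . ")\n";
    return 1;
}
```

    It could be called as gunzip_via_pipe('/usr/bin/gzip', '/tmp/test.gz', 'test.txt'). Whether this is portable enough for a module like Archive::Extract is exactly the open question here.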

Re: Archive::Extract - Out of Memory
by Anonymous Monk on Nov 14, 2011 at 23:48 UTC
    Also take a look at the new Archive::Extract::Libarchive: http://blogs.perl.org/users/acme/2011/10/extracting-your-archives.html
