
Extracting data from nested tgz files using Archive::Tar

by jhuijsing (Acolyte)
on Nov 18, 2014 at 00:46 UTC

jhuijsing has asked for the wisdom of the Perl Monks concerning the following question:

I have a tgz that contains multiple tgz files, and I want to extract some data from bb_log and cc_log. I don't want to keep copies of any of the files.
Is it possible to do this with Archive::Tar, entirely in memory?
PS: the a.tgz file is about 40M.
Layout: a.tgz contains a.tar, which contains bb.tgz bb.tar bb_log bb_data.tar cc.tgz cc.tar cc_log cc_data.tar

Replies are listed 'Best First'.
Re: Extracting data from nested tgz files using Archive::Tar
by Loops (Curate) on Nov 18, 2014 at 06:07 UTC

    Proof of concept that only works with the exact nesting and file names you specified:

    use Archive::Tar;
    use IO::Uncompress::Gunzip;
    use IO::String;

    # Do whatever you want with each file and its data
    sub handle {
        my ($name,$data) = @_;
        print "file $name length ", length($data), $/;
    }

    my $filename = 'a.tgz';
    my $outer = Archive::Tar->new($filename);
    for my $outerfile ($outer->get_files) {
        my $outerdata = $outer->get_content($outerfile->name);
        my $inner = Archive::Tar->new(
            IO::Uncompress::Gunzip->new(
                IO::String->new($outerdata)));
        for my $file ($inner->get_files) {
            next unless $file->name =~ /^.._log$/;
            handle $file->name, $inner->get_content($file->name);
        };
    }

    It works, but will likely be dog slow. Since your archive is only 40M maybe it doesn't matter.

      No good: I get an out-of-memory error.

      Back to the drawing board. I'll use a temp directory to extract the second-level tar files.
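
For anyone landing here with the same out-of-memory problem, here is a rough sketch of the temp-file idea. It is untested against the real archive and assumes the same layout as the proof of concept above (an outer gzipped tar whose members are .tgz files that hold the *_log entries). The helper name scan_nested_logs and its callback signature are made up for illustration. It relies on Archive::Tar->iter, which needs a reasonably recent Archive::Tar and returns one member at a time instead of loading the whole archive, so memory use should be bounded by the largest single member rather than by the full archive.

```perl
use strict;
use warnings;
use Archive::Tar;
use File::Temp ();

# Hypothetical helper: walk $outer_path (a gzipped tar), spill each nested
# .tgz member to a temp file, and call $callback->($name, $data) for every
# inner member whose name ends in "_log".
sub scan_nested_logs {
    my ($outer_path, $callback) = @_;

    # iter() hands back one Archive::Tar::File per call rather than
    # reading the whole archive into memory up front.
    my $next_outer = Archive::Tar->iter($outer_path, 1)
        or die "cannot read $outer_path: ", Archive::Tar->error;

    while (my $member = $next_outer->()) {
        next unless $member->is_file && $member->name =~ /\.tgz\z/;

        # The temp file is deleted automatically when $tmp goes out of scope.
        my $tmp = File::Temp->new(SUFFIX => '.tgz');
        binmode $tmp;
        print {$tmp} $member->get_content;
        close $tmp or die "cannot write temp file: $!";

        my $next_inner = Archive::Tar->iter($tmp->filename, 1)
            or die "cannot read nested archive: ", Archive::Tar->error;

        while (my $file = $next_inner->()) {
            next unless $file->is_file && $file->name =~ /_log\z/;
            $callback->($file->name, $file->get_content);
        }
    }
    return;
}
```

Usage would be something like scan_nested_logs('a.tgz', sub { my ($name, $data) = @_; ... }). If the archive has one more level of nesting (a.tgz holding a.tar holding the .tgz files), the same spill-and-iterate step would need to be repeated for that level.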

Re: Extracting data from nested tgz files using Archive::Tar
by GotToBTru (Prior) on Nov 18, 2014 at 05:44 UTC

    If you open the a.tgz using Archive::Tar->new(), it appears you could use get_content to retrieve the contents of the internal tar, and then use Archive::Tar->new(data=>...) to open it. Repeat this to access the files further in. I haven't actually done this, but it looks possible based on the docs. It could potentially take up a great deal of memory.

    1 Peter 4:10
