Extracting data from nested tgz files use Archive::Tar

jhuijsing has asked for the wisdom of the Perl Monks concerning the following question:

I have a tgz that contains multiple tgz files, and I want to extract some data from bb_log and cc_log. I don't want to keep copies of any of the files.
Is it possible to do this with Archive::Tar, entirely in memory?
PS: the a.tgz file is about 40M.

   a.tgz   which contains 
     a.tar    which contains
        bb.tgz   
          bb.tar
            bb_log
            bb_data.tar 
        cc.tgz 
          cc.tar
            cc_log
            cc_data.tar

Re: Extracting data from nested tgz files use Archive::Tar by Loops (Curate) on Nov 18, 2014 at 06:07 UTC
Proof of concept that only works with the exact nesting and file names you specified:

   use Archive::Tar;
   use IO::Uncompress::Gunzip;
   use IO::String;

   # Do whatever you want with each file and its data
   sub handle {
       my ($name,$data) = @_;
       print "file $name length ", length($data), $/;
   }

   my $filename = 'a.tgz';
   my $outer = Archive::Tar->new($filename);
   for my $outerfile ($outer->get_files) {
       my $outerdata = $outer->get_content($outerfile->name);
       my $inner = Archive::Tar->new(
           IO::Uncompress::Gunzip->new(
               IO::String->new($outerdata)));
       for my $file ($inner->get_files) {
           next unless $file->name =~ /^.._log$/;
           handle $file->name, $inner->get_content($file->name);
       };
   }

It works, but will likely be dog slow. Since your archive is only 40M, maybe it doesn't matter.
Re^2: Extracting data from nested tgz files use Archive::Tar by jhuijsing (Acolyte) on Nov 20, 2014 at 02:58 UTC
No good: I get an out-of-memory error. Back to the drawing board; I'll use a temp directory to extract the second-level tar files.
Re: Extracting data from nested tgz files use Archive::Tar by GotToBTru (Prior) on Nov 18, 2014 at 05:44 UTC
If you open the a.tgz using `Archive::Tar->new()`, it appears you could use `get_content` to retrieve the contents of the internal tar, and then use `Archive::Tar->new(data=>...)` to open it. Repeat this to access the files further in. I haven't actually done this, but it looks possible based on the docs. It could potentially take up a great deal of memory. 1 Peter 4:10
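On the memory concern: a possible lower-memory variant of the same idea is sketched below. It has not been run against the poster's archive. It assumes the layout shown in the question (inner archives named *.tgz, log files named *_log) and that the installed Archive::Tar supports `iter` with the `filter` option. The helper name `scan_nested_logs` is made up for illustration. `iter` hands back one member at a time instead of loading the whole archive, and the filter should let the large *_data.tar members be skipped rather than held in memory. Each inner .tgz is still read into memory in compressed form, one at a time.

```perl
use strict;
use warnings;
use Archive::Tar;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: calls $callback->($name, $content) for every
# *_log file found inside the *.tgz members of the outer archive.
sub scan_nested_logs {
    my ($outer_path, $callback) = @_;

    # Iterate over the outer archive, keeping only the inner *.tgz members.
    my $next_outer = Archive::Tar->iter($outer_path, 1, { filter => qr/\.tgz$/ })
        or die "cannot open $outer_path: "
             . ($Archive::Tar::error // 'unknown error');

    while (my $member = $next_outer->()) {
        # Decompress the inner archive straight from the in-memory buffer.
        my $gz = IO::Uncompress::Gunzip->new($member->get_content_by_ref)
            or die "gunzip failed for " . $member->name . ": $GunzipError";

        # Iterate over the inner tar; members not matching the filter are skipped.
        my $next_inner = Archive::Tar->iter($gz, 0, { filter => qr/_log$/ })
            or die "cannot read " . $member->name . ": "
                 . ($Archive::Tar::error // 'unknown error');

        while (my $file = $next_inner->()) {
            $callback->($file->name, $file->get_content);
        }
    }
    return;
}

# Usage (only runs when a path is given on the command line):
#   perl scan.pl a.tgz
if (@ARGV) {
    scan_nested_logs($ARGV[0], sub {
        my ($name, $data) = @_;
        print "file $name length ", length($data), "\n";
    });
}
```

If even a single inner .tgz is too large to hold in memory, extracting the second-level archives to a temp directory, as jhuijsing planned, is probably still the safer route.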

