PerlMonks  

Re^5: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize

by vr (Curate)
on Dec 19, 2018 at 19:32 UTC [id://1227483]


in reply to Re^4: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
in thread Solved: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize

Sorry if I completely misunderstood the problem, but won't the following work?

use strict;
use warnings;
use feature 'say';
use IO::Compress::Gzip 'gzip';
use IO::Uncompress::Gunzip qw/ gunzip $GunzipError /;

my $s = <<'END';
I include only the bare bones because I tried something like 20 different things without success and I'm embarrassed. :( Non-streaming requests are working perfectly with approximately this code. The endpoint for this code is a
END

gzip( \$s, \my $c );

my @chunks = unpack '(a42)*', $c x 5;

my $partial = '';
my $result = '';
my $n = 1;

for ( @chunks ) {
    gunzip( \( $partial . $_ ), \my $o,
        Transparent => 0, TrailingData => my $t );
    $partial .= $_ and next if $GunzipError;
    $partial = $t ? $t : '';
    print "message #", $n ++, "\n$o";
}

Replies are listed 'Best First'.
Re^6: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
by pmqs (Friar) on Dec 19, 2018 at 22:16 UTC

    It depends. The code in your test script assumes that the input consists of 5 completely distinct gzip data streams. So they will each contain the gzip header, the compressed payload and gzip trailer data. If that is what is actually happening with the WWW::Mechanize application, and the gzip data streams aren't that big, then your approach should be fine.
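To make the "distinct gzip data streams" case concrete, here is a minimal sketch. The message strings and variable names are made up for illustration. It concatenates five independently compressed members and shows that a plain one-shot gunzip stops after the first member, while the MultiStream option of IO::Uncompress::Gunzip uncompresses all of them.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical payloads, one per gzip data stream.
my @messages = map { "message $_\n" } 1 .. 5;

# Compress each message on its own, then concatenate the results.
# Every member carries its own gzip header, payload and trailer.
my $concatenated = '';
for my $msg (@messages) {
    my $member;
    gzip( \$msg => \$member ) or die "gzip failed: $GzipError";
    $concatenated .= $member;
}

# Default behaviour: only the first gzip data stream is uncompressed.
my $first;
gunzip( \$concatenated => \$first )
    or die "gunzip failed: $GunzipError";

# MultiStream => 1: all concatenated streams are uncompressed as one.
my $all;
gunzip( \$concatenated => \$all, MultiStream => 1 )
    or die "gunzip failed: $GunzipError";

print "first stream only: $first";
print "all streams:\n$all";
```

This only helps when the whole input is already in memory; it does not address a single, open-ended stream.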

    I'm not convinced that is what is happening in the real application though. The snippet of code below, from earlier in the thread, along with the observation that uncompressing $collected produced more of the real uncompressed payload each time, suggests that this is a single gzip data stream:

    $collected .= $data;
    gunzip \$collected, \$out;
    print $out, $/;

    If that is the case then IO::Uncompress::Gunzip will only work if you are prepared to read the entire compressed data stream and uncompress the lot in one go. If we are dealing with a potentially infinite compressed data stream, that isn't going to work.

    The code I posted that uses Compress::Zlib uncompresses the data as it gets it, one chunk at a time.
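For readers who have not seen the chunk-at-a-time approach, the following is a rough sketch of the idea and not the code posted earlier in the thread. It assumes the data is a single gzip stream, simulates network chunks by slicing a compressed buffer into 42-byte pieces, and passes 16 + MAX_WBITS as the window size so that zlib expects a gzip header and trailer. The payload and variable names are invented.

```perl
use strict;
use warnings;
use Compress::Zlib;    # inflateInit, MAX_WBITS, Z_OK, Z_STREAM_END
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical payload standing in for the server's response body.
my $payload = join '', map { "line $_ of the payload\n" } 1 .. 200;

my $gzipped;
gzip( \$payload => \$gzipped ) or die "gzip failed: $GzipError";

# Assumption: adding 16 to the window bits asks zlib for gzip framing.
my ( $inflater, $init_status ) = inflateInit( -WindowBits => 16 + MAX_WBITS );
die "inflateInit failed: $init_status" unless $inflater;

my $restored = '';
for my $chunk ( unpack '(a42)*', $gzipped ) {
    my $buffer = $chunk;    # inflate() consumes its input buffer
    my ( $out, $status ) = $inflater->inflate($buffer);
    die "inflate failed: $status"
        unless $status == Z_OK || $status == Z_STREAM_END;

    # Whatever could be uncompressed so far is available immediately.
    $restored .= $out if defined $out;
    last if $status == Z_STREAM_END;
}

print "restored ", length($restored), " bytes\n";
```

The inflater object keeps the decompression state between calls, so nothing has to be re-read or re-uncompressed when the next chunk arrives.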

      I see, thanks. Can you explain when it is possible for the output of "gunzip" to be valid (but partial, truncated) uncompressed data followed by an obviously binary, still-compressed "tail", as in Re^4: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize? I couldn't get such a result regardless of "Transparent" and all other parameters; I always got only the partial uncompressed data.

        IO::Uncompress::Gunzip expects to be given a valid and complete gzip data stream. If that doesn't happen, it fails.

        I'll use the compressed data in $gzipped below to illustrate.

        use IO::Compress::Gzip qw(gzip);
        use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

        my $data = 'I include only the bare bones because I tried something';

        # Create some compressed data
        my $gzipped;
        gzip \$data => \$gzipped;

        Let's start with the "valid" part. If I corrupt the compressed data stream, bad things happen:

        my $corrupt = $gzipped;

        # Overwrite part of the compressed data with junk
        substr($corrupt, 10, 3, "BAD");

        gunzip \$corrupt => \$uncompressed
            or print "Cannot gunzip: $GunzipError\n";

        That will output

        Cannot gunzip: Inflation Error: data error

        If you get that, there is no point in continuing.

        Next is a truncated data stream (which is what this ticket is all about).

        # truncate the compressed data
        my $truncated = substr($gzipped, 0, 10);

        gunzip \$truncated => \$uncompressed
            or print "Cannot gunzip: $GunzipError\n";

        That will output

        Cannot gunzip: unexpected end of file

        In this instance, you can try to get more data, append it to the input buffer ($truncated in this case) and uncompress the whole thing again. The only semi-valid use for this technique is when you are certain that you will eventually get a complete gzip data stream. That does not seem to be the case here.
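The accumulate-and-retry technique described above could look roughly like this. It is a sketch under the assumption that the complete gzip data stream does eventually arrive. The 42-byte chunks stand in for network reads, and the names $buffer and $attempts are invented for the example.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $data = "I include only the bare bones because I tried something\n" x 20;

my $gzipped;
gzip( \$data => \$gzipped ) or die "gzip failed: $GzipError";

my $buffer   = '';    # everything received so far
my $attempts = 0;
my $uncompressed;

for my $chunk ( unpack '(a42)*', $gzipped ) {
    $buffer .= $chunk;
    $attempts++;

    # Work on a copy so the accumulated input is left untouched.
    my $input = $buffer;
    my $out;
    if ( gunzip( \$input => \$out, Transparent => 0 ) ) {
        $uncompressed = $out;
        last;
    }

    # Still incomplete: keep the buffer and wait for the next chunk.
}

die "never received a complete gzip data stream: $GunzipError"
    unless defined $uncompressed;

print "succeeded after $attempts attempt(s), ",
    length($uncompressed), " bytes\n";
```

Every retry uncompresses the buffer from the beginning, so the cost grows with each chunk, and a stream that never ends never produces a result. That is why this only suits small, finite streams.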
