PerlMonks  

Re^3: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize

by bliako (Monsignor)
on Dec 19, 2018 at 14:37 UTC ( id://1227466 )


in reply to Re^2: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
in thread Solved: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize

I would investigate what that "gibberish" is: does Gunzip fail on the data, or does it uncompress it successfully and what you get is "gibberish"? If gunzip does not fail, it is possible that you sometimes have zip-inside-zip.

So, they have a logical chunk of data, based on the XML I saw on their page (<quote>...</quote>), and then they have a chunk of compressed data of exactly 32kB? Isn't that weird? I mean, they compress 5 chunks of data and sometimes that comes to 32kB and sometimes 33kB, depending on the content. How can they always send 32kB and expect the recipient to get exactly 5 chunks of data? Unless they sometimes send 4 chunks, sometimes 5, and most of the time something fractional in between. And if they do send something fractional, isn't it weird to make you waste time waiting for the remaining half chunk to appear (whenever the 32kB of the next compressed chunk is filled)? You get something like "IBM up 2<end of chunk, sorry>" and then wait a few valuable seconds for the next chunk to learn whether it is up 2000 points or 2.4 points!

They could also pad, of course, but what would be the point of all that computational burden on their side, and of forcing the client to wait until 32kB of compressed data is complete before knowing where the market goes?

Just thinking out loud...
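A quick way to tell those two cases apart, sketched below with made-up input (the $not_gzip string is hypothetical): with Transparent => 0, IO::Uncompress::Gunzip refuses non-gzip bytes and sets $GunzipError, whereas a run that succeeds but still prints gibberish would point at zip-inside-zip.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical "gibberish" bytes: not a gzip stream at all.
my $not_gzip = "plain text, no gzip magic bytes here";

# Transparent => 0 forces Gunzip to fail on non-gzip input
# instead of passing it through untouched.
my $ok = gunzip(\$not_gzip => \my $out, Transparent => 0);

if ($ok) {
    print "gunzip succeeded; if the output is still gibberish, suspect zip-inside-zip\n";
}
else {
    print "gunzip failed: $GunzipError\n";
}
```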


Replies are listed 'Best First'.
Re^4: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
by Your Mother (Archbishop) on Dec 19, 2018 at 16:41 UTC

    Thanks again for thinking about it at all, out loud or otherwise. :P

    The 32kB is just something I saw somewhere about gzip streams. I don't remember where, I probably shouldn't have mentioned it.

    If I do this (assume proper var scoping)–

    gunzip \$data => \$out;
    print $out, $/;

    –it will display something like–

    <status>connected</status> ?R??0 ????l??????@? +U?&#1964;??/?%y???p???v?Po#[???-???x? >\'&#1000;??4'?V.6?6?&#1444;~5Y???0???C]?$?@m~OgQ?u&#451;8?Y?E?8<?Le?4 +?6??&#1644;&qd?x#1

    Amended to–

    $collected .= $data;
    gunzip \$collected, \$out;
    print $out, $/;

    We get (it's ignoring the Accept header and returning XML)–

    <status>connected</status> <quote> <ask>166.29</ask> <asksz>500</asksz> <bid>166.26</bid> … </quote> ...

    And then it dies after a while, at an inconsistent point but never sooner than 5kB in, with an "unexpected end" style message.

    Adding this lets it run for, maybe, forever (I didn't let it run that long), but it's still stacking up an ever-growing scalar and gunzipping the same data over and over–

    $collected .= $data;
    gunzip \$collected, \$out, MultiStream => 1;
    print $out, $/;

    I expect I will have to come up with a seek/tell/truncate kind of solution that uses the MultiStream handling to reset itself automatically and keep the data from growing forever. I haven't had time to go back to it. I feel like this must be a solved problem and I'm just looking in the wrong place. :|
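For what it's worth, that "unexpected end" style failure is easy to reproduce with a one-shot gunzip on a deliberately truncated stream. A self-contained sketch (the payload and the truncation point are invented):

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Build a known-good gzip stream, then chop off its tail to mimic
# having received only part of the stream so far.
my $payload = "<quote>166.29</quote>\n" x 500;
gzip(\$payload => \my $compressed) or die "gzip failed: $GzipError\n";
my $partial = substr $compressed, 0, length($compressed) - 10;

# One-shot gunzip wants the whole stream, trailer included, so this
# should fail part-way through.
my $ok = gunzip(\$partial => \my $out, Transparent => 0);
print $ok ? "uncompressed fine\n" : "gunzip failed: $GunzipError\n";
```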

      If you have concatenated the chunks received and could uncompress the composite buffer, it sounds like the sub that gets triggered by the add_handler is being passed a part of the same gzipped data stream every time it is invoked. You can push the compressed data, a buffer at a time, through Compress::Zlib. Something like this:
      use Compress::Zlib;

      my $gunzip = inflateInit(WindowBits => 16 + MAX_WBITS)
          or die "Cannot create an inflation stream\n";

      ...

      $mech->add_handler( response_data => sub {
          my ( $response, $ua, $h, $data ) = @_;
          my ( $buffer, $status ) = $gunzip->inflate($data);
          # uncompressed data is now in $buffer
          # return true to get called again for the same response
          1;
      } );

      Sorry if I completely misunderstood the problem, but won't the following work?

      use strict;
      use warnings;
      use feature 'say';
      use IO::Compress::Gzip 'gzip';
      use IO::Uncompress::Gunzip qw/ gunzip $GunzipError /;

      my $s = <<'END';
I include only the bare bones because I tried something like 20 different things without success and I'm embarrassed. :( Non-streaming requests are working perfectly with approximately this code. The endpoint for this code is a
END

      gzip( \$s, \my $c );

      my @chunks  = unpack '(a42)*', $c x 5;
      my $partial = '';
      my $result  = '';
      my $n       = 1;

      for ( @chunks ) {
          gunzip( \( $partial . $_ ), \my $o,
              Transparent  => 0,
              TrailingData => my $t );
          $partial .= $_ and next if $GunzipError;
          $partial = $t ? $t : '';
          print "message #", $n ++, "\n$o";
      }

        It depends. The code in your test script assumes that the input consists of 5 completely distinct gzip data streams, so each will contain the gzip header, the compressed payload and the gzip trailer. If that is what is actually happening in the WWW::Mechanize application, and the gzip data streams aren't too big, then your approach should be fine.

        I'm not convinced that is what is happening in the real application, though. The snippet of code below, from earlier, along with the observation that uncompressing $collected produced more of the real uncompressed payload, suggests that this is a single gzip data stream:

        $collected .= $data;
        gunzip \$collected, \$out;
        print $out, $/;

        If that is the case then IO::Uncompress::Gunzip will only work if you are prepared to read the entire compressed data stream and uncompress the lot in one go. If we are dealing with a potentially infinite compressed data stream, that isn't going to work.

        The code I posted that uses Compress::Zlib uncompresses the data as it arrives, one chunk at a time.
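To illustrate (a self-contained sketch; the payload and the 42-byte chunk size are invented): one long gzip stream pushed through a single Compress::Zlib inflation object in arbitrary-sized pieces, the way response_data hands buffers to the handler.

```perl
use strict;
use warnings;
use Compress::Zlib;
use IO::Compress::Gzip qw(gzip $GzipError);

# One long gzip stream standing in for the streaming response.
my $payload = "<quote><ask>166.29</ask></quote>\n" x 200;
gzip(\$payload => \my $compressed) or die "gzip failed: $GzipError\n";

# WindowBits of 16 + MAX_WBITS tells zlib to expect a gzip wrapper.
my $inflater = inflateInit(WindowBits => 16 + MAX_WBITS)
    or die "Cannot create an inflation stream\n";

my $result = '';
while (length $compressed) {
    # Take an arbitrary 42 bytes, as if it were one network buffer.
    my $chunk = substr $compressed, 0, 42, '';
    my ($buffer, $status) = $inflater->inflate($chunk);
    die "inflate failed with status $status\n"
        unless $status == Z_OK or $status == Z_STREAM_END;
    $result .= $buffer;   # uncompressed data recovered so far
}
print $result eq $payload ? "round trip ok\n" : "round trip mismatch\n";
```

The single $inflater object carries the decompression state between calls, which is exactly what the one-shot gunzip interface cannot do.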
