PerlMonks  

LWP slow downloads on windows

by robobunny (Friar)
on Dec 03, 2008 at 18:42 UTC ( [id://727775]=perlquestion )

robobunny has asked for the wisdom of the Perl Monks concerning the following question:

I have a program that downloads a ~23MB file from a web server on the LAN using LWP. It runs nearly instantly on various UNIX flavors, but takes over 2 minutes on Windows using ActivePerl 5.8.8. I'm using LWP 5.808 on Linux and 5.814 on Windows, but I don't think the version difference is the issue. Does anyone have any ideas about why this is so much slower on Windows? Here's a test script and the output on both Windows and Linux:

Linux:
Downloaded succeeded
NAME                     TIME      CUMULATIVE   PERCENTAGE
 Instantiate Useragent   0.000     0.000        0.002%
 Download file           2.400     2.400        99.991%
 _stop_                  0.000     2.400        0.007%

Windows:
Downloaded succeeded
NAME                     TIME      CUMULATIVE   PERCENTAGE
 Instantiate Useragent   0.000     0.000        0.000%
 Download file           133.377   133.377      99.988%
 _stop_                  0.016     133.392      0.012%

use strict;
use warnings;
use LWP;
use Benchmark::Stopwatch;

my $url = 'http://hermes/datasquid/static.dbm';

my $stopwatch = Benchmark::Stopwatch->new->start;
my $ua = LWP::UserAgent->new;
$stopwatch->lap('Instantiate Useragent');
my $response = $ua->get( $url );
$stopwatch->lap('Download file');

if($response->is_success) {
    print "Downloaded succeeded\n";
}
else {
    print "Download failed\n";
}

print $stopwatch->stop->summary;

Replies are listed 'Best First'.
Re: LWP slow downloads on windows
by NetWallah (Canon) on Dec 03, 2008 at 20:36 UTC
    This may be a TCP issue. This issue manifests itself on high-bandwidth + high latency links.

    The Microsoft KB article # 224829 incorrectly states that "Timestamps and Window scaling are enabled by default..".

    What actually happens is that when a Windows machine initiates a TCP connection, it does NOT advertise "Window scaling" by default.

    This must be manually enabled by setting a registry entry, then rebooting:

    (Save these lines to a file - suggested name: TCPIP-RFC1323.reg. This will allow error-free, consistent update of several machines)

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
    "Tcp1323Opts"=dword:00000003

    This requires a reboot, in order to take effect.

    Please refer to Microsoft KB 224829 for an explanation of this entry, as well as information on TCP Window Scaling.

         Have you been high today? I see the nuns are gay! My brother yelled to me...I love you inside Ed - Benny Lava, by Buffalax

      Thanks for the info. That doesn't seem to be the problem. I added the registry key and rebooted, and I'm getting about the same download time.
Re: LWP slow downloads on windows
by jbert (Priest) on Dec 03, 2008 at 19:59 UTC
    Simple things first. I think LWP pulls the response into memory. 23Mbyte isn't that much, but is there a chance that your windows box is working near the limit of its RAM? (Check the 'Commit Charge' in Task Manager - if it's close to your physical RAM then you are probably close to paging).

    If so, you could simply be seeing paging activity account for the difference.

    Other than that, can you try other methods? Perhaps the network to your windows box is congested?

    Does your web browser take a similar length of time to fetch the file?

    It still could be LWP, but it would be useful if you could check the windows box is able to fetch the 23Mbyte quickly using other methods.

      Also, you can compare the ping times and traceroute outputs.
      Thanks for the response. I'm pretty sure it's LWP. Downloading the file manually with a browser is fast, and I've seen the same behavior on every Windows machine that I've tried. I've also seen it on a number of machines that I'm sure had plenty of free RAM, so I don't think it's paging. My Windows and Linux machines are plugged into the same switch as well, so there shouldn't be any network differences.

      It isn't that big of a deal in this case because the file isn't downloaded often (mostly just the first time a user runs the program), but it's been annoying today because I'm working on the download code and it's slowing testing down. The main reason I'd like to know what's going on is in case it becomes an issue in the future.

        OK, can we get an idea what your windows box is up to during these 2 minutes?

        Is the CPU busy at all (I assume it's not busy before you start the test)?

        If the CPU is busy with another proc, that's your culprit. If the CPU is busy with your perl script, we can try CPU profiling with Devel::NYTProf or Devel::Profile. If (as is likely) your CPU is idle, we can look at other causes:

        Perhaps you could try:

        1. take a network trace with wireshark.

          What you're looking for here is significant pauses between the packets. 23 Mbyte is ~ 16000 1500-byte packets. Over 133 secs the gap we're looking for is about 133/16000 ~= 0.008 secs, i.e. on the order of 0.01 secs. In a download the data packets arrive from the server and your box sends the ACKs. If the pause is between your ACK going out and the next data packet arriving, the problem is at the server (it's pausing before sending for some reason) or in the network in between. If the pause is between a data packet arriving and your ACK going out, the problem is local.

        2. Using LWP::UserAgent

          This allows you to set a callback (see the get function in the doc) which is called whenever data is available. You could add these to your stopwatch to see if rate is uniformly slow or slows down over time. Not sure what that would tell us, but you never know.
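
          A minimal sketch of that callback idea, assuming LWP::UserAgent's ':content_cb' option and Time::HiRes for timestamps. The helper name make_rate_tracker and the choice to print every 100th sample are illustrative, not part of LWP. Note that when a content callback is supplied, LWP hands each chunk to the callback instead of storing the body in the response object.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Returns a callback suitable for LWP::UserAgent's ':content_cb' option
# plus a reference to the log it fills. Each log entry records the
# seconds elapsed since the tracker was created and the bytes so far.
sub make_rate_tracker {
    my $start = time;
    my $total = 0;
    my @log;
    my $cb = sub {
        my ($chunk) = @_;    # LWP also passes $response and $protocol
        $total += length $chunk;
        push @log, { elapsed => time - $start, bytes => $total };
        return;
    };
    return ($cb, \@log);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    require LWP::UserAgent;
    my ($cb, $log) = make_rate_tracker();
    my $response = LWP::UserAgent->new->get($ARGV[0], ':content_cb' => $cb);
    die $response->status_line, "\n" unless $response->is_success;

    # Print every 100th sample so the trend is readable.
    for my $i (grep { $_ % 100 == 0 } 0 .. $#$log) {
        printf "%8.3f s  %10d bytes\n", @{ $log->[$i] }{qw(elapsed bytes)};
    }
}
```

          If the byte count climbs steadily the transfer is uniformly slow; long flat stretches followed by bursts would point at stalls instead.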

        Also, since we're in the realm of the unlikely here, maybe it is worth checking it's not the LWP version. LWP is pure-perl, so you should be able to copy the modules from one box to another and use that temp copy with 'perl -I /path/to/your/temp/copy'. Maybe you've found a regression.

        A quick grep over the current LWP source shows lots of references to $^O, but nothing which seems relevant.


        D'oh! I forgot the 1st rule of network delays: it's always DNS.

        I'll wager one xp point that your windows boxes don't have reverse DNS, and your web server is (mis-)configured to log the host name of connecting systems. It sits there in DNS timeout before sending back the data.

        If I'm right, you'll see no data at all during the timeout period, then the data shoot back in a short while. Can you 'dig -x 1.2.3.4' from your web server for the windows boxes and the unix boxes?
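
        If dig isn't available on the web server, the same check can be sketched in Perl. This assumes IPv4 addresses and uses the core Socket module; time_reverse_lookup is a made-up helper name. It goes through the system resolver (so a hosts file entry counts too), which is not exactly what 'dig -x' does, but a multi-second result for the windows boxes would still support the DNS-timeout theory.

```perl
use strict;
use warnings;
use Socket qw(inet_aton AF_INET);
use Time::HiRes qw(time);

# Time a reverse lookup for an IPv4 address via the system resolver.
# Returns the resolved name (undef if none) and the seconds taken.
sub time_reverse_lookup {
    my ($ip) = @_;
    my $packed = inet_aton($ip) or die "Cannot parse address: $ip\n";
    my $start  = time;
    my $name   = gethostbyaddr($packed, AF_INET);    # scalar context: name only
    return ($name, time - $start);
}

# Usage: perl rdns.pl 1.2.3.4 5.6.7.8
for my $ip (@ARGV) {
    my ($name, $secs) = time_reverse_lookup($ip);
    printf "%-15s %-30s %.3f s\n",
        $ip, defined $name ? $name : '(no name found)', $secs;
}
```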

Re: LWP slow downloads on windows (50 times faster?)
by BrowserUk (Patriarch) on Dec 04, 2008 at 02:27 UTC

    This probably won't go down well with some, but LWP seems to be badly broken on Win32. The following code downloads the same file from CPAN using 3 different methods; LWP::Simple::get(), LWP::UserAgent::get(), and Win32::Internet::FetchURL():

    #! perl -slw
    use strict;
    use Time::HiRes qw[ time ];
    use LWP;
    use LWP::Simple;
    use Win32::Internet;
    $|++;

    my $url = 'http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/authors/id/J/JD/JDB/Win32-Internet-0.084.tar.gz';

    my $start1 = time;
    my $file1 = get( $url );
    printf "\nLWP::Simple took %7.3f seconds\n", my $time1 = time() - $start1;
    print "Size: ", length $file1;
    printf "Transfer rate: %5.2f bytes/sec\n", length( $file1 ) / $time1;

    my $inet = new Win32::Internet;
    my $start2 = time;
    my $file2 = $inet->FetchURL( $url );
    printf "\nWin32::I took %7.3f seconds\n", my $time2 = time() - $start2;
    print "Size: ", length $file2;
    printf "Transfer rate: %5.2f bytes/sec\n", length( $file2 ) / $time2;

    my $ua = LWP::UserAgent->new;
    my $start3 = time;
    my $resp = $ua->get( $url );
    printf "\nLWP::UA took %7.3f seconds\n", my $time3 = time() - $start3; ## Corrected [Mr Mischief]++
    my $file3 = $resp->content;
    print "Size: ", length $file3;
    printf "Transfer rate: %5.2f bytes/sec\n", length( $file3 ) / $time3;

    The upshot is that LWP::Simple::get() was twice as fast as LWP::UserAgent::get(). But the native API wrapper is 25 times faster than LWP::Simple & LWP::UserAgent!

    I could not believe the difference using the native API made--it HAD to be an error! Didn't it?-- but I've run this a dozen times now. I tried altering the ordering of the downloads to check that there was no caching involved; I re-booted and killed every process except those required to allow the system to run. And the following figures are pretty average for the results I've seen:

    Updated: figures for corrected benchmark. Mr Mischief++

    C:\test>Win32Itest.pl

    LWP::Simple took 14.703 seconds
    Size: 64230
    Transfer rate: 4368.46 bytes/sec

    Win32::I took 0.639 seconds
    Size: 64230
    Transfer rate: 100515.50 bytes/sec

    LWP::UA took 14.796 seconds
    Size: 64230
    Transfer rate: 4340.89 bytes/sec

    Can anyone out there confirm these figures? Someone with a decent speed internet connection?

    So, I tried wget:

    C:\test>wget http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/authors/id/J/JD/JDB/Win32-Internet-0.084.tar.gz
    --02:17:39--  http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/authors/id/J/JD/JDB/Win32-Internet-0.084.tar.gz
               => `Win32-Internet-0.084.tar.gz'
    Resolving www.mirrorservice.org... 212.219.56.138, 212.219.56.139, 212.219.56.153, ...
    Connecting to www.mirrorservice.org|212.219.56.138|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 64,230 (63K) [application/x-gzip]

    100%[==============...==============>] 64,230         4.51K/s    ETA 00:00

    02:17:54 (4.60 KB/s) - `Win32-Internet-0.084.tar.gz' saved [64230/64230]

    Conclusion: if you use win32, and regularly download large files, seriously consider using the Win32::Internet module, because unless someone can explain my mistake, it really is remarkably quick.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      your example returns
      LWP::Simple took 1.594 seconds
      Size: 64230
      Transfer rate: 40303.76 bytes/sec

      Win32::I took 4.489 seconds
      Size: 64230
      Transfer rate: 14308.64 bytes/sec

      LWP::UA took 15.613 seconds
      Size: 64230
      Transfer rate: 4113.83 bytes/sec

      with perl 5.8.8.822 at win2k3 server

        Thank you Anonymonk.

        You at least confirmed the potential for wild variation--even if it didn't quite match my empirical evidence.

        There is something going on here that greatly (order of magnitude greatly) affects the performance of sockets, and LWP::UserAgent seems to be the loser.

        (I realise that 2 out of 2 is hardly a significant sample, but still, the trend is promising for my premise :)


      Callbacks are one reason UA is slower, winxp home, perl v5.8.7, LWP 5.819 (libwww-perl-5.821), GNU Wget 1.11.4
      D:\>perl lwp-bench.pl

      LWP::Simple took 2.376 seconds
      Size: 64230
      Transfer rate: 27036.45 bytes/sec

      Win32::I took 1.358 seconds
      Size: 64230
      Transfer rate: 47312.34 bytes/sec

      LWP::UA took 2.717 seconds
      Size: 64230
      Transfer rate: 23640.49 bytes/sec

      D:\>perl lwp-bench.pl

      LWP::Simple took 1.327 seconds
      Size: 64230
      Transfer rate: 48385.75 bytes/sec

      Win32::I took 0.396 seconds
      Size: 64230
      Transfer rate: 162095.51 bytes/sec

      LWP::UA took 1.878 seconds
      Size: 64230
      Transfer rate: 34205.63 bytes/sec

      D:\>perl lwp-bench.pl

      LWP::Simple took 1.457 seconds
      Size: 64230
      Transfer rate: 44091.30 bytes/sec

      Win32::I took 0.535 seconds
      Size: 64230
      Transfer rate: 120084.79 bytes/sec

      LWP::UA took 2.035 seconds
      Size: 64230
      Transfer rate: 31564.64 bytes/sec

      D:\>wget -c http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/authors/id/J/JD/JDB/Win32-Internet-0.084.tar.gz
      --2008-12-03 23:29:27--  http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/authors/id/J/JD/JDB/Win32-Internet-0.084.tar.gz
      Resolving www.mirrorservice.org... 212.219.56.135, 212.219.56.138, 212.219.56.139, ...
      Connecting to www.mirrorservice.org|212.219.56.135|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 64230 (63K) [application/x-gzip]
      Saving to: `Win32-Internet-0.084.tar.gz'

      100%[==============================================================================>] 64,230      68.5K/s   in 0.9s

      2008-12-03 23:29:29 (68.5 KB/s) - `Win32-Internet-0.084.tar.gz' saved [64230/64230]

      D:\>rm Win32-Internet-0.084.tar.gz

      D:\>wget -c http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/authors/id/J/JD/JDB/Win32-Internet-0.084.tar.gz
      --2008-12-03 23:29:38--  http://www.mirrorservice.org/sites/ftp.funet.fi/pub/languages/perl/CPAN/authors/id/J/JD/JDB/Win32-Internet-0.084.tar.gz
      Resolving www.mirrorservice.org... 212.219.56.135, 212.219.56.138, 212.219.56.139, ...
      Connecting to www.mirrorservice.org|212.219.56.135|:80... connected.
      HTTP request sent, awaiting response... 200 OK
      Length: 64230 (63K) [application/x-gzip]
      Saving to: `Win32-Internet-0.084.tar.gz'

      100%[==============================================================================>] 64,230      59.2K/s   in 1.1s

      2008-12-03 23:29:39 (59.2 KB/s) - `Win32-Internet-0.084.tar.gz' saved [64230/64230]

      D:\>rm Win32-Internet-0.084.tar.gz

      D:\>
      maybe you can try LWP::Curl next?
      I saw roughly the same speed differences as everyone else, both using the external URL in your script and using the internal file that I started with. After running a profiler on LWP, it looks like most of the time is spent appending the new chunk it just downloaded to the existing buffer (HTTP::Message, add_content method, line 142 in version 5.814):
      $self->{_content} .= $$chunkref;
      add_content is called in a loop by the collect method of LWP::Protocol. I was able to reduce the download time for a 12MB file from 36 seconds to 5 seconds by changing the collect method to use an array for the buffer and then stick everything together at the end, instead of calling add_content repeatedly:
      my @total_content;

      if (!defined($arg) || !$response->is_success) {
          # scalar
          while ($content = &$collector, length $$content) {
              if ($parser) {
                  $parser->parse($$content) or undef($parser);
              }
              LWP::Debug::debug("read " . length($$content) . " bytes");
              push(@total_content, $$content);
              $content_size += length($$content);
              $ua->progress(($length ? ($content_size / $length) : "tick"), $response);
              if (defined($max_size) && $content_size > $max_size) {
                  LWP::Debug::debug("Aborting because size limit exceeded");
                  $response->push_header("Client-Aborted", "max_size");
                  last;
              }
          }
          $response->add_content(join('', @total_content));
      Anybody know any reason that would be a problem?
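
      For anyone who wants to compare the two buffering strategies without patching LWP, here is a stand-alone sketch. The chunk size (4 KB) and count (3000, roughly 12 MB) are made-up numbers, and the two sub names are illustrative. It only exercises Perl's string handling, so results will depend on the perl build and platform; it does not reproduce LWP's network path.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Strategy 1: append each chunk to one growing scalar,
# which is what calling add_content per chunk amounts to.
sub collect_by_append {
    my ($chunks) = @_;
    my $buffer = '';
    $buffer .= $_ for @$chunks;
    return $buffer;
}

# Strategy 2: keep the chunks in an array and join once at the end.
sub collect_by_join {
    my ($chunks) = @_;
    my @parts;
    push @parts, $_ for @$chunks;
    return join '', @parts;
}

# Hypothetical workload: 3000 chunks of 4 KB each.
my @chunks = ('x' x 4096) x 3000;

for my $case (['append', \&collect_by_append], ['join', \&collect_by_join]) {
    my ($label, $code) = @$case;
    my $start  = time;
    my $result = $code->(\@chunks);
    printf "%-6s %8d bytes in %.3f s\n", $label, length $result, time - $start;
}
```

      Both subs return the same string; only the timing should differ. The array version holds the chunks and the joined copy at the same time, so it trades memory for speed.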
        Your changes indeed speed things up on my testing system so long as they don't cause additional swapping. I see only one issue. It does seem to use more memory than the stock LWP::Protocol.

        Yours hit more heavily into my swap file for a large file (703MB on a system with only 512 MB of RAM) than the stock version. The map in BrowserUk's version put my poor antiquated testing server right out of physical RAM and swap completely.

        My conclusion is that I really do need more RAM in this machine. It's something to consider if you're under similar constraints or if you're submitting a patch to the module's maintainers, though. As always, paging to swap can erase any gains you'd get otherwise and then cause much slower execution. Sometimes optimizing for speed at the expense of memory fails, and sometimes optimizing for memory use is actually a performance win.
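
        When memory is the constraint, one option that avoids holding the body in RAM at all is to let LWP write it straight to disk. A minimal sketch, assuming LWP::UserAgent's ':content_file' option; download_to_file is a made-up wrapper name, and whether this is also faster on Windows has not been measured here.

```perl
use strict;
use warnings;

# Fetch $url straight into $path. With ':content_file' LWP writes each
# chunk to the file as it arrives instead of keeping the body in memory.
sub download_to_file {
    my ($url, $path) = @_;
    require LWP::UserAgent;
    my $response = LWP::UserAgent->new->get($url, ':content_file' => $path);
    die "Download failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return -s $path;    # size on disk in bytes
}

# Usage: perl fetch.pl http://example.com/big.iso big.iso
if (@ARGV == 2) {
    my $bytes = download_to_file(@ARGV);
    print "Saved $bytes bytes to $ARGV[1]\n";
}
```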

        This helped greatly in speeding up zip downloads when I added this fix to the collect method in LWP::Protocol under perl 5.825. Before the fix it took ~3.5h to download 28 zip files and after adding this LWP::Protocol fix it took only ~1.5h :-)

        However, now I have perl 5.8.9 (with latest libwww-perl 5.825 2009-02-16) installed *and* the zip downloads are back to taking ~3.5 hours.

        I diffed the Protocol.pm files but they are radically different (between libwww-perl 5.825 2009 and Protocol.pm,v 1.41 2003/10/23 19:11:32) and I don't see any hooks for a similar quick fix.

        Can anyone suggest a fix for speeding up downloads using libwww-perl (5.825 2009-02-16)...?

      Your benchmark is charging the time for both Win32::Internet and LWP::UserAgent to the latter:

      printf "\nLWP::UA took %7.3f seconds\n", my $time3 = time() - $start2;

      should be:

      printf "\nLWP::UA took %7.3f seconds\n", my $time3 = time() - $start3;
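
      One way to make this kind of slip harder is to keep the start time inside a small helper, so each measurement cannot pick up another one's variable. A sketch, with timed as a made-up helper name and two stub subs standing in for the real download calls:

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Run $code once and return (elapsed seconds, whatever $code returned).
# The start time lives inside this sub, so it cannot be mixed up with
# the start time of a different measurement.
sub timed {
    my ($code) = @_;
    my $start  = time;
    my $result = $code->();
    return (time - $start, $result);
}

# Hypothetical stand-ins for the download calls in the benchmark.
my %method = (
    'fast stub' => sub { 'x' x 64230 },
    'slow stub' => sub { select undef, undef, undef, 0.05; 'x' x 64230 },
);

for my $name (sort keys %method) {
    my ($secs, $data) = timed($method{$name});
    printf "%-10s took %7.3f seconds, size %d\n", $name, $secs, length $data;
}
```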

Re: LWP slow downloads on windows
by tprocter (Sexton) on Dec 04, 2008 at 17:31 UTC

    I've done some work with HTTP requests, and I've had more success with IO::Socket::INET. This is a sub I use for various web client applications. It's consistently quick on UNIX and Windows.

    I ran this script on both Linux and Windows using perl 5.8.8, and in both cases it took 2 seconds to download 14M. (time does not include writing it to a file)

    package HTTP::Request;
    use IO::Socket::INET;
    use warnings;
    use strict;
    use Carp;

    our $VERSION = '1.000';

    my ($content, $tags) = get_url( {
        webserver => 'www.example.com',
        url       => '/',
        port      => 80,
        verbose   => 0,  #Print the HTTP message sequence
        download  => 1,  #Mostly for crawler applications. Content types without
                         #links are not retrieved if set to 0.
    });

    if ($content) {
        print "Elapsed time: $tags->{elapsed}\n";
        print "Downloaded page:\n$content\n";
    }
    else {
        print "Content is not available\n";
    }

    sub get_url {
        my ($parameter) = @_;
        my $submit;
        my $port = 80;
        my $content = '';
        my %tag;
        my $t1 = time;

        if (!exists ($parameter->{webserver}) or !exists ($parameter->{url})) {
            croak "Missing webserver or URL information";
        }
        if (defined ($parameter->{port})) {
            $port = $parameter->{port};
        }
        my $webserver = $parameter->{webserver};
        my $url = $parameter->{url};

        my $sock = IO::Socket::INET->new(
            PeerAddr => $webserver,
            PeerPort => $port,
            Proto    => 'tcp',
            Timeout  => 10
        );
        my $line;
        my $new_location;
        my @output = ();
        my @headers = ();
        my $header;

        if ($sock) {
            $sock->autoflush();
            $submit = <<"END_GET";
    GET $url HTTP/1.0
    Host: $webserver
    User-Agent: HTTPR/1.1

    END_GET
            print $sock $submit;
            while ( $line = <$sock> ) {  #separate loop to save processing on body
                $line =~ s/\s+$//;
                if ( $line =~ /^\s*$/ ) {
                    last;
                }
                push @headers, $line;
                if ($line =~ /^Location: (.+)/) {
                    $new_location = $1;
                }
            }
            $tag{proto} = $headers[0];
            $tag{elapsed} = time - $t1;
            $tag{url} = $url;
            foreach my $stat (@headers[1 .. $#headers]) {
                $stat =~ /^(.+): (.*)$/;
                $tag{lc $1} = exists($tag{lc $1}) ? $tag{lc $1} . ', ' . $2 : $2;
            }
            $headers[0] =~ /HTTP\/\d+\.\d+ (\d+)/m;
            my $http_status = $1;
            if ($http_status == 200 and
                ($tag{'content-type'} =~ /text|css|html/ or $parameter->{download})) {
                if (exists($tag{'content-length'})) {
                    #optimized reading if available
                    $sock->read($output[0], $tag{'content-length'});
                }
                else {
                    local $/;
                    while ( $line = <$sock> ) {
                        $line =~ s/\s+$//;
                        push @output, $line;
                    }
                }
            }
            if ($parameter->{verbose}) {
                print $submit . "\n";
                print join ("\n", @headers) . "\n";
                print "\n\n" . join ("\n ", @output);
                print "\n==================================================\n";
            }
            if ($http_status == 301) {
                print "Page is relocated to $new_location\n";
                return;
            }
            elsif ($http_status != 200){
                print "ERROR webserver could not process request!\n" .
                    join ("\n ", @headers) . "\n";
                return;
            }
            else {
                $content = join ('', @output);
            }
        }
        else {
            print "Could not connect to $webserver port 80.\n";
            return;
        }
        return ($content, \%tag);
    }
Re: LWP slow downloads on windows
by Anonymous Monk on Dec 05, 2008 at 00:34 UTC
    Go for the network driver. Check old releases, too. See if there are differences in the performance by using different releases of network drivers.

Node Type: perlquestion [id://727775]
Approved by moritz
Front-paged by ikegami