Re: LWP::Parallel vs. HTTP::GHTTP vs. IO::Socket

by ajt (Prior)
on May 16, 2003 at 07:55 UTC


in reply to LWP::Parallel vs. HTTP::GHTTP vs. IO::Socket

hacker

I can't comment on LWP::Parallel directly, but I did some crude benchmarks on LWP, HTTP::GHTTP, HTTP::Lite and HTTP::MHTTP (something along the lines of the sketch after the list below). What I found was hardly surprising:

  • LWP is a big, slow-to-load module, and once it's loaded, it's still pretty slow. It can do just about anything, but it's not a speed demon.
  • Lite is quicker than LWP to load, and quicker in use, but it's still not what you would call fast.
  • GHTTP, as expected, was fast to load and fast in use - much faster than either of the pure Perl modules. I can't get it to work under mod_perl on Windows, but that's my only complaint.
  • MHTTP was the only surprise. It has the most basic API and it's not object-oriented like the others, but it's even faster than GHTTP - in both module load time and in actual use.
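
Something like the standard Benchmark module is enough to reproduce this kind of crude comparison. A minimal sketch - the URL is a placeholder, and once you're on a real network the wire time will dominate the per-module differences:

    use strict;
    use Benchmark qw(cmpthese);
    use LWP::Simple ();
    use HTTP::Lite ();
    use HTTP::GHTTP ();

    # Placeholder URL - point at something local so the network
    # doesn't swamp the differences between the modules.
    my $url = 'http://localhost/index.html';

    cmpthese(20, {
        lwp   => sub { LWP::Simple::get($url) },
        lite  => sub { my $h = HTTP::Lite->new;
                       $h->request($url);
                       $h->body },
        ghttp => sub { HTTP::GHTTP::get($url) },
    });

Load time is a separate cost that this doesn't measure; timing perl -MLWP -e1 against perl -MHTTP::GHTTP -e1 with your shell's time builtin shows that difference.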

UPDATE: It should be possible to compile the two C-based modules, GHTTP and MHTTP, on Windows. I believe that currently only GHTTP has a precompiled PPM available. Building the module on Windows is just a case of asking a nice person with a compiler to do the work for you - CrazyPPM repository, interested? I've recently spoken with Piers, and if you have any bugs to submit for MHTTP, let him know and he'll have a look at them for you.


--
"It's not magic, it's work..."
ajt

Replies are listed 'Best First'.
Re: LWP::Parallel vs. HTTP::GHTTP vs. IO::Socket
by hacker (Priest) on May 16, 2003 at 15:16 UTC
    Along these lines, would it be faster to use LWP::Parallel, even though it is a bit heavier and slower, to fetch requests in parallel, or to use something like HTTP::MHTTP and fork() or Thread, and grab the requests from @urls one at a time?

    My concern here is that I'll have an array and some hashes of urls that are seen, unseen, down, bad, and so on, and I need to make sure that the urls being put into those hashes and arrays (as links are yanked from the pages in %seen) can be fetched by processes already running in fork() or registered in parallel. Would this require some sort of shared memory to get working properly? Can a forked process read and write to an array or hash created by the parent of the fork?

    I've got a lot of this code "functioning", but now is the time to refactor and get the performance up to speed (pun intended) for a production distribution of the tool.

      You can use Parallel::ForkManager to parallelize HTTP::MHTTP or HTTP::GHTTP calls easily and apply a limit to the maximum number of child processes.
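
      A minimal, untested sketch of that pattern - it assumes HTTP::GHTTP's simple get interface and a made-up @urls list:

          use strict;
          use Parallel::ForkManager;
          use HTTP::GHTTP qw(get);

          my @urls = ('http://www.example.com/', 'http://www.example.org/');

          # Never run more than 10 children at once.
          my $pm = Parallel::ForkManager->new(10);

          for my $url (@urls) {
              $pm->start and next;      # parent: spawn a child, move on
              my $content = get($url);  # child: do the fetch
              # ... process $content in the child ...
              $pm->finish;              # child exits here
          }
          $pm->wait_all_children;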

      There are a number of ways to handle getting the retrieved data back to the parent or another process that don't require shared memory (a forked child only gets a copy of the parent's variables, so anything it writes to them is lost when it exits): temporary files, pipes, or a database, for example.
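
      For instance, each child can drop its result into a temporary file and the parent can collect the files once wait_all_children returns. A sketch along those lines (the /tmp path and the digest-based naming are just one arbitrary choice):

          use strict;
          use Parallel::ForkManager;
          use HTTP::GHTTP qw(get);
          use Digest::MD5 qw(md5_hex);

          my @urls = ('http://www.example.com/', 'http://www.example.org/');
          my $pm   = Parallel::ForkManager->new(10);

          for my $url (@urls) {
              $pm->start and next;
              my $content = get($url);
              my $file = '/tmp/fetch.' . md5_hex($url);   # one file per URL
              open my $fh, '>', $file or die "can't write $file: $!";
              print $fh $content if defined $content;
              close $fh;
              $pm->finish;
          }
          $pm->wait_all_children;

          # Back in the parent: every child's result is now on disk,
          # addressable by the same URL-derived filename.
          for my $url (@urls) {
              my $file = '/tmp/fetch.' . md5_hex($url);
              print "$url: ", (-s $file || 0), " bytes\n" if -e $file;
          }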

        As it turns out, HTTP::MHTTP seems to have an 'issue' with name-based virtual hosts, exhibited by the code below, so I can't use that, and it doesn't appear to work on Windows machines either, which puts it in the non-portable category for me:
        use strict;
        use HTTP::MHTTP;

        # This url REALLY exists, but is a virtual host
        # on a domain shared by multiple hosts.
        my $url = 'http://advogato.org/recentlog.html';

        http_init();
        switch_debug(1);
        http_call("GET", $url);
        print http_response();
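
        If the failure is just a missing Host: header (which is the usual way name-based virtual hosting breaks), it might be possible to work around it by sending the header explicitly. A speculative, untested sketch, assuming HTTP::MHTTP's http_add_headers behaves as its docs describe:

        use strict;
        use HTTP::MHTTP;

        my $url = 'http://advogato.org/recentlog.html';
        http_init();
        # Name-based virtual hosts are selected on the Host header,
        # so supply one by hand (speculative workaround).
        http_add_headers('Host' => 'advogato.org');
        http_call("GET", $url);
        print http_response();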

        Thanks to bart and ChemBoy for the enlightening discussion that exposed this issue.

        Based on my loose testing (excluding HTTP::MHTTP), it looks like HTTP::GHTTP is the fastest, followed closely by HTTP::Lite, with LWP::Simple behind that. I haven't benchmarked these under Parallel::ForkManager yet, so that remains to be seen.

        The other issue is the speed at which DNS queries are resolved. I think I can speed that up with a local database of resolved sites, but the first run will still take the hit.
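
        A first cut at that cache could be as small as a tied DBM file in front of the resolver. A sketch (the dns_cache.db filename is arbitrary, and it deliberately ignores TTLs):

        use strict;
        use Socket qw(inet_aton inet_ntoa);
        use DB_File;

        # Hostname -> dotted-quad cache, persisted across runs.
        tie my %dns_cache, 'DB_File', 'dns_cache.db'
            or die "can't tie dns_cache.db: $!";

        sub resolve_cached {
            my $host = shift;
            return $dns_cache{$host} if exists $dns_cache{$host};
            my $packed = inet_aton($host) or return undef;
            return $dns_cache{$host} = inet_ntoa($packed);
        }

        Ignoring TTLs is fine for a single crawl run, but a long-lived daemon would want to expire entries.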

        Thanks for the tips and hints though. I'm closer to a functional solution, but it seems the more I test, the farther down the stack I get, closer to writing my own code around IO::Socket. I'd like to avoid that if I can.
