PerlMonks  

Checking links with LWP::UserAgent

by wilstephens (Acolyte)
on Feb 26, 2002 at 13:12 UTC ( #147544=perlquestion )

wilstephens has asked for the wisdom of the Perl Monks concerning the following question:

Hi

I've written the sub below to check all URLs in my database and return an error message for each individual URL if the status code returned is not 200.

The sub works, but is slow. I've got 20 entries in my database at the moment, and the following sub takes just over 11 seconds to check all 20 entries.

Does anyone know of a (much) faster way of doing this? Is LWP::UserAgent the way to go, or is the way I'm accessing my MySQL database slow?

Would selecting url from database be a lot faster than selecting * (all fields)? Should I get all results first into an array or hash and then output them instead of outputting one by one?

Thanks for any help you can give me!
use LWP::UserAgent;

$ua = new LWP::UserAgent;
$ua->agent("OpticDB LinkCheck/0.1");

&connect_to_db;

my $clock_start = time; # start timer

$sth = $dbh->prepare("SELECT * FROM $DB_MYSQL_NAME");
$sth->execute ();

my $count = 0;

while (my $ref = $sth->fetchrow_hashref ()) {

    my $req = new HTTP::Request GET => $ref->{'url_en'};
    my $res = $ua->request($req);

    $res_id   = $ref->{id};
    $res_code = $res->code;
    $res_msg  = $res->message;

    unless ($res_code eq "200") {
        $count ++;
        $tmpl_show_record .= qq| .. html to show erroneous records goes here ... |;
    }
}

$num_dead = $count;

if ($count == 0) {
    &error_html("No dead links found!");
    exit;
}

$sth->finish();

my $clock_finish = time - $clock_start;
$time_taken = sprintf ("%.2f", $clock_finish);

$dbh->disconnect;
--
Wiliam Stephens <wil@stephens.org>

Replies are listed 'Best First'.
Re: Checking links with LWP::UserAgent
by lhoward (Vicar) on Feb 26, 2002 at 13:24 UTC
    There's nothing wrong with your code per se. It's just that it checks each site sequentially, which requires a DNS lookup for each site, a connection to each site, and so on, one after the other. To improve performance, you need to check multiple sites in parallel. Fortunately, there is a module that extends LWP for just this type of situation: LWP::Parallel.
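To see where the 11 seconds actually go, the database fetch and the HTTP requests can be timed separately. The following is only a minimal sketch: `timed` is a hypothetical helper, the URL list stands in for the real DBI fetch, and the 50 ms sleep stands in for the real `$ua->request` call. It uses Time::HiRes because the built-in `time` only has one-second resolution.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: run a code ref and return (elapsed seconds, result).
sub timed {
    my ($code) = @_;
    my $start  = time;
    my $result = $code->();
    return (time - $start, $result);
}

# Stand-in for the database fetch; replace the body with the real DBI call.
my ($db_secs, $urls) = timed(sub {
    return [ 'http://example.com/', 'http://example.org/' ];
});

# Stand-in for the sequential requests; replace sleep() with $ua->request(...).
my $http_secs = 0;
for my $url (@$urls) {
    my ($secs) = timed(sub { sleep(0.05); return 1 });
    $http_secs += $secs;
}

printf "database: %.3fs, requests: %.3fs\n", $db_secs, $http_secs;
```

If the request total dominates, as the explanation above suggests it will, then neither narrowing the SELECT nor buffering the rows first is likely to make a noticeable difference.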
      Thanks! I'll look into this!

      --
      Wiliam Stephens <wil@stephens.org>

      I was going to offer the same advice, but it looks like you beat me to it. LWP::Parallel is just the ticket for getting a lot of links in short order.

      "All you need is ignorance and confidence; then success is sure." -- Mark Twain
      OK. Does anyone know how I could go about converting the above code to use LWP::Parallel then? I'm afraid that when I run it, it will fire off requests to all 300+ sites in parallel at the same time. Is there a way to limit it?

      --
      Wiliam Stephens <wil@stephens.org>
        The docs have some good examples. You set the option:
        $ua->max_req(5);
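        to set how many requests it will handle in parallel for each host. The number of different hosts contacted at the same time is limited separately, with $ua->max_hosts.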
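A conversion of the original loop might look roughly like the sketch below. It assumes LWP::Parallel::UserAgent (from the ParallelUserAgent distribution on CPAN) is installed, and it should be checked against the documentation for your installed version. The names `check_links` and `dead_links` are made up for illustration, the limits of 5 hosts, 2 requests per host and a 10 second timeout are arbitrary, and the URLs come from the command line here rather than from the database.

```perl
use strict;
use warnings;

# Check URLs in parallel; returns a hash ref of url => [status code, message].
# The modules are loaded on first use, so dead_links() works without them.
sub check_links {
    my (@urls) = @_;
    require LWP::Parallel::UserAgent;
    require HTTP::Request;

    my $pua = LWP::Parallel::UserAgent->new;
    $pua->agent("OpticDB LinkCheck/0.1");
    $pua->max_hosts(5);   # at most 5 different hosts contacted at once
    $pua->max_req(2);     # at most 2 simultaneous requests to any one host
    $pua->timeout(10);    # illustrative timeout in seconds

    for my $url (@urls) {
        # register() returns a response object only if queuing failed
        if (my $err = $pua->register(HTTP::Request->new(GET => $url))) {
            warn "could not register $url: ", $err->message, "\n";
        }
    }

    my $entries = $pua->wait;   # blocks until all registered requests finish
    my %status;
    for my $entry (values %$entries) {
        my $res = $entry->response;
        # keyed by the URL of the request that produced the response
        $status{ $res->request->url } = [ $res->code, $res->message ];
    }
    return \%status;
}

# Keep only the entries whose status code is not 200, as the original sub does.
sub dead_links {
    my ($status) = @_;
    my %dead = map  { ($_ => $status->{$_}) }
               grep { $status->{$_}[0] != 200 }
               keys %$status;
    return \%dead;
}

if (@ARGV) {
    my $dead = dead_links(check_links(@ARGV));
    printf "%s: %s %s\n", $_, @{ $dead->{$_} } for sort keys %$dead;
}
```

Saved under a hypothetical name such as checklinks.pl, this would be run as `perl checklinks.pl http://example.com/ http://example.org/`. In the original sub, the URLs would instead be collected from the `url_en` column before calling `check_links`.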
•Re: Checking links with LWP::UserAgent
by merlyn (Sage) on Feb 26, 2002 at 15:02 UTC
      Thanks for the link & advice, Randal. I've got a database of around 200 URLs. If I checked 10 at a time using LWP::Parallel, it shouldn't hurt the server too much, should it?

      It's a pretty high-spec dedicated machine. What would you say is a sensible limit for these parallel connections?

      --
      Wiliam Stephens <wil@stephens.org>
(jeffa) Re: Checking links with LWP::UserAgent
by jeffa (Bishop) on Feb 26, 2002 at 17:39 UTC
