http://qs321.pair.com?node_id=277381

c has asked for the wisdom of the Perl Monks concerning the following question:

I'm starting a small project in which a script will use LWP to fetch a copy of a remote website. I'd like to measure how long it takes the script to pull the full index page of the URL I hand it, so I can log the number and see how it varies throughout the day. I'm not sure whether my code is valid for finding the true elapsed time. This code is just for testing and doesn't take HTTP errors into consideration:

    #!/usr/bin/perl -w

    use strict;

    use LWP::UserAgent;
    use Sys::Hostname;

    my $agent = new LWP::UserAgent( timeout => 30 );
    my $site = 'http://www.perlmonks.org/';
    my $response = new HTTP::Request GET => $site;

    my $start = time();
    my $page = $agent->request($response);
    my $end = time();
    my $sum = $end - $start;

    print "\n$end - $start = $sum";

This doesn't break down into ms; however, I'm not that concerned about precise timing for the moment. I would, however, like to be able to confirm when the entire page is downloaded and then compare start and finish times. Are there any LWP builtins that provide elapsed time? Am I really just getting the response time for the initial GET of the index page but none of the images held within it?

Thank you for your input -c

Replies are listed 'Best First'.
Re: Transaction time for LWP::UserAgent GET
by waswas-fng (Curate) on Jul 24, 2003 at 01:16 UTC
    Assuming that there is no caching proxy between you and the site, yes, this will give you an overview of the time it takes to download the index. However, you may find it a better overall test to download the index and all referenced img/js/etc. to get a better feel for the end user's experience. Sometimes an image referenced from a pesky slow ad server will slow down the GET =).
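
A rough sketch of that idea, timing the index and then each resource it references. The helper names (referenced_urls, time_page_with_assets), the example.com URLs and the fake site are all made up for illustration. It assumes the page uses absolute URLs in src attributes and pulls them out with a crude regex; a real HTML parser would be more robust. The fetcher is passed in as a code ref so the sketch runs offline; in real use it would wrap the LWP request.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Crude extraction of src="..." attributes (assumption: absolute URLs only).
sub referenced_urls {
    my ($html) = @_;
    my @urls = $html =~ /\bsrc\s*=\s*["']([^"']+)["']/gi;
    return @urls;
}

# $fetch is a code ref: URL in, body out.
# Returns the total elapsed seconds and a hash ref of per-URL seconds.
sub time_page_with_assets {
    my ($fetch, $index_url) = @_;
    my %seconds;

    my $t0   = [gettimeofday];
    my $html = $fetch->($index_url);
    $seconds{$index_url} = tv_interval($t0);

    for my $url ( referenced_urls($html) ) {
        my $t1 = [gettimeofday];
        $fetch->($url);
        $seconds{$url} = tv_interval($t1);
    }

    return ( tv_interval($t0), \%seconds );
}

# Hypothetical offline stand-in for the remote site.
my %fake_site = (
    'http://example.com/' =>
        '<img src="http://example.com/logo.png">'
      . '<script src="http://example.com/site.js"></script>',
    'http://example.com/logo.png' => 'PNG bytes',
    'http://example.com/site.js'  => 'var x = 1;',
);

my ( $total, $per_url ) = time_page_with_assets(
    sub { $fake_site{ $_[0] } // '' },
    'http://example.com/',
);

printf "%-30s %.6f s\n", $_, $per_url->{$_} for sort keys %$per_url;
printf "%-30s %.6f s\n", 'total', $total;
```

Note that this fetches the resources one after another, whereas a browser usually fetches several in parallel, so the total is only an approximation of what a user sees.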

    -Waswas
Re: Transaction time for LWP::UserAgent GET
by DrManhattan (Chaplain) on Jul 24, 2003 at 01:37 UTC
    If the page isn't too slow, you'll probably find that all your $sum values are either 0 or 1, because time() only has one-second resolution. If you want a more accurate measurement, check out Time::HiRes.
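
A minimal sketch of what that could look like. The time_call helper is a made-up name, and a short sleep stands in for the real fetch so that it runs without network access; the commented-out last line shows how it might be applied to the request from the original post.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval sleep);

# Run a code reference and return (elapsed seconds, its return values).
sub time_call {
    my ($code) = @_;
    my $t0      = [gettimeofday];
    my @result  = $code->();
    my $elapsed = tv_interval($t0);    # one argument: measured up to "now"
    return ( $elapsed, @result );
}

# Stand-in for the real fetch so the sketch runs offline.
my ( $elapsed, $page ) = time_call( sub { sleep(0.25); return 'fake page body' } );
printf "fetch took %.3f s\n", $elapsed;

# With LWP the call would look something like this (not run here):
# my ( $secs, $response ) = time_call( sub { $agent->request($request) } );
```

The elapsed value is a floating-point number of seconds, so sub-second differences between runs show up instead of being rounded to 0 or 1.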

    -Matt

Re: Transaction time for LWP::UserAgent GET
by sauoq (Abbot) on Jul 24, 2003 at 02:20 UTC

    Your question has been answered but I have a couple comments about style. First, "indirect object" syntax for method calls is best avoided. Consider writing these alternatives for your current syntax:

        my $agent = LWP::UserAgent->new( timeout => 30 );
        # ...
        my $response = HTTP::Request->new( GET => $site );

    Second, it doesn't really make sense to call your new HTTP::Request object $response. It is, after all, a request object and not a response object. Why not call it $request? I have a feeling that you did it just so my $page = $agent->request($response) would read like English, but I don't think that's worth the potential confusion. It might read like English, but it doesn't read like code. It's even more confusing once you realize that the request() method returns an HTTP::Response object. I'd write that like this:

        my $get = HTTP::Request->new( GET => $site );
        # ...
        my $response = $agent->request( $get );

    -sauoq
    "My two cents aren't worth a dime.";
    

      First, "indirect object" syntax for method calls is best avoided.

      Really? If you're not doing anything too complicated, then what's wrong with the indirect object syntax, and why should it be avoided? It works, and personally, I think it looks much, much better. When used for constructors, the syntax almost reads like English:

      my $ua = new LWP::UserAgent;

      "Create a new LWP User Agent."

        If you're not doing anything to [sic] complicated, then what's wrong with the indirect object syntax, and why should it be avoided?

        The biggest problem with indirect object notation is that perl has to jump through hoops to decide whether method Class; should be interpreted as Class->method() or method("Class") and small changes to the order of those hoops can make perl change its mind. That can result in the appearance of nasty hard-to-find bugs in previously working code after a small seemingly innocuous change.
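
        As a small sketch of spellings that leave less to guess at: the Agent class below is a hypothetical stand-in so that the example runs without LWP installed. All three calls invoke the constructor as a class method. As far as I know, the trailing :: and quoted forms additionally guard against a subroutine that happens to share the class name.

```perl
use strict;
use warnings;

# Hypothetical stand-in class so the sketch runs without LWP installed.
package Agent;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

package main;

my $ua1 = Agent->new( timeout => 30 );      # arrow form
my $ua2 = Agent::->new( timeout => 30 );    # trailing :: marks a class name
my $ua3 = 'Agent'->new( timeout => 30 );    # quoted string as class name

print ref($_), " timeout=$_->{timeout}\n" for $ua1, $ua2, $ua3;
```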

        When used in instance method calls, the "object" suffers from the same brittle parsing that the filehandle argument to print() does. That is, however, a relatively uncommon practice in comparison to the habit of using it for class methods, particularly constructors.

        You are right that problems rarely arise in simple use. But, they do exist whether or not you are doing something complicated. That means you could be bitten when your simple code goes through several iterations and becomes more complicated or if you continue to use the syntax out of habit even when you take on a more complex project.

        Look at it this way... If you get tripped up by a subtle bug caused by the finickiness of indirect object syntax, you'll likely spend an hour or more pulling out your hair trying to debug a problem that you are sure you shouldn't be seeing... And when it's all over and you've determined the cause, you'll swear an oath never to use indirect object notation again anyway.

        Or, you might never run into the problem in which case you'll keep your hair as well as the bad habit and eventually you'll be hired as a PHB and stop coding altogether. But then you'll be forever furtively glancing over your shoulder in fear of the inevitable attack by a bald maintenance programmer with a strong homicidal urge brought on by debugging your code...

        Why not save yourself the trouble by breaking the bad habit now? ;-)

        -sauoq
        "My two cents aren't worth a dime.";
        
Re: Transaction time for LWP::UserAgent GET
by BrowserUk (Patriarch) on Jul 24, 2003 at 03:25 UTC

    There don't appear to be any handy hooks/settings in LWP::UserAgent for doing this kind of timing.

    Tracking through LWP::UserAgent, there is a fair amount of housekeeping and error checking that has to be done before and after the actual request is sent and the response received. If your intent is to measure the user experience, then their browser is probably having to do a similar amount of work, and so you should count this. However, if you're trying to measure network or server response times, you might consider putting your timing points a bit closer to the IO itself.

    Before and after the last if/else block in LWP::UserAgent's send_request() sub might be a good place, or even in the request() sub in LWP::Protocol::http.

    For doing your timing, I'd recommend Benchmark::Timer. It allows you to have multiple concurrent and overlapping timers, each with its own name. It takes care of storing and averaging the times but still allows you to access the raw data if you need to.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller

Re: Transaction time for LWP::UserAgent GET
by aquarium (Curate) on Jul 24, 2003 at 03:41 UTC
    You might consider using a module or executable inside your code instead, to retrieve the page and its content graphics etc. (specify one level deep). I believe such modules and executables abound. This will retrieve into a directory of files and will not have nasty side effects if the page is huge. You'll be measuring a browser experience more accurately. Your mileage may vary when particular sites require javascript/flash/frames etc., as your code would only receive the page for a non-javascript/non-flash/no-frames browser request.
Re: Transaction time for LWP::UserAgent GET
by Anonymous Monk on Jul 24, 2003 at 15:57 UTC
    You will need to use Time::HiRes to get a fine enough time granularity to do anything useful.

    A while back I wrote a small script to monitor my servers for performance changes (so that I had ammo when complaining to my DSL provider). I found that the size of the page you are downloading affects the time significantly, since there are fixed costs to the socket connection. Therefore I used a lower-level module, Net::HTTP, so that I could specify a fixed download size. Here is the code snippet...

        use Time::HiRes qw(gettimeofday);
        :
        :
        my $prep_start = gettimeofday ;
        $conn = new Net::HTTP( 'Host'=>"$url") || goto END_LOOP;
        my $prep_end = gettimeofday;

        # - - - - - Start - - - - - -
        my $start = gettimeofday ;
        $rtn = $conn->write_request(GET => "/", 'User-Agent' => "perlworks/1.0") || goto END_LOOP;;
        my ($code, $mess, %h) = $conn->read_response_headers();
        $data{'page_size'} = $conn->read_entity_body($page, 512);
        my $end = gettimeofday;
        # - - - - - Stop - - - - - -
        :
        :
        my $delta = $end - $start;
Re: Transaction time for LWP::UserAgent GET
by mp (Deacon) on Jul 24, 2003 at 18:18 UTC
    You can time the subroutine that does the request, in this case LWP::UserAgent::request (or LWP::UserAgent::simple_request, one level further down), using Hook::LexWrap and Time::HiRes. This can be handy when there are a number of layers between the subroutine/method you call and the one that you want to time (e.g. when you're using some higher-level module like WWW::Mechanize to fetch the pages).

    Please note that no attempt was made to adjust the time reported to take into account the time spent in the timing code itself.

        use strict;
        use warnings;
        use Time::HiRes qw(tv_interval gettimeofday);
        use Hook::LexWrap;
        use LWP::UserAgent;
        use Data::Dumper;

        my $start;
        my @timer_list;
        my $timer = {};

        wrap LWP::UserAgent::request,
            pre  => \&start_timer,
            post => \&stop_timer,
            ;

        ### Now any LWP calls that use the subroutine
        ### LWP::UserAgent::request will be timed

        my $agent = LWP::UserAgent->new(timeout => 30);
        my $site = 'http://www.perlmonks.org/';
        my $response = HTTP::Request->new(GET => $site);
        my $page = $agent->request($response);
        $page = $agent->request($response);
        $page = $agent->request($response);
        $page = $agent->request($response);
        $page = $agent->request($response);

        ### After doing some LWP work, dump the results:
        print Dumper(\@timer_list);

        sub start_timer {
            $start = [gettimeofday];
        }

        sub stop_timer {
            $timer->{interval} = tv_interval ($start, [gettimeofday]);
            push @timer_list, $timer;
            $timer = {};
        };

    And the output:

        $VAR1 = [
                  { 'interval' => '8.521706' },
                  { 'interval' => '3.794971' },
                  { 'interval' => '20.934479' },
                  { 'interval' => '12.792284' },
                  { 'interval' => '4.147622' }
                ];
Re: Transaction time for LWP::UserAgent GET
by liz (Monsignor) on Jul 24, 2003 at 21:28 UTC
    Many monks have already answered, but only on a technical level. And in that respect I have nothing to add.

    However, I can't help but wonder what the timing information will tell you. Either you need a copy of a page at a specific time, or you don't. Unless you want to use the timing information to be able to determine how often you can fetch the remote page before you start overlapping requests.

    I don't know whether that is the case or not, but if it is, and if I were the hostmaster of this website from which you are repeatedly fetching pages without my explicit consent, I would consider your action to be akin to a DoS attack. Not to mention the possible copyright violations involved if you are redistributing the obtained information further.

    In your example you're using the Perl Monks website. I've only been a user for a few weeks so far, but I already find Perl Monks sluggish at times. And I wouldn't be surprised if "experiments" such as yours are at least partly responsible for this.

    I may seem a bit agitated about this, but I have just had to block too many stupid slurpers in my life.

    Liz

      Liz, the example in my post was just that, an example, and certainly not meant to be taken as production code. The URL provided in the message was a placeholder in order to not reveal the actual domain that I'll be testing against.

      The purpose of the script is to pull the same website over the course of a day from various locations across several NSP backbones to graph and compare latency at various points of the day.

      I understand your frustration with sluggish websites and having to wait for information on the internet, but I don't appreciate the character judgement you're making without any investigation. If you have a suggestion or a comment to make, please take it offline and keep this off of a forum like this. The only reason I post this publicly is to offer a bit of personal defense.

        ...The purpose of the script is to pull the same website over the course of a day from various locations across several NSP backbones to graph and compare latency at various points of the day...

        Wish you had said this in the first place.

        I'm glad you're not working on a "stupid slurper". ;-)

        And if I offended you, I hereby apologize. It's just that your initial description hit a nerve. I'm glad mod_perl 2 allows you to break a connection before it is actually completely set up.

        Liz