 
PerlMonks  

WWW::Mechanize follow meta refreshes

by simon.proctor (Vicar)
on Apr 13, 2005 at 09:40 UTC ( [id://447314]=CUFP )

My web app (for IIS reasons) uses meta refreshes to redirect the user around. I test these redirects with the code below. I've used a regex because my refresh template is fixed and very, very simple. If yours isn't, you should replace the regex with a call to something like HTML::TokeParser.

Update: fixed silly mistake as highlighted below. Also fixed what was, for my test suite, a logic error. The final get call must be to $expected_url and not $url. If your test suite works differently then use $url instead :).
    sub meta_refresh {
        my $mech = shift;
        my $expected_url = shift;
        my $url;

        if($mech->content() =~ /<meta http-equiv="refresh" content="0;url=([^"]*)"/) {
            $url = $1;
        }

        cmp_ok($expected_url, 'eq', $url, "The meta refresh returns the expected URL");

        $mech->get( $expected_url );
        ok($mech->success(), "URL loaded successfully");
    }
So call it like this:
    # Code to cause the refresh to appear not shown.

    # Check the refresh and follow
    meta_refresh($mech, '/index.cgi?rm=home');
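
For templates that are less fixed than the one above, the HTML::TokeParser route mentioned in the post could look roughly like this. It is a minimal sketch, not the poster's code: the helper name find_meta_refresh and the pattern used to split the content attribute are assumptions made for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the original post): return the delay and URL
# of the first <meta http-equiv="refresh"> tag in $html, or an empty list
# when no usable refresh tag is found.
sub find_meta_refresh {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html) or return;

    while (my $tag = $parser->get_tag('meta')) {
        my $attr = $tag->[1];    # hash ref of attributes, names lowercased
        next unless lc($attr->{'http-equiv'} // '') eq 'refresh';

        # content looks like "0;url=/index.cgi?rm=home"; the URL part is optional
        my ($delay, $url) =
            ($attr->{content} // '') =~ /^\s*(\d+)\s*(?:[;,]\s*url\s*=\s*(.*?)\s*)?$/i
            or next;
        return ($delay, $url);
    }
    return;
}
```

Inside meta_refresh, the regex match could then be replaced by something like my (undef, $url) = find_meta_refresh($mech->content()); which tolerates attribute order, extra whitespace and case differences in the tag.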

Replies are listed 'Best First'.
Re: WWW::Mechanize follow meta refreshes
by merlyn (Sage) on Apr 13, 2005 at 14:38 UTC
        $mech->content() =~ /<meta http-equiv="refresh" content="0;url=([^"]*)"/;
        my $url = $1;
    Never never never use $1 without having tested the match. If the match fails, you're using a previous $1 from a previous successful match. Oops!

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      Oops!! :).

      I have updated the node.
Re: WWW::Mechanize follow meta refreshes
by Kanji (Parson) on Apr 14, 2005 at 15:19 UTC
    I've used a regex because my refresh template is fixed and very, very simple. If yours isn't, you should replace the regex with a call to something like HTML::TokeParser.

    This is actually built into WWW::Mechanize (well, LWP...) for you, so you can do something like:-

        if ($mech->response and my $refresh = $mech->response->header('Refresh')) {
            my($delay, $uri) = split /;url=/i, $refresh;
            $uri ||= $mech->uri; # No URL; reload current URL.
            sleep $delay;
            $mech->get($uri);
        }

    $delay should probably be validated to protect against malformed META refresh tags, and there's a whole other headache about potential loops if you hack WWW::Mechanize to follow refreshes automatically.

        --k.
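
The two cautions in the reply above (validating the delay and avoiding refresh loops) could be sketched as follows. This is only an illustration under stated assumptions: parse_refresh and follow_refreshes are hypothetical names, the 30-second cap is an arbitrary choice, and $mech is assumed to be an existing WWW::Mechanize object.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Refresh header value such as "5; url=/next"
# into a validated delay and an optional URL. Returns an empty list when
# the value is missing or malformed.
sub parse_refresh {
    my ($value, $max_delay) = @_;
    return unless defined $value;
    $max_delay //= 30;    # assumed cap in seconds, not from the thread

    my ($delay, $url) = $value =~ /^\s*(\d+)\s*(?:;\s*url\s*=\s*(\S+))?\s*$/i
        or return;
    $delay = $max_delay if $delay > $max_delay;
    return ($delay, $url);
}

# Hypothetical helper: follow at most $max_hops refreshes, so that a page
# which refreshes to itself cannot loop forever.
sub follow_refreshes {
    my ($mech, $max_hops) = @_;
    for (1 .. $max_hops) {
        my $res = $mech->response or last;
        my ($delay, $url) = parse_refresh(scalar $res->header('Refresh'))
            or last;
        sleep $delay;
        $mech->get($url // $mech->uri);    # no URL means reload the current page
    }
    return $mech;
}
```

Calling follow_refreshes($mech, 5) after a get would then stop after five hops at most, whatever the pages return.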


      The snippet I provided is from my test suite. I'll be the first to admit that it isn't great, as I've only just started hacking away with Mechanize (and wondered why I didn't start sooner ;P).

      Anyway, from a testing perspective, is it not better to follow the expected URL rather than the URL in the template? It's only a minor point, but that way you report a mistaken redirect and then carry on as normal from the page you expected to reach. I feel this is better but would welcome your comments.

      I do like the delay bit but, for my testing purposes, I would also pass that into the function. Something like:
      meta_refresh($mech, '/index.cgi?rm=home', 5);
      Or whatever :). I would also then, personally, have a default delay (of some time determined by the particular project) and simply validate the delay as being correct (for the same reasons as with the URL).
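
A variant of meta_refresh along those lines might look like the sketch below. It is an assumption-laden illustration rather than the poster's actual code: $default_delay stands in for the project-wide default, and the pattern assumes the same fixed template as the original function.

```perl
use strict;
use warnings;
use Test::More;

# Assumed project-wide default delay; the thread does not give a value.
my $default_delay = 0;

# Sketch: check both the delay and the URL of the meta refresh, then
# follow the expected URL (as in the original function).
sub meta_refresh {
    my ($mech, $expected_url, $expected_delay) = @_;
    $expected_delay //= $default_delay;

    # On a failed match both variables stay undef and the is() checks fail.
    my ($delay, $url) = $mech->content()
        =~ /<meta http-equiv="refresh" content="(\d+);url=([^"]*)"/;

    is($delay, $expected_delay, "The meta refresh uses the expected delay");
    is($url, $expected_url, "The meta refresh returns the expected URL");

    $mech->get($expected_url);
    ok($mech->success(), "URL loaded successfully");
}
```

Using is() rather than cmp_ok() means a missing refresh tag shows up as a plain test failure instead of an uninitialized-value warning.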

      It's funny, I only wrote this function because IIS, at the time, couldn't handle HTTP redirects and would crash (no really). It's *fixed now* but I don't have the time to rework my app again :).
Re: WWW::Mechanize follow meta refreshes
by jbrugger (Parson) on Apr 14, 2005 at 06:10 UTC
    As it does with JavaScript as well, see www::mechanize reloading page, so you'd remove a line (or a part of it) to stop JavaScript from loading another page.
    "We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
Re: WWW::Mechanize follow meta refreshes
by mhi (Friar) on Aug 28, 2005 at 20:46 UTC
    Thanks for posting this. It was just right to get me on my way to becoming a first-time user of WWW::Mechanize.

    I've put together a little script that will do some pre-fetching on a web-application. The app has a result-cache and will output a refresh whenever it encounters a cache-miss and does its calculations.

    Instead of parsing the refresh URLs or somesuch, I simply limited the number of refreshes the script will perform. Perhaps somemonk will find a useful snippet of code herein.

        #!/usr/bin/perl -w
        use strict;
        use WWW::Mechanize;

        my $maxrefreshs=5;
        my $debug=0;
        my @urls=(
            "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=surnames",
            "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=unisearch&MATCHSTRING=foo",
            "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=path&STARTNODE=I1&ENDNODE=I1257",
        );
        my $refreshs=0;

        my $mech= new WWW::Mechanize;

        foreach my $url (@urls){
            while($refreshs < $maxrefreshs){
                $mech->get($url);
                my $c=$mech->content;
                $debug and print $c;
                if($c =~/<meta\s+http-equiv="refresh"\s+content="\d+;\s*url=([^"]*)"/mi){
                    $url=($1 or $url);
                    ++$refreshs;
                }else{
                    last;
                }
            }
        }
