WWW::Mechanize follow meta refreshes

My web app (for IIS reasons) uses meta refreshes to redirect the user around. I test these redirects with the code below. I've used a regex as my refresh template is fixed and very, very simple. However, if yours isn't/aren't then you should replace the regex with a call to something like HTML::TokeParser.

Update: fixed the silly mistake highlighted in the replies below. Also fixed what was, for my test suite, a logic error. The final get call must be to $expected_url and not $url. If your test suite works differently then use $url instead :).

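# Needs Test::More (is, ok) and a WWW::Mechanize object passed in as $mech.
sub meta_refresh
{
    my $mech = shift;
    my $expected_url = shift;

    # Only trust $1 when the match succeeded; otherwise $url stays undef.
    my $url;
    if($mech->content() =~ /<meta http-equiv="refresh" content="0;url=([^"]*)"/)
    {
        $url = $1;
    }

    # is() takes (got, expected) and copes with an undef $url without warnings.
    is($url, $expected_url, "The meta refresh returns the expected URL");

    # Follow the expected URL (use $url if your suite should follow the page's own redirect).
    $mech->get( $expected_url );
    ok($mech->success(), "URL loaded successfully");
}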

So call it like this:

   # Code to cause the refresh to appear not shown.
   
   # Check the refresh and follow
   meta_refresh($mech, '/index.cgi?rm=home');


Replies are listed 'Best First'.
Re: WWW::Mechanize follow meta refreshes by merlyn (Sage) on Apr 13, 2005 at 14:38 UTC
    $mech->content() =~ /<meta http-equiv="refresh" content="0;url=([^"]*)"/;
    my $url = $1;

Never never never use $1 without having tested the match. If the match fails, you're using a previous $1 from a previous successful match. Oops!

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re^2: WWW::Mechanize follow meta refreshes by simon.proctor (Vicar) on Apr 13, 2005 at 15:42 UTC
Oops!! :). I have updated the node.
Re: WWW::Mechanize follow meta refreshes by Kanji (Parson) on Apr 14, 2005 at 15:19 UTC
"I've used a regex as my refresh template is fixed and very, very simple. However, if yours isn't/aren't then you should replace the regex with a call to something like HTML::TokeParser."

This is actually built into WWW::Mechanize (well, LWP...) for you, so you can do something like:

    if ($mech->response and my $refresh = $mech->response->header('Refresh')) {
        my($delay, $uri) = split /;url=/i, $refresh;
        $uri ||= $mech->uri; # No URL; reload current URL.
        sleep $delay;
        $mech->get($uri);
    }

$delay should probably be validated to protect against malformed META refresh tags, and there's a whole other headache about potential loops if you hack WWW::Mechanize to follow refreshes automatically.

--k.
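The delay validation mentioned above could look roughly like the following. This is only a minimal sketch, assuming the Refresh value has the usual "delay;url=..." shape. The helper name parse_refresh and its $max_delay cap (default 30 seconds) are hypothetical and are not part of WWW::Mechanize or LWP.

```perl
use strict;
use warnings;

# Hypothetical helper: split a Refresh value such as "5; url=/index.cgi?rm=home"
# into a validated delay and an optional URL.
# Returns an empty list when the value is missing or malformed.
sub parse_refresh {
    my ($value, $max_delay) = @_;
    $max_delay = 30 unless defined $max_delay;   # assumed cap, pick your own

    return unless defined $value;

    # The delay must be a non-negative integer; the ";url=..." part is optional.
    my ($delay, $uri) = $value =~ /\A\s*(\d+)\s*(?:;\s*url\s*=\s*(.*?))?\s*\z/is
        or return;

    # Some pages wrap the URL in quotes inside the content attribute.
    $uri =~ s/\A(['"])(.*)\1\z/$2/s if defined $uri;
    $uri = undef if defined $uri && $uri eq '';

    $delay = $max_delay if $delay > $max_delay;  # never sleep longer than the cap
    return ($delay, $uri);
}
```

You would call it with the value of $mech->response->header('Refresh') and skip the sleep and get entirely when it returns an empty list. An undefined URL still means "reload the current page", as in the snippet above.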
Re^2: WWW::Mechanize follow meta refreshes by simon.proctor (Vicar) on Apr 15, 2005 at 09:00 UTC
The snippet I provided is from my test suite. I'll be first to admit that it isn't great as I've only just started hacking away with Mechanize (and wondered why I didn't start sooner ;P).

Anyway, from a testing perspective is it not better to follow the expected URL and not the URL in the template? It's only a minor point, but are you not then reporting on a mistaken redirect but continuing as normal otherwise? I feel this is better but would welcome your comments.

I do like the delay bit but, for my testing purposes, I would also pass that into the function. Something like:

    meta_refresh($mech, '/index.cgi?rm=home', 5);

Or whatever :). I would also then, personally, have a default delay (of some time determined by the particular project) and simply validate the delay as being correct (for the same reasons as with the URL).

It's funny, I only wrote this function because IIS, at the time, couldn't handle HTTP redirects and would crash (no really). It's fixed now but I don't have the time to rework my app again :).
Re: WWW::Mechanize follow meta refreshes by jbrugger (Parson) on Apr 14, 2005 at 06:10 UTC
As it does with JavaScript as well, see "www::mechanize reloading page", so you'd remove a line (or a part of it) to stop JavaScript from loading another page.

"We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise." - Larry Wall.
Re: WWW::Mechanize follow meta refreshes by mhi (Friar) on Aug 28, 2005 at 20:46 UTC
Thanks for posting this. It was just right to get me on my way to becoming a first-time user of WWW::Mechanize.

I've put together a little script that will do some pre-fetching on a web application. The app has a result cache and will output a refresh whenever it encounters a cache miss and does its calculations. Instead of parsing the refresh URLs or somesuch, I simply limited the number of refreshes the script will perform.

Perhaps somemonk will find a useful snippet of code herein.

#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;

my $maxrefreshs=5;
my $debug=0;
my @urls=(
    "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=surnames",
    "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=unisearch&MATCHSTRING=foo",
    "http://www.whatever.xy/cgi-bin/cgi.pl?ACTION=path&STARTNODE=I1&ENDNODE=I1257",
);
my $refreshs=0;
my $mech= WWW::Mechanize->new;

foreach my $url (@urls){
    while($refreshs < $maxrefreshs){
        $mech->get($url);
        my $c=$mech->content;
        $debug and print $c;
        # Capture the whole refresh URL ([^"]+), not just a single character.
        if($c =~/<meta\s+http-equiv="refresh"\s+content="\d+;\surl=([^"]+)"/mi){
            $url=($1 or $url);
            ++$refreshs;
        }else{
            last;
        }
    }
}

	PerlMonks