http://qs321.pair.com?node_id=595230

tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to fetch a web page, follow a javascript redirect, and verify that a string is found at the landing site.

If you copy/paste the URL below into a browser,

http://xml.pangora.com/scripts/Redirect.php?fid=45&mid=1066&serviceName=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0&url=http%3A%2F%2Fwww.baur.de%2Fis-bin%2FINTERSHOP.enfinity%2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101
you'll see that it redirects to a web page whose source matches the regex (e.g., the price "249,90"). However, WWW::Mechanize does not follow this redirect, even though Mech is usually pretty good about redirects in general. I think this is a server side redirect.
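WWW::Mechanize inherits its redirect handling from LWP::UserAgent, which follows HTTP-level redirects (a 3xx status plus a Location header) but does not run JavaScript, so a page that redirects through location.href or a meta refresh tag arrives as an ordinary 200 response. A minimal sketch for telling these cases apart from a single response's status code and body, using only core Perl; classify_redirect is a hypothetical helper, and its regexes are rough heuristics rather than an HTML or JavaScript parser:

```perl
use strict;
use warnings;

# Hypothetical helper: guess how a single response redirects, given its
# HTTP status code and body. The regexes are rough heuristics, not a parser.
sub classify_redirect {
    my ($status, $html) = @_;
    $html = '' unless defined $html;

    # 3xx status (with a Location header): the kind LWP/Mech follows itself.
    return 'http' if $status >= 300 && $status < 400;

    # <meta http-equiv="refresh" content="0; url=...">
    return 'meta' if $html =~ /<meta[^>]+http-equiv\s*=\s*["']?refresh/i;

    # location.href = '...', window.location = "...", location.replace(...)
    return 'javascript'
        if $html =~ /\blocation(?:\.href)?\s*=\s*["']/
        || $html =~ /\blocation\.(?:replace|assign)\s*\(/;

    return;    # nothing recognised by these heuristics
}
```

Only the 'http' case is handled automatically by Mech; the other two mean the target has to be found in the page source some other way.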

I realize I could parse the HTML of the page I get back, which contains text to the effect of "if you were not redirected, try here." However, I have many groups of such pages whose redirects I'd like to follow. Each group would need its own parser, and some pages might not have an "if you weren't redirected, try this" link to parse at all, so I'm wondering whether there is a more general solution.
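If parsing does turn out to be necessary, one way to avoid writing a separate parser per site might be to look first for the generic redirect idioms (a meta refresh tag, or a JavaScript assignment to location) and only fall back to site-specific text when those fail. The sketch below assumes the target appears as a quoted literal in the page source; extract_redirect_target is a hypothetical name, and pages that compute the URL at run time or submit a form from JavaScript will not be caught:

```perl
use strict;
use warnings;

# Hypothetical helper: pull a redirect target out of an HTML page using a few
# common patterns. Returns the raw (possibly relative) URL, or undef.
sub extract_redirect_target {
    my ($html) = @_;
    return unless defined $html;

    # <meta http-equiv="refresh" content="0; url=...">, attributes in any order
    if ($html =~ /(<meta[^>]+http-equiv\s*=\s*["']?refresh[^>]*>)/i) {
        my $tag = $1;
        return $1 if $tag =~ /url\s*=\s*['"]?([^"'>\s]+)/i;
    }

    # location.href = '...', window.location = "...", document.location = '...'
    return $1 if $html =~ /\blocation(?:\.href)?\s*=\s*["']([^"']+)["']/;

    # location.replace('...') or location.assign("...")
    return $1 if $html =~ /\blocation\.(?:replace|assign)\s*\(\s*["']([^"']+)["']/;

    return;    # nothing found; caller may fall back to a site-specific parse
}
```

A relative result can be resolved against the current page with URI->new_abs($target, $mech->uri) before handing it to $mech->get.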

I tried WWW::Selenium but was unable to get it to work. I suspect this is because Selenium is still beta-ish, or maybe my environment is just Selenium-unfriendly. I'm on Linux (SUSE) with Firefox 2.

Can anyone get this to work, with Mech, Selenium, or something else?

Thanks in advance!

use strict;
use warnings;
use WWW::Mechanize;

my $url = 'http://xml.pangora.com/scripts/Redirect.php?fid=45&mid=1066&serviceName=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0&url=http%3A%2F%2Fwww.baur.de%2Fis-bin%2FINTERSHOP.enfinity%2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101';
my $price = '249,9';

my $mech = WWW::Mechanize->new();
my $response = $mech->get( $url );   # follows HTTP (3xx) redirects, but does not run JavaScript
my $html = $mech->content;

print "price: $price\n";
print "url: $url\n";
print "html: $html\n";
print "ok\n" if $html =~ /\Q$price\E/;   # \Q...\E: match the price literally

UPDATE: changed "server side redirect" to "javascript redirect"

UPDATE 2: getting closer, but I still can't do what I want:

use strict;
use warnings;
use WWW::Mechanize;
use Data::Dumper;

my $url = 'http://xml.pangora.com/scripts/Redirect.php?fid=45&mid=1066&serviceName=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0&url=http%3A%2F%2Fwww.baur.de%2Fis-bin%2FINTERSHOP.enfinity%2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101';
my $price = '249,9';
print "price: $price\n";

my $redirect_url = redirect_url($url);
my $redirect_url_expected = 'http://www.baur.de/is-bin/INTERSHOP.enfinity/WFS/Baur-BaurDe-Site/de_DE/-/EUR/BV_ExternalCall-Start?ArticleNo=515358&NUMSArt=4443504&NUMSArtPc=4488615&AffiliateID=pangora-bd&Name=pangora-produktdaten-baur&ActionID=preis-produkt-suche-baur&WKZ=79&IWL=101';
die "oops" unless $redirect_url eq $redirect_url_expected;

my $mech = WWW::Mechanize->new();
$mech->agent('Firefox');
$mech->get( $redirect_url );
my $html = $mech->content;

print "html from $redirect_url doesn't match $price\n" unless $html =~ /$price/;
print "but paste into browser and view source, and it does\n";
print "final url after firefox redirect (but not www::mech redirect) is something like "
    . 'http://www.baur.de/is-bin/INTERSHOP.enfinity/WFS/Baur-BaurDe-Site/de_DE/-/EUR/BV_DisplayProductInformation-ArticleNo;sid=7oVhaTsE5oZsaX6rnON4q25Uv6S6Ixu_PzIwW50ajEGxS04TwoV1a_bGFYiItw==?ArticleNo=515358&ls=0&firstPage=true&showGewinnspiel=true&showW3B=false'
    . "\n";

# uncomment this to print html, which is totally different from what you get from firefox, show source.
# print "html: $html";

# works ok
sub redirect_url {
    my $url = shift or die "no url";
    my $mech = WWW::Mechanize->new();
    $mech->get( $url );
    my $links;
    $links = $mech->links;            # scalar context: array ref of link objects
    $mech->get( $links->[1]->url );   # follow the second link on the Pangora page
    $links = $mech->links;
    my $redirect_url = $links->[0]->base->as_string;   # base URL of the page just fetched
    return $redirect_url;
}
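The redirect_url sub above hard-codes which link to follow at each step. A sketch of a more reusable driver, assuming the per-site knowledge can be packed into an extractor code ref: follow_redirect_chain is hypothetical, and $fetch and $extract are supplied by the caller, so the loop itself can be tried out without network access:

```perl
use strict;
use warnings;

# Hypothetical driver: keep fetching pages and following whatever redirect
# target $extract finds, up to $max_hops times. $fetch takes a URL and
# returns HTML; $extract takes (HTML, current URL) and returns the next URL
# or undef. Returns the final URL and its HTML.
sub follow_redirect_chain {
    my ($fetch, $extract, $url, $max_hops) = @_;
    $max_hops = 5 unless defined $max_hops;

    my @visited = ($url);
    my $html = $fetch->($url);
    for (1 .. $max_hops) {
        my $next = $extract->($html, $url);
        last unless defined $next;
        last if grep { $_ eq $next } @visited;   # stop on a redirect loop
        $url = $next;
        push @visited, $url;
        $html = $fetch->($url);
    }
    return ($url, $html);
}
```

With WWW::Mechanize, $fetch could be sub { $mech->get($_[0]); $mech->content }, and $extract could wrap the link-index logic from redirect_url or a generic pattern match. It does not execute JavaScript, so targets that are only computed at run time remain out of reach.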