I'm trying to fetch a web page, follow a javascript redirect, and verify that a string is found at the landing site.

If you copy/paste the url below into a browser

+e=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0& +2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101

you'll see it redirects to a web page whose page source should match the regex (e.g., the price "249,90"). However, WWW::Mechanize cannot follow this redirect, although in general mech is pretty good about redirects. I think this is a javascript redirect.

I realize I could parse the html of the page I get back, which has some text to the effect of "if you were not redirected, try here." However, I have many groups of such pages whose redirects I would like to follow. Each group would need its own parse, and some might not have an "if you were not redirected, try this" text to parse at all, so I'm wondering if there is a more general solution.
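One way to avoid a per-site parse might be to look for the handful of idioms pages commonly use for client-side redirects, rather than for the site-specific "try here" text. The sketch below is only that, a heuristic: `find_redirect_target` is a hypothetical helper name (not part of WWW::Mechanize), it handles only literal string targets, and it will miss redirects whose URL is built up in script code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first client-side redirect target
# found in $html, or undef if no known idiom is recognised.
sub find_redirect_target {
    my ($html) = @_;
    return unless defined $html;

    # <meta http-equiv="refresh" content="0; url=...">
    for my $tag ( $html =~ /(<meta\b[^>]*>)/gi ) {
        next unless $tag =~ /http-equiv\s*=\s*["']?refresh/i;
        return $1 if $tag =~ /url\s*=\s*['"]?([^'">\s]+)/i;
    }

    # window.location = "...", location.href = '...', document.location = "..."
    if ( $html =~ /\b(?:window\.|document\.|self\.|top\.)?location(?:\.href)?\s*=\s*(["'])(.*?)\1/i ) {
        return $2;
    }

    # location.replace("...") or location.assign("...")
    if ( $html =~ /\blocation\.(?:replace|assign)\s*\(\s*(["'])(.*?)\1\s*\)/i ) {
        return $2;
    }

    return;
}
```

The returned target may be relative. With mech it could be resolved against the current page via `URI->new_abs( $target, $mech->uri )` before the next `$mech->get`.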

I tried WWW::Selenium, but was unable to get it to work. I suspect this is because Selenium is beta-ish, and maybe my environment is just Selenium-unfriendly. I'm on Linux (SUSE) with Firefox 2.

Can anyone get this to work, with Mech, Selenium, or something else?

Thanks in advance!

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $url = ' +&serviceName=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0& +P.enfinity%2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101';
    my $price = '249,9';

    # Fetch the page; mech follows HTTP redirects but does not run javascript.
    my $mech = WWW::Mechanize->new();
    my $response = $mech->get( $url );
    my $html = $mech->content;

    print "price: $price\n";
    print "url: $url\n";
    print "html: $html\n";

    # Match the price literally rather than as a regex pattern.
    print "ok\n" if $html =~ /\Q$price\E/;

UPDATE: changed "server side redirect" to "javascript redirect"

UPDATE 2: getting closer, but still can't do what I want:

    use strict;
    use warnings;
    use WWW::Mechanize;
    use Data::Dumper;

    my $url = ' +&serviceName=idealo-de&serviceType=portal&oid=1066de515358&sid=73&pt=idealo-de.export.1-0& +P.enfinity%2FWFS%2FBaur-BaurDe-Site%2Fde_DE%2F-%2FEUR%2FBV_ExternalCall-Start%3FArticleNo%3D515358%26NUMSArt%3D4443504%26NUMSArtPc%3D4488615%26AffiliateID%3Dpangora-%2A%26Name%3Dpangora-produktdaten-baur%26ActionID%3Dpreis-produkt-suche-baur%26WKZ%3D79%26IWL%3D101';
    my $price = '249,9';
    print "price: $price\n";

    my $redirect_url = redirect_url($url);
    my $redirect_url_expected = ' +ity/WFS/Baur-BaurDe-Site/de_DE/-/EUR/BV_ExternalCall-Start?ArticleNo=515358&NUMSArt=4443504&NUMSArtPc=4488615&AffiliateID=pangora-bd&Name=pangora-produktdaten-baur&ActionID=preis-produkt-suche-baur&WKZ=79&IWL=101';
    die "oops" unless $redirect_url eq $redirect_url_expected;

    my $mech = WWW::Mechanize->new();
    $mech->agent('Firefox');
    $mech->get( $redirect_url );
    my $html = $mech->content;

    print "html from $redirect_url doesn't match $price\n" unless $html =~ /$price/;
    print "but paste into browser and view source, and it does\n";
    print "final url after firefox redirect (but not www::mech redirect) is something like "
        . ' +S/Baur-BaurDe-Site/de_DE/-/EUR/BV_DisplayProductInformation-ArticleNo;sid=7oVhaTsE5oZsaX6rnON4q25Uv6S6Ixu_PzIwW50ajEGxS04TwoV1a_bGFYiItw==?ArticleNo=515358&ls=0&firstPage=true&showGewinnspiel=true&showW3B=false'
        . "\n";

    # uncomment this to print html, which is totally different from what you get from firefox, show source.
    # print "html: $html";

    # works ok
    sub redirect_url {
        my $url = shift or die "no url";
        my $mech = WWW::Mechanize->new();
        $mech->get( $url );
        my $links;
        $links = $mech->links;
        $mech->get( $links->[1]->url );
        $links = $mech->links;
        return $links->[0]->base->as_string;
    }
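The redirect_url sub above hard-codes two hops by link position, which is specific to this one site. A more general driver could keep fetching until a page no longer asks for a redirect. The following is a rough sketch under stated assumptions: `follow_client_redirects` is a hypothetical name, and both the fetcher and the extractor are code refs supplied by the caller, so nothing here is WWW::Mechanize's own API.

```perl
use strict;
use warnings;

# Hypothetical driver. $fetch->($url) must return the page HTML and
# $extract->($html) must return the next URL, or undef when the page
# does not redirect. Returns the final URL and its HTML.
sub follow_client_redirects {
    my ( $fetch, $extract, $url, $max_hops ) = @_;
    $max_hops = 5 unless defined $max_hops;

    my %seen;
    for my $hop ( 0 .. $max_hops ) {
        # Refuse to visit the same URL twice.
        die "redirect loop at $url\n" if $seen{$url}++;

        my $html = $fetch->($url);
        my $next = $extract->($html);
        return ( $url, $html ) unless defined $next;
        $url = $next;
    }
    die "gave up after $max_hops redirects\n";
}
```

With mech, the fetcher could be `sub { $mech->get( $_[0] ); $mech->content }`, and the extractor any sub that pulls a target out of the HTML, for example `sub { $_[0] =~ /location\.href\s*=\s*"([^"]+)"/ ? $1 : undef }`. The price check then runs against the HTML that comes back.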

In reply to WWW::Mechanize or WWW::Selenium with javascript redirect by tphyahoo
