PerlMonks  

GET request using LWP::UserAgent returns 200 OK but Firefox 302 Found

by bliako (Monsignor)
on Mar 09, 2018 at 15:08 UTC (id://1210570)

bliako has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Archontes of Perl Wisdom,

I am sorry that this is a rather vague question: I cannot share the details of the request, and I can test it only once a day.

In a particular scraping exercise I observe that my browser (Firefox 48 with Firebug) sends a GET to the server and receives a "302 Found" response. However, when I do the same programmatically with LWP, I get a "200 OK" response back.

More details:

The whole exercise consists of: POST, GET, GET.

The POST returns a 302 and a "Location" header (RLOC1). This succeeds in LWP.

RLOC1 is a relative URL of the form "../../ABC/xyz.jsp", so I make it absolute by prepending the protocol and server name and replacing "../../" with the absolute path. I assume the absolute path is correct because it matches what I see in the browser/Firebug. I now have the absolute URL LOC1.

Next, I GET LOC1 using LWP, expecting a 302 as in my browser, but I get 200 OK (and no "Location" header with which to continue to my next and final GET).

I am quite sure the request headers (including cookies, referer and user-agent string) are the same in both, although I am now using LWP::ConsoleLogger::Easy to verify exactly that.

I am very sure that the GET params sent to the server with LWP and browser are identical (and there are no character escape issues).

Assuming request headers and GET params are identical between LWP and browser, are there other factors that can cause the remote server to behave differently when using LWP?

Any ideas?


Replies are listed 'Best First'.
Re: GET request using LWP::UserAgent returns 200 OK but Firefox 302 Found
by 7stud (Deacon) on Mar 10, 2018 at 13:00 UTC

    I find Wireshark pretty hard to figure out, so here's a quick way to get started:

    1. Capture > Options > WiFi, and click on the Start button in the lower right corner. Or, on the Wireshark Welcome page just double click Wifi. Loopback is for listening on localhost.
    2. In your browser, navigate to some website, like google.com
    3. You will see a massive amount of data scroll by in the Wireshark window.

    To make sense of all the data:

    1. Make sure View > Colorized Packet List is checked.
    2. To display only the http lines, there is a text input right above the data window that says: Apply display filter. Type in: http. Then on the far right of the text input click the blue arrow to apply the display filter. You can get as specific as you want with a display filter. There are also some default display filters that you can access from a drop down list by clicking the blue icon to the left of the text input.
    3. Then double click on one of the displayed http lines in the main window, and in the popup window start expanding the disclosure triangles.
    4. In the bottom pane where the hexdump is displayed, you can right click the pane and choose between hex and binary format. On the right hand side of the hexdump pane, you can see the text; periods represent non printing characters. If you hover over one of the periods, the hex/binary representation on the left side will be highlighted.
    5. To clear the display window, in the Wireshark toolbar click on the third icon from the left: a green shark fin.
      Then double click on one of the displayed http lines in the main window

      I also find it very useful to single-click on one of the lines, and then go to the menu Analyze -> Follow TCP Stream. (I even wrote a little tool to dump all streams from one or more .pcap files on the command line.)

Re: GET request using LWP::UserAgent returns 200 OK but Firefox 302 Found
by haukex (Archbishop) on Mar 10, 2018 at 11:23 UTC

    The debugging functions provided by browsers and LWP may not always show you everything, so my first step would be to inspect the actual packets going over the wire with Wireshark.

Re: GET request using LWP::UserAgent returns 200 OK but Firefox 302 Found
by rizzo (Curate) on Mar 10, 2018 at 18:06 UTC
    The whole exercise consists of: POST, GET, GET.

    Are you sure?

    Could it be that you're actually doing a

    GET, POST, GET, GET

    in Firefox (e.g. visiting the site with a GET and then POSTing some form data)
    while you're doing a

    POST, GET, GET

    in LWP?

    If this is the case, your LWP version is probably missing some cookies, which makes the server behave differently.

      Good thinking! Indeed, there are other GETs before the ones I described (in a previous phase which completes successfully), so the cookie is actually set. Thanks.

      I think I am getting closer, though I still have to test it.

      After using Wireshark (as per 7stud's and haukex's advice) and LWP::ConsoleLogger::Easy, I realised that LWP::UserAgent could be responding to a '302 Found' automatically and following the redirect. So BOTH I (via LWP) and LWP itself (responding to the 302 automatically) are sending a request to the redirect location; mine goes out after LWP has finished with its own. And that messes things up.
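The suspicion above can also be checked from inside the program: a response that LWP reached by following redirects carries the earlier responses with it. A minimal offline sketch, assuming hand-built HTTP::Response objects and placeholder URLs (server.example is not the real server); followed_redirects is a hypothetical helper name, not part of LWP:

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Return one "CODE -> Location" string per redirect that was followed
# before $response was produced (an empty list if there were none).
sub followed_redirects {
    my ($response) = @_;
    return map { $_->code . ' -> ' . ($_->header('Location') // '(no Location)') }
           $response->redirects;
}

# Offline demonstration with hand-built responses; the URLs are placeholders.
my $hop = HTTP::Response->new(302, 'Found', [ Location => '/ABC/next.jsp' ]);
$hop->request(HTTP::Request->new(GET => 'http://server.example/ABC/first.jsp'));

my $final = HTTP::Response->new(200, 'OK');
$final->request(HTTP::Request->new(GET => 'http://server.example/ABC/next.jsp'));
$final->previous($hop);   # LWP sets this link itself when it follows a redirect

print "$_\n" for followed_redirects($final);
```

With a real user agent, calling the helper on the object returned by $ua->get(...) should list any 302 that was followed silently; an empty list means the 200 OK really was the server's first answer.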

      The LWP::UserAgent man page states that there is a list called 'requests_redirectable' which contains the request methods for which redirects are followed. By default, 'GET' and 'HEAD' are included; POST is not.

      Given also that LWP's 'max_redirect' is 7 by default, it seems to me that a GET returning a 302 will cause LWP to follow it automatically. But I am also doing that myself in the program, having assumed that LWP would not follow redirects (or having forgotten that it does).

      In my 'scraping exercise' there is a long list of previous POSTs which return a 302, but this is the first time a GET does. The redirects from the POSTs were not followed by LWP and all was OK, but the one from the GET is followed (because GET is in LWP's 'requests_redirectable' list), and so the problem arose.
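The defaults mentioned above can be read back from a freshly constructed user agent without touching the network. A small sketch, assuming only that LWP::UserAgent is installed; the variable names are illustrative:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# Request methods whose redirects LWP follows on its own, and the
# maximum number of consecutive redirects it is willing to follow.
my @auto_followed = @{ $ua->requests_redirectable };
my $max_hops      = $ua->max_redirect;

print "auto-followed methods: @auto_followed\n";
print "max_redirect: $max_hops\n";
```

Printing these at the start of a scraping script makes it obvious which of the POST and GET steps LWP will handle by itself.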

      thanks

        I can now say that the problem was indeed that LWP was following redirects (as it should), while I was also following them myself by issuing another request via LWP.

        I solved it by setting

        $ua->requests_redirectable([]);
        which tells LWP::UserAgent not to follow any redirects for any request.

        (setting

        $ua->requests_redirectable(['GET']);
        would allow only GET to be followed by LWP).
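Once automatic following is switched off, the program has to pick up the Location header itself. A minimal offline sketch of that step, assuming a hand-built 302 response and placeholder URLs on server.example; next_hop is a hypothetical helper name:

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use URI;

# Given a response that was not followed, return the absolute URL of the
# next hop as a string, or nothing if the response is not a usable redirect.
sub next_hop {
    my ($response) = @_;
    return unless $response->is_redirect;
    my $location = $response->header('Location');
    return unless defined $location;
    # $response->base is the URL of the request that produced the response.
    return URI->new_abs($location, $response->base)->as_string;
}

# Offline demonstration; the URLs are placeholders, not the real site.
my $redirect = HTTP::Response->new(302, 'Found', [ Location => 'KLM/step2.jsp?aa=123' ]);
$redirect->request(HTTP::Request->new(GET => 'http://server.example/ABC/XYZ/step1.jsp?op=678'));

my $ok = HTTP::Response->new(200, 'OK');
$ok->request(HTTP::Request->new(GET => 'http://server.example/ABC/XYZ/step1.jsp'));

print next_hop($redirect), "\n";
print defined(scalar next_hop($ok)) ? "redirect\n" : "not a redirect\n";
```

In a real script the response would come from $ua->get(...) after $ua->requests_redirectable([]), and the returned URL would be passed to the next $ua->get(...), so that each request is issued exactly once.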

        I have also discovered another problem with allowing UA to follow redirects. In a redirect the server issues a 302 status (or another 30X) and sends a Location header which contains the URL of the redirect. UA extracts this Location URL and issues another request to it.

        The problem is that the server sometimes sends back a relative URL, which UA then tries to make absolute. In my case, UA failed to do that. So even if I had allowed UA to follow the redirect, it would have failed by sending a malformed URL to the server.

        UA has the following code to convert the URL:

        my $referral_uri = $response->header('Location');
        {
            # Some servers erroneously return a relative URL for redirects,
            # so make it absolute if it not already is.
            local $URI::ABS_ALLOW_RELATIVE_SCHEME = 1;
            my $base = $response->base;
            $referral_uri = "" unless defined $referral_uri;
            $referral_uri = $HTTP::URI_CLASS->new($referral_uri, $base)->abs($base);
        }
        $referral->uri($referral_uri);

        In my case:

        base='http://server.com/ABC/afilename1?op=678'
        referral='../../ABC/XYZ/KLM/afilename2?aa=123'
        and the calculated new referral came out as:
        http://server.com/../ABC/XYZ/KLM/afilename2?aa=123

        instead of the correct one of:

        http://server.com/ABC/XYZ/KLM/afilename2?aa=123

        Maybe this is expected behaviour from URI->abs()? I will send a bug report just in case.
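The case above can be reproduced with the URI module alone. As far as the URI documentation describes it, abs() keeps ".." segments that would climb above the server root unless the package variable $URI::ABS_REMOTE_LEADING_DOTS is set to a true value. A minimal sketch using the base and referral quoted above; resolve is a hypothetical helper name, and the exact default output may depend on the installed URI version:

```perl
use strict;
use warnings;
use URI;

my $base     = 'http://server.com/ABC/afilename1?op=678';
my $referral = '../../ABC/XYZ/KLM/afilename2?aa=123';

# Resolve $rel against $base_url. When $strip_leading_dots is true, ask
# URI to drop ".." segments that would climb above the server root.
sub resolve {
    my ($rel, $base_url, $strip_leading_dots) = @_;
    local $URI::ABS_REMOTE_LEADING_DOTS = $strip_leading_dots ? 1 : 0;
    return URI->new_abs($rel, $base_url)->as_string;
}

print resolve($referral, $base, 0), "\n";   # URI's default handling
print resolve($referral, $base, 1), "\n";   # excess ".." segments dropped
```

Here the referral has one more "../" than the base path has directories, which is why the two results can differ. Because LWP::UserAgent calls abs() internally, setting the variable before the request is one possible way to influence what it sends, though that is an assumption worth testing rather than documented LWP behaviour.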

        Thanks Monks
