PerlMonks  

Multiple Step Login in Perl

by sylph001 (Sexton)
on May 20, 2014 at 16:18 UTC ( [id://1086815]=perlquestion )

sylph001 has asked for the wisdom of the Perl Monks concerning the following question:

Respected Monks,

I'm trying to log in to a vendor web site in order to fetch some data, but their ASP + AJAX + jQuery web site always prevents me from getting in, whether via LWP or curl.
Here is how the web site works:

1. The login page (www.xxxx.gov/pages/profile.aspx) expects me to enter my email address (the one used during registration). After hitting the 'submit' button, I should be brought to the 'login question verify' page.
2. On the verify page, I am expected to fill in an answer to the account verification question.
Then, by hitting the 'verify' button, I should be logged in and see my account profile, and the cookie is set as well.

Using Firebug, I noticed that every time I hit the 'submit' or 'verify' button, the browser sends a 'GetUpdatedFormDigest' request in a SOAP body, and then does the normal POST with '__REQUESTDIGEST=<DigestValueReturned>' in the content.

So, I used LWP to imitate the SOAP request and got a response with a Digest Value which seems reasonable.
Then I did the normal $ua->post() with this Digest Value and the other form parameters I got from the very first login page.
I thought I had posted all the form parameters I could see in Firebug (and HttpWatch), but every time I only get a response with exactly the same HTML page as the very first login page, which asks for the login email address again.

Here are the request headers that I captured with Firebug from a successful login in the browser:
1. Headers of the SOAP request from the browser:

(Request-Line)    POST http://www.xxxxxxxxx.gov/_vti_bin/sites.asmx HTTP/1.1
Accept    */*
Accept-Encoding    gzip, deflate
Accept-Language    en-US,en;q=0.8
Content-Length    332
Content-Type    text/xml
Host    www.xxxxxxxxx.gov
Origin    http://www.xxxxxxxxx.gov
Pragma    no-cache
Proxy-Connection    Keep-Alive
Referer    http://www.xxxxxxxxx.gov/pages/profile.aspx
SOAPAction    http://schemas.microsoft.com/sharepoint/soap/GetUpdatedFormDigest
User-Agent    Mozilla/5.0 (Windows NT 6.1; WOW64; chromeframe/27.0.1453.94) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36

   What I wrote in Perl to produce this SOAP request:
    use HTTP::Request::Common qw(POST);    # near the top of the program

    # Header names containing hyphens must be quoted; a bare X-Requested-With
    # is parsed as a subtraction, not as a string.
    $res = $ua->request(
        POST "http://www.xxxxxxxxx.gov/_vti_bin/sites.asmx",
        'Content-Type'     => 'text/xml',
        'Referer'          => 'http://www.xxxxxxxxx.gov/pages/profile.aspx',
        'X-Requested-With' => 'XMLHttpRequest',
        'Proxy-Connection' => 'Keep-Alive',
        'User-Agent'       => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0',
        'Content'          => $SOAP_BODY,
    );

2. Headers of the normal POST request from the browser, which should be answered with the next HTML page:

(Request-Line)    POST http://www.xxxxxxxxx.gov/pages/profile.aspx HTTP/1.1
Accept    text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding    gzip, deflate
Accept-Language    en-US,en;q=0.8
Content-Length    4570
Content-Type    application/x-www-form-urlencoded
Host    www.xxxxxxxxx.gov
Origin    http://www.xxxxxxxxx.gov
Pragma    no-cache
Proxy-Connection    Keep-Alive
Referer    http://www.xxxxxxxxx.gov/pages/profile.aspx
User-Agent    Mozilla/5.0 (Windows NT 6.1; WOW64; chromeframe/27.0.1453.94) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.94 Safari/537.36

   Again, what I wrote in my program to imitate these headers:
    my $header = HTTP::Headers->new(
        'Accept'           => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Accept-Encoding'  => 'gzip, deflate',
        'Proxy-Connection' => 'Keep-Alive',
        'Host'             => 'www.xxxxxxxxx.gov',
        'Referer'          => 'http://www.xxxxxxxxx.gov/pages/profile.aspx',
        'User-Agent'       => 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0',
        'Content-Type'     => 'application/x-www-form-urlencoded',
        'Origin'           => 'http://www.xxxxxxxxx.gov',
        'Accept-Language'  => 'en-US,en;q=0.8',
        'Pragma'           => 'no-cache',
    );
    $ua->default_headers($header);


Dear Monks, could you let me know what I have missed in my praying that causes the vendor website to shut me out?

Thank you

Replies are listed 'Best First'.
Re: Multiple Step Login in Perl
by Corion (Patriarch) on May 20, 2014 at 16:23 UTC

    In your first request, you're not sending the SOAPAction header. Also, most likely, if you need to speak SOAP, something like SOAP::Lite is better than constructing your SOAP requests manually.
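
    As a rough illustration of the first point, the SOAP call could be built with the SOAPAction header taken from the browser capture above. This is only a sketch: the SOAP envelope below is an assumed shape (the real body should be copied from the captured request), and the request is printed rather than sent.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Assumed SOAP envelope; replace it with the body captured in the browser.
my $soap_body = <<'XML';
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetUpdatedFormDigest xmlns="http://schemas.microsoft.com/sharepoint/soap/" />
  </soap:Body>
</soap:Envelope>
XML

# Header names and values mirror the captured browser request,
# including the SOAPAction header that the original script left out.
my $soap_req = POST 'http://www.xxxxxxxxx.gov/_vti_bin/sites.asmx',
    'Content-Type' => 'text/xml',
    'Referer'      => 'http://www.xxxxxxxxx.gov/pages/profile.aspx',
    'SOAPAction'   => 'http://schemas.microsoft.com/sharepoint/soap/GetUpdatedFormDigest',
    'Content'      => $soap_body;

# Inspect the request before sending it with $ua->request($soap_req).
print $soap_req->as_string;
```

    Printing the request first makes it easy to compare it line by line with the headers the browser sent.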

    In your second request, you are doing a very weird thing when setting the headers via the ->default_headers method. Don't do that, but construct your request properly through HTTP::Request::Common instead.
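
    A minimal sketch of what that might look like for the login POST, with the headers passed per request and the form fields passed as an array reference. The field names other than __REQUESTDIGEST and all the values are placeholders (assumptions); the real ones come from the login page source and the SOAP response.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Placeholder form fields. Only __REQUESTDIGEST is named in the question;
# __VIEWSTATE and email are assumed names for illustration.
my @form = (
    '__REQUESTDIGEST' => 'DIGEST-FROM-SOAP-RESPONSE',
    '__VIEWSTATE'     => 'VIEWSTATE-FROM-LOGIN-PAGE',
    'email'           => 'user@example.com',
);

# An array reference as Content makes HTTP::Request::Common URL-encode the
# fields and set Content-Type to application/x-www-form-urlencoded.
my $login_req = POST 'http://www.xxxxxxxxx.gov/pages/profile.aspx',
    'Referer' => 'http://www.xxxxxxxxx.gov/pages/profile.aspx',
    'Origin'  => 'http://www.xxxxxxxxx.gov',
    'Content' => \@form;

# Send with $ua->request($login_req) once the output looks right.
print $login_req->as_string;
```

    Host and Content-Length do not need to be set by hand; LWP and HTTP::Request::Common fill them in.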

    I recommend doing a proper compare between what your browser sends and what your script sends, by using Wireshark and/or LWP::Debug.
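
    One way to see what the script really sends, sketched here with LWP::UserAgent handlers rather than LWP::Debug (a swapped-in technique, not necessarily what was meant above). The @wire_log array is a hypothetical name used only for this example.

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my @wire_log;    # hypothetical in-memory log of everything sent and received

# Called right before each request goes out; returning nothing lets
# LWP carry on and send the request as usual.
$ua->add_handler(request_send => sub {
    my ($request) = @_;
    push @wire_log, $request->as_string;
    print STDERR $request->as_string, "\n";
    return;
});

# Called after each response has been received completely.
$ua->add_handler(response_done => sub {
    my ($response) = @_;
    push @wire_log, $response->status_line . "\n" . $response->headers->as_string;
    print STDERR $response->status_line, "\n", $response->headers->as_string, "\n";
    return;
});
```

    Every request made through $ua afterwards is printed to STDERR, so the output can be compared side by side with the Firebug capture.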

      Hi Corion,

      It seems I can log in now after changing the way the POST is done, just like you said!

      Following the second point you mentioned, I changed the code to use a proper HTTP::Request::Common call to do the POST with all the form parameters I could get, and it was finally able to get the next web page!

      SOAP::Lite does look more professional for dealing with SOAP requests. It is appealing to me, but I guess I need a bit more time to learn how to put it into use.

      Thank you very much for inspecting each of the suspicious points in my program!


      Have a nice day
Re: Multiple Step Login in Perl
by sundialsvc4 (Abbot) on May 20, 2014 at 16:53 UTC

    Also, in the spirit of “most common oops,” be certain that your user-agent does have a cookie-jar, that it does follow redirects, and that it does capture and properly return all GET-variables.   I suggest that you perform a manual login while using a browser-debugger such as Firebug, and note exactly where and how authentication tokens are being exchanged.   Sometimes they are “in-band” to the SOAP, and sometimes, “out-of-band.”   It can be quite difficult to predict what a particular designer might have done.   Once you know, similarly inspect your code to be certain that it does exactly the same, for both error and non-error cases, and that it knows when to give-up.
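
    For the cookie-jar and redirect points, a user agent could be set up roughly like this. It is a sketch under the assumption that an in-memory jar is enough; the max_redirect value is arbitrary.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

my $ua = LWP::UserAgent->new(
    cookie_jar   => HTTP::Cookies->new,    # in-memory jar reused for every request
    max_redirect => 7,                     # arbitrary limit for this example
);

# By default LWP follows redirects only for GET and HEAD requests;
# add POST so that a redirect after the login form is followed too.
push @{ $ua->requests_redirectable }, 'POST';
```

    The same $ua object has to be used for the login page, the SOAP call and the form POST, otherwise the cookies are not carried from one step to the next.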

    I find it useful to create a “mock host” in Perl for testing purposes.   Once you know exactly what the real host does, you build a small Perl script that you can connect-to over an appropriate localhost port, and bounce your client against that.   The mock-host will never lock you out.   (Unless, of course, you need to mock that behavior too, for testing purposes.)
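
    A mock host along those lines might look like the sketch below, using HTTP::Daemon. The port, the page contents and the rule "accept any POST that carries a __REQUESTDIGEST value" are all assumptions made for illustration; a real mock would copy the behavior observed from the actual site.

```perl
use strict;
use warnings;
use HTTP::Daemon;
use HTTP::Response;

# Decide what the mock host answers. The rules here are hypothetical.
sub mock_response {
    my ($request) = @_;
    if ($request->method eq 'POST' && $request->uri->path eq '/pages/profile.aspx') {
        my $has_digest = $request->content =~ /__REQUESTDIGEST=[^&]+/;
        my $body = $has_digest
            ? '<html>verify question page</html>'
            : '<html>login page again</html>';
        return HTTP::Response->new(200, 'OK', [ 'Content-Type' => 'text/html' ], $body);
    }
    return HTTP::Response->new(404, 'Not Found', [ 'Content-Type' => 'text/plain' ], "no such page\n");
}

# Serve requests on localhost until the process is killed.
sub run_mock_host {
    my $daemon = HTTP::Daemon->new(LocalAddr => '127.0.0.1', LocalPort => 8080)
        or die "cannot listen: $!";
    print "mock host at ", $daemon->url, "\n";
    while (my $conn = $daemon->accept) {
        while (my $request = $conn->get_request) {
            $conn->send_response(mock_response($request));
        }
        $conn->close;
    }
}

# Only start listening when asked to, e.g. RUN_MOCK_HOST=1 perl mockhost.pl
run_mock_host() if $ENV{RUN_MOCK_HOST};
```

    Keeping the decision logic in mock_response, separate from the listening loop, means the rules can be checked without opening a socket.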

    Finally, note that tools like Wireshark cannot help you with encrypted (https ...) content, whereas a browser debugger can.

      Can you explain a bit about what 'in-band' and 'out-of-band' refer to?
      From my view of the web page, the SOAP response only contains the '__REQUESTDIGEST' value, which shows up in the POST content I captured using Firebug.
      Also, the other form parameters do not seem to change when I compare what was extracted from the page source with what was captured from the POST content in Firebug.

      Regards
        sylph001,
        I have no idea what the original author intended by in-band and out-of-band, but here are some interesting things I have run into while doing website automation:
        • Cookies being set with JavaScript
        • Authentication using an Ajax JSON post not associated with the "login" button
        • Redirecting to a one-time-use URL which authenticated and then redirected to the normal page
        I am sure there are others, but the point was that not all authentication methods are straightforward.

        Cheers - L~R

Re: Multiple Step Login in Perl
by boftx (Deacon) on May 20, 2014 at 22:35 UTC

    Anyone want to bet this is going to healthcare.gov? (No, I've never been to the site so I really don't know if it uses this "technology".)

    It helps to remember that the primary goal is to drain the swamp even when you are hip-deep in alligators.
