
Re^5: WWW::Mechanize and fooling server for javascript

by Corion (Patriarch)
on Jul 31, 2007 at 18:29 UTC


in reply to Re^4: WWW::Mechanize and fooling server for javascript
in thread WWW::Mechanize and fooling server for javascript

It's quite easy to see what requests your browser makes, for example with the Live HTTP Headers extension for Firefox. All you have to do then is faithfully replicate the requests made by the browser with WWW::Mechanize. In one instance, I used HTTP::Request::FromTemplate to recreate HTTP requests from templates I created from sniffer logs. Other network analysis tools, like Wireshark or Sniffer::HTTP, could also be useful in determining the difference between what your browser sends and what your script sends.
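To see the script's side of that comparison, a minimal sketch along these lines may help. It uses the handler hooks that WWW::Mechanize inherits from LWP::UserAgent to print every outgoing request to STDERR, so the output can be held against the browser's log. The helper name build_logging_mech and the choice of the 'Windows Mozilla' alias are assumptions for illustration, not something from the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: build a Mech object that logs what it sends.
sub build_logging_mech {
    my $mech = WWW::Mechanize->new(autocheck => 1);

    # Pretend to be a browser; 'Windows Mozilla' is one of Mech's
    # built-in User-Agent aliases.
    $mech->agent_alias('Windows Mozilla');

    # LWP::UserAgent calls this handler just before a request goes out.
    $mech->add_handler(request_send => sub {
        my ($request) = @_;
        print STDERR $request->method, ' ', $request->uri, "\n",
                     $request->headers->as_string, "\n";
        return;    # returning nothing lets the request proceed normally
    });

    return $mech;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $mech = build_logging_mech();
    $mech->get($ARGV[0]);
    print "Status: ", $mech->status, "\n";
}
```

Run it with the login page URL as the only argument. Any header the browser sends that does not show up here (Referer, Cookie, and so on) is a candidate for $mech->add_header.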

Re^6: WWW::Mechanize and fooling server for javascript
by gw1500se (Beadle) on Jul 31, 2007 at 23:27 UTC
    Thanks for all the replies. I guess the reason I am not making progress is that I am doing a poor job of explaining the problem. My apologies. I know what is happening, so let me try to clarify.

    When I access the login page, it includes a link to a javascript file that simply does a "data=<some hash string>". The submit javascript uses that hash to encrypt the login password. Doing all that with perl is probably the easy part. The hard part is extracting that from the javascript source so I can get the hash string. With both Mech and LWP it seems the embedded javascript source is there but the linked one is not.

    After more digging I THINK I know what has to be done, so at this point I am asking for some reassurance. If I understand Mech, the first get retrieves everything, except that links are not downloaded. Instead a list of links is built. Do I need to make another call to get the links I need? If so (this is more of a browser question, I think), does the server know the subsequent requests are from the same "session"? I use that word for lack of any better way to make the point, as there really is no session context here. What concerns me is that if the javascript source does not come with the initial page, the server will issue a different hash string that, when used, will not result in the correct encryption. Does a browser request links individually like Mech seems to? Perhaps the string is just a time-based thing.

    Thanks again.

      This is why I mentioned the various methods of tracing what your browser sends over the wire. If you don't know how HTTP, HTML and browsers interact, watching the whole thing in action (or in slow motion replay via the logs) can be quite educational.

      Yes, you will need to request each separately linked thing separately. As you will be replicating what a browser sends, the server has no means of telling your script apart from a browser. How the server stitches together all the separate requests into a whole session is up to the server. Cookies are a common method, but you will likely find when you look at the traffic that much of it is static anyway. So, go look at what goes over the wire.
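As a rough sketch of "requesting the linked things separately": fetch the login page, pull the src attribute out of each script tag with HTML::TokeParser, then fetch each script with the same Mech object so that any cookies set by the first response go along with the later requests. The helper names script_srcs and extract_data are made up for this example, and the regex assumes the script contains a quoted assignment such as data="..."; the real file may look different.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;
use HTML::TokeParser;
use URI;

# Return the src attribute of every <script src="..."> tag in $html.
sub script_srcs {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML";
    my @srcs;
    while (my $tag = $parser->get_tag('script')) {
        my $src = $tag->[1]{src};    # $tag->[1] is the attribute hash
        push @srcs, $src if defined $src;
    }
    return @srcs;
}

# Pull the value out of something like: data="<some hash string>"
# (assumed syntax; adjust the regex to the actual script).
sub extract_data {
    my ($js) = @_;
    return $js =~ /\bdata\s*=\s*["']([^"']+)["']/ ? $1 : undef;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $mech = WWW::Mechanize->new(autocheck => 1);   # keeps cookies in memory by default
    $mech->get($ARGV[0]);
    my $base = $mech->base;
    my $html = $mech->content;

    for my $src (script_srcs($html)) {
        my $url = URI->new_abs($src, $base);   # resolve relative src against the page
        $mech->get($url);                      # same object, same cookie jar
        my $data = extract_data($mech->content);
        print "$url: data=$data\n" if defined $data;
        $mech->back;                           # return to the login page
    }
}
```

Because every request goes through one Mech object, the server sees the same cookies a browser would send for the page and its scripts. Whether the hash string is tied to those cookies, to the time, or to nothing at all is something only the traffic logs can show.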

        Thanks, but again, knowing what goes over the wire is not my problem. What I'm struggling with is the perl side, which is why I'm here. I KNOW what is coming over the wire; the problem is how to extract it with Mech or LWP.

        It appears to me that Mech ignores <script ... src="some link"> tags. I can see no way to get that source, as the tag does not even show up in the list of links after a "get".
