PerlMonks  

Re^4: WWW::Mechanize and fooling server for javascript

by gw1500se (Beadle)
on Jul 31, 2007 at 18:23 UTC


in reply to Re^3: WWW::Mechanize and fooling server for javascript
in thread WWW::Mechanize and fooling server for javascript

What I meant was that I initially thought there was some kind of redirect that prevented the full page from loading if javascript was not enabled. Thus the server would serve a different page unless javascript was enabled. I was looking for a way to fool that mechanism into thinking javascript was enabled.

I have since been convinced that this is not possible, so my only alternative is to parse the javascript for the assignment I am looking for (data='some hash string'). The challenge is finding something that will let me access the javascript source. It seems that if the javascript is linked rather than embedded, LWP at least will not "GET" it. I am hoping Mech will when I try it without the format option.
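Once the javascript source is in hand as a string, pulling out the assignment can be a single regex. This is only a minimal sketch: `extract_data_string` is a hypothetical helper, the variable name `data` comes from the description above, and the quoting style (single or double quotes, no escaped quotes inside the value) is an assumption about what the real script looks like.

```perl
use strict;
use warnings;

# Hypothetical helper: return the string literal assigned to `data` in a
# chunk of JavaScript source, or undef if no such assignment is found.
# Assumes the value is a plain quoted string with no escaped quotes inside.
sub extract_data_string {
    my ($js) = @_;
    if ($js =~ /\bdata\s*=\s*(["'])(.*?)\1/) {
        return $2;    # $1 is the quote character, $2 the value
    }
    return;
}

# Made-up sample input, standing in for the real script source.
my $sample = q{var data = 'a1b2c3d4';};
my $value  = extract_data_string($sample);
print defined $value ? "data = $value\n" : "no assignment found\n";
```

If the real script builds the string in pieces or escapes quotes, a regex like this will not be enough and the pattern would have to be adapted to the actual source.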

Replies are listed 'Best First'.
Re^5: WWW::Mechanize and fooling server for javascript
by Corion (Patriarch) on Jul 31, 2007 at 18:29 UTC

    It's quite easy to see what requests your browser makes, for example with the Live HTTP Headers extension for Firefox. All you have to do then is faithfully replicate the requests made by the browser with WWW::Mechanize. In one instance, I used HTTP::Request::FromTemplate to recreate HTTP requests from templates I created from sniffer logs. Other network analysis tools, like Wireshark or Sniffer::HTTP, could also be useful in determining the difference between what your browser sends and what your script sends.
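As a rough illustration of replicating a traced request, the sketch below builds an HTTP::Request carrying the same headers the browser sent and prints it, so it can be compared line by line with the trace. `browser_like_request` is a hypothetical helper, and the URL and header values are placeholders, not taken from any real trace.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical helper: build a GET request carrying the headers seen in a
# browser trace. Header names and values are whatever the trace showed.
sub browser_like_request {
    my ($url, %headers) = @_;
    my $req = HTTP::Request->new(GET => $url);
    $req->header($_ => $headers{$_}) for sort keys %headers;
    return $req;
}

# Placeholder URL and header values, for illustration only.
my $req = browser_like_request(
    'http://example.com/login',
    'User-Agent' => 'Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0',
    'Accept'     => 'text/html',
);

# Print the request so it can be compared with the sniffer log.
print $req->as_string;
```

Because WWW::Mechanize is a subclass of LWP::UserAgent, a request built this way can be sent with `$mech->request($req)`.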

      Thanks for all the replies. I guess the reason I am not making progress is that I am doing a poor job of explaining the problem. My apologies. I know what is happening, so let me try to clarify.

      When I access the login page, it includes a link to a javascript that simply does a "data=<some hash string>". The submit javascript uses that hash to encrypt the login password. Doing all that with Perl is probably the easy part. The hard part is extracting that from the javascript source so I can get the hash string. With both Mech and LWP it seems the embedded javascript source is there but the linked source is not.

      After more digging I THINK I know what has to be done, so at this point I am asking for some reassurance. If I understand Mech, the first get retrieves everything, except that links are not downloaded. Instead a list of links is built. Do I need to make another call to get the links I need? If so (this is more of a browser question, I think), does the server know the subsequent requests are from the same "session"? I use that word for lack of any better way to make the point, as there really is no session context here. What concerns me is that if the javascript source does not come with the initial page, the server will issue a different hash string that, when used, will not result in the correct encryption. Does a browser request links individually like Mech seems to? Perhaps the string is just a time-based thing.

      Thanks again.

        This is why I mentioned the various methods of tracing what your browser sends over the wire. If you don't know how HTTP, HTML and browsers interact, watching the whole thing in action (or in slow-motion replay via the logs) can be quite educational.

        Yes, you will need to request all separately linked things separately. As you will be replicating what a browser sends, the server has no way of distinguishing between your script and a browser. How the server stitches together all the separate requests into a whole session is up to the server. Cookies are a common method, but you will likely find when you look at the traffic that much of it is static anyway. So, go look at what goes over the wire.
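A minimal sketch of requesting the linked scripts separately might look like the following. It assumes the scripts are referenced with ordinary `<script src="...">` tags; `script_urls` and `fetch_page_and_scripts` are hypothetical helper names, and the login URL is whatever the caller supplies. The script tags are pulled out with HTML::TokeParser here rather than relying on Mech's own link list. Every request goes through the same WWW::Mechanize object, so whatever cookies the server sets with the first page are sent back with the later requests.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use URI;

# Return the absolute URL of every <script src="..."> found in $html,
# resolved against $base (the URL the page was fetched from).
# Inline scripts without a src attribute are skipped.
sub script_urls {
    my ($html, $base) = @_;
    my $parser = HTML::TokeParser->new(\$html);
    my @urls;
    while (my $tag = $parser->get_tag('script')) {
        my $src = $tag->[1]{src};
        next unless defined $src && length $src;
        push @urls, URI->new_abs($src, $base)->as_string;
    }
    return @urls;
}

# Sketch: fetch a page, then each linked script, through the SAME
# WWW::Mechanize object so its cookie jar is shared across the requests.
# Returns a hash reference mapping script URL to script source.
sub fetch_page_and_scripts {
    my ($login_url) = @_;    # hypothetical URL supplied by the caller
    require WWW::Mechanize;
    my $mech = WWW::Mechanize->new(autocheck => 1);    # die on HTTP errors
    $mech->get($login_url);
    my $html = $mech->content;
    my $base = $mech->base;
    my %source;
    for my $url (script_urls($html, $base)) {
        $mech->get($url);
        $source{$url} = $mech->content;
    }
    return \%source;
}
```

Each `get` of a script replaces Mech's current page, so the login page has to be revisited (for example with `$mech->back`) before submitting the form. Whether the server ties the hash string to a cookie, to the time, or to nothing at all is something only the traced traffic will show.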
