PerlMonks  

Re^3: save a page as text

by johnnywang (Priest)
on Apr 22, 2005 at 01:33 UTC ( [id://450254]=note )


in reply to Re^2: save a page as text
in thread save a page as text

JavaScript has long been a problem for page scraping. People have tried to get around it by, say, recording the actual HTTP parameters, but that isn't relevant to your problem. The other approach is to drive IE using Win32::OLE. I've used Win32::IE::Mechanize before, but it's mainly for navigation and parsing; you (or someone) would need to figure out how to call the "Save As" method through COM.

I didn't know "Save As Text" would evaluate the text printed by JavaScript. I tried it out, and apparently it works.

Update: I just saw the module Win32::CaptureIE; it looks more promising.

Replies are listed 'Best First'.
Re^4: save a page as text
by dorward (Curate) on Apr 22, 2005 at 08:45 UTC
    Being a *nix user, anything that drives IE wouldn't be very useful to me, so my approach would be to examine the JavaScript and port it to Perl.
