PerlMonks  

Re^2: save a page as text

by Anonymous Monk
on Apr 22, 2005 at 00:56 UTC


in reply to Re: save a page as text
in thread save a page as text

I can't just strip the HTML; if I could, I'd know how to do that myself. There is JavaScript in the page that prints something out, and I need to retrieve what it prints.

I can't get it from the source code, because the source only contains the JS code, not the data it prints. So I need a way to write a Perl screen scraper that scrapes the rendered text from a page without any HTML markup in the output.

Re^3: save a page as text
by johnnywang (Priest) on Apr 22, 2005 at 01:33 UTC
    JavaScript has long been a problem for page scraping. People have worked around it by, say, recording the actual HTTP parameters, which isn't relevant to your problem. The other approach is to drive IE using Win32::OLE. I've used Win32::IE::Mechanize before, but it's mainly for navigation and parsing; you (or someone) would need to figure out how to call the "Save As" method through COM.

    I didn't know that "Save As Text" would include the output of JavaScript printing, but I tried it and apparently it does.

    Update: I just saw the module Win32::CaptureIE; it looks more promising.

      As a *nix user, I wouldn't find anything that drives IE very useful, so my approach would be to examine the JavaScript and port it to Perl.
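A minimal sketch of the port-it-to-Perl approach, assuming the page's script only passes quoted string literals (optionally joined with +) to document.write() or document.writeln(). The helper name extract_written_text is hypothetical, the fetch uses the core HTTP::Tiny module, and the unescaping and tag stripping are deliberately naive. Anything the script computes at runtime would still have to be ported by hand after reading the JavaScript.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: return the text that a page's inline
# document.write()/document.writeln() calls would print.
# Assumption: each call's argument is made only of quoted string
# literals, optionally joined with +. Computed values are not handled.
sub extract_written_text {
    my ($html) = @_;
    # One single- or double-quoted JavaScript string literal,
    # allowing backslash escapes inside it.
    my $lit = qr/"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'/s;
    my @out;
    while ($html =~ /document\.write(?:ln)?\s*\(\s*((?:$lit)(?:\s*\+\s*(?:$lit))*)\s*\)/g) {
        my $args = $1;
        my $text = '';
        # Concatenate the literals that make up this call's argument.
        while ($args =~ /($lit)/g) {
            my $s = substr($1, 1, -1);   # drop the surrounding quotes
            $s =~ s/\\(.)/$1/gs;         # naive: keep the char after each backslash
            $text .= $s;
        }
        $text =~ s/<[^>]*>//g;           # naive tag stripping, no entity decoding
        push @out, $text;
    }
    return @out;
}

# Usage: perl jswrite.pl URL
if (@ARGV) {
    my $res = HTTP::Tiny->new->get($ARGV[0]);
    die "GET $ARGV[0] failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    print "$_\n" for extract_written_text($res->{content});
}
```

Each string the script would have written is printed on its own line with its HTML tags removed. The regex only recognises literal arguments, so a call like document.write(price) is skipped silently; that is the point where the JavaScript itself has to be read and ported.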
Re^3: save a page as text
by Hero Zzyzzx (Curate) on Apr 22, 2005 at 01:06 UTC

    Sorry, I was a bit confused by the question. I do very little on or for Windows, so hopefully someone more experienced with automating the evil empire will speak up.

    -Any sufficiently advanced technology is
    indistinguishable from doubletalk.