PerlMonks  

Re: Retrieve a CGI page

by dani++ (Sexton)
on Aug 31, 2001 at 14:58 UTC ( [id://109349]=note )


in reply to Retrieve a CGI page

I've written a fairly sophisticated HTML spider using Perl, lynx and tcsh (as glue). All the pages it accesses are CGIs, and it works as advertised. Have you tried 'lynx --source' or 'lynx --dump', as suggested?
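For anyone who hasn't used those two modes, here is a minimal sketch in POSIX sh (not the tcsh glue I mentioned). The function names, the LYNX override variable and the URL in the comments are placeholders of mine, and I've used the single-dash spellings from the lynx man page. As far as I know, -source prints the raw HTML and -dump prints the page rendered as plain text.

```shell
#!/bin/sh
# LYNX may be overridden (e.g. for a dry run); it defaults to the real binary.
: "${LYNX:=lynx}"

# Print the raw HTML of a page, as the server sent it.
fetch_source() {
    "$LYNX" -source "$1"
}

# Print the page rendered as plain text.
fetch_dump() {
    "$LYNX" -dump "$1"
}

# Usage (hypothetical URL):
#   fetch_source 'http://www.example.com/cgi-bin/report.cgi?id=42' > report.html
#   fetch_dump   'http://www.example.com/cgi-bin/report.cgi?id=42' > report.txt
```

The raw HTML is what you want if you intend to parse the page afterwards; the dump is handier when you only need to read it.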

I refrained from using LWP because the target CGI system required cookies, sessions and full browser support.

Moreover, lynx has a limited scripting option, '-cmd_script=<script file>', that you can use to program what it does (download files, etc.). To learn the syntax of the script files, run a session with '-cmd_log=<script log file>' and read the log it writes.
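A rough sketch of that record-then-replay cycle, again in POSIX sh. Only the two lynx options come from the above; the function names, the LYNX override variable and the file names in the comments are hypothetical. I believe these options depend on how lynx was built, so check the output of 'lynx -help' if yours rejects them.

```shell
#!/bin/sh
# LYNX may be overridden (e.g. for a dry run); it defaults to the real binary.
: "${LYNX:=lynx}"

# Browse interactively once; lynx logs what you do to LOGFILE.
# usage: record_session LOGFILE URL
record_session() {
    "$LYNX" -cmd_log="$1" "$2"
}

# Replay a previously recorded (or hand-edited) script against URL.
# usage: replay_session SCRIPTFILE URL
replay_session() {
    "$LYNX" -cmd_script="$1" "$2"
}

# Usage (hypothetical file name and URL):
#   record_session session.log 'http://www.example.com/cgi-bin/login.cgi'
#   replay_session session.log 'http://www.example.com/cgi-bin/login.cgi'
```

Reading the recorded log is the safest way to see what a script file should contain, so I have deliberately not shown any script lines here.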

In my case, I first download the pages, use Perl to parse and analyse them, build a custom lynx script that retrieves exactly the data I want, and then run lynx again with the generated script.
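The overall flow looks roughly like this. It is a sketch under assumptions, not my actual spider: it is written in POSIX sh instead of tcsh, the GENERATOR program (make_lynx_script.pl) stands in for whatever Perl code parses the page and prints a lynx script, and the function, variable and file names are all made up for illustration.

```shell
#!/bin/sh
# Both commands may be overridden; the generator name is hypothetical.
: "${LYNX:=lynx}"
: "${GENERATOR:=./make_lynx_script.pl}"

# usage: run_pipeline URL WORKDIR
run_pipeline() {
    _url=$1
    _dir=$2
    mkdir -p "$_dir" || return 1

    # Step 1: download the page as raw HTML.
    "$LYNX" -source "$_url" > "$_dir/page.html" || return 1

    # Step 2: parse the page and write a custom lynx script.
    # The generator reads the HTML file and prints the script on stdout.
    "$GENERATOR" "$_dir/page.html" > "$_dir/fetch.script" || return 1

    # Step 3: run lynx again, driven by the generated script.
    "$LYNX" -cmd_script="$_dir/fetch.script" "$_url"
}

# Usage (hypothetical URL and directory):
#   run_pipeline 'http://www.example.com/cgi-bin/report.cgi' ./work
```

Keeping the generator as a separate program means the parsing logic can change without touching the glue.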

dani++
