PerlMonks  

Dumping dynamic CGIs

by Michalis (Pilgrim)
on Mar 08, 2002 at 22:17 UTC ( [id://150437]=perlquestion )

Michalis has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks.
A simple question here. I want to create "snapshots" of a couple of sites that rely heavily on CGIs (static files make up less than 10%) and have a database backend, so that I can put the snapshots on a demonstration CD.
Does anyone know of ANY script or program out there that would let me follow all the links in the pages and save them as HTML files? (Obviously, for forms I want to be asked for input.) Or would I be better off hacking one together myself?

Thanks in advance for any advice
Regards
Michalis

Replies are listed 'Best First'.
Re: Dumping dynamic CGIs
by hossman (Prior) on Mar 09, 2002 at 00:08 UTC
    If you just want to crawl the links on a site and dump the pages to disk, wget is the way to go (regardless of whether the links are dynamic or static).

    The form issue is going to get you into trouble. I doubt you'll find any general-purpose tool for doing what you want, because there are so many variables: should it enter one set of values in every form, and if so, what are those values? Is there JavaScript that mutates the form input prior to submission?

    The good news is that, using LWP, HTML::Tree and HTML::Element, it's REALLY easy to:

    1. Download a page
    2. scrape all of the links from that page and remember them
    3. check for any forms on that page
    4. get a list of all the form elements in that form
    5. prompt the user for how to fill out the form and/or look up in some data structure what to do with forms/elements that have those names.
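
The five steps above can be sketched roughly as follows. This is only a minimal illustration, assuming LWP::UserAgent, HTML::TreeBuilder (from the HTML::Tree distribution) and URI are installed; the helper name `scan_page` and the hash layout it returns are made up for this example and are not part of any module. It does not submit forms, follow links recursively, or handle JavaScript.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;
use URI;

# Hypothetical helper: parse an HTML string and return
# (arrayref of absolute link URLs, arrayref of form summaries).
sub scan_page {
    my ($html, $base) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Step 2: collect every <a href="..."> and resolve it against the page URL.
    my @links = map  { URI->new_abs($_->attr('href'), $base)->as_string }
                grep { defined $_->attr('href') }
                $tree->look_down(_tag => 'a');

    # Steps 3 and 4: find each form and the names of its input elements.
    my @forms;
    for my $form ($tree->look_down(_tag => 'form')) {
        my @fields = map  { $_->attr('name') }
                     grep { defined $_->attr('name') }
                     $form->look_down(_tag => qr/^(?:input|select|textarea)$/);

        # A missing or empty action means "submit to the current page".
        my $action = $form->attr('action');
        $action = "$base" unless defined $action && length $action;

        push @forms, {
            action => URI->new_abs($action, $base)->as_string,
            method => uc($form->attr('method') // 'GET'),
            fields => \@fields,
        };
    }

    $tree->delete;    # release the parse tree
    return (\@links, \@forms);
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $url = shift @ARGV;

    # Step 1: download the page.
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get($url);
    die "GET $url failed: " . $res->status_line . "\n" unless $res->is_success;

    my ($links, $forms) = scan_page($res->decoded_content, $res->base);
    print "link: $_\n" for @$links;

    # Step 5: ask the user what to put in each named field.
    for my $form (@$forms) {
        print "Form: $form->{method} $form->{action}\n";
        my %answer;
        for my $name (@{ $form->{fields} }) {
            print "  value for '$name': ";
            my $value = <STDIN>;
            $value = '' unless defined $value;
            chomp $value;
            $answer{$name} = $value;
        }
        # %answer could now be sent with $ua->post(...) or appended to a
        # GET query; that part is deliberately left out of this sketch.
    }
}
```

Saved under a hypothetical name such as `scan.pl`, it would be run as `perl scan.pl http://your.site/index.cgi`. Keeping the parsing in its own subroutine makes it possible to feed it saved HTML without hitting the live site.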
      Thanks a lot
      I didn't know these modules existed, so you really saved me a lot of time and work.
      If I come up with a general-use tool, I will post it in PerlMonks.
      Once again thanks.
Re: Dumping dynamic CGIs
by fuzzyping (Chaplain) on Mar 08, 2002 at 22:38 UTC
    I don't know of any easy-peasy Perl hacks to do this (à la wget), but I think that LWP::UserAgent definitely has the functionality you'll need to accomplish the task. Good luck!

    -fuzzyping
