Michalis has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks.
A simple question here.
I want to create "snapshots" of a couple of sites that make extensive use of CGIs (static files are less than 10%) and that also have a database backend, in order to put the snapshots on a demonstration CD.
Does anyone know of ANY script/program out there that would let me follow all the links in the pages and save them as HTML files? (Obviously, on forms I want to be prompted for input.) Or should I start hacking on one myself?
Thanks in advance for any advice
Regards
Michalis
Re: Dumping dynamic CGIs
by hossman (Prior) on Mar 09, 2002 at 00:08 UTC
If you just want to crawl the links on a site
and dump the pages to disk, wget is the way to go
(regardless of whether the links are dynamic or static).
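A minimal wget invocation for this kind of crawl might look like the following sketch (www.example.com is a placeholder; adjust the depth and the URL to your own site):

```shell
# Recursively mirror a site, rewriting links so the saved copy
# works offline -- e.g. from a demonstration CD.
#   -r   : recursive retrieval (follow links)
#   -l 5 : limit recursion depth to 5 levels
#   -k   : convert links in saved pages to point at the local copies
#   -p   : also fetch images/stylesheets needed to render each page
#   -E   : save pages with an .html extension, even CGI output
wget -r -l 5 -k -p -E http://www.example.com/
```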
The form issue is going to get you into trouble.
I doubt you'll find any general-purpose tools for
doing what you want, because there are so many variables:
should it enter one set of values in every form, and what are
those values? Is there JavaScript that mutates the form
input prior to submission?
The good news is, using LWP, HTML::Tree and HTML::Element it's REALLY easy to:
- download a page
- scrape all of the links from that page and remember them
- check for any forms on that page
- get a list of all the form elements in each form
- prompt the user how to fill out the form, and/or look up in some data structure what to do with forms/elements that have those names.
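A rough sketch of those steps, assuming LWP and HTML::Tree are installed (the starting URL and the printed report are purely illustrative):

```perl
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTML::TreeBuilder;

# Placeholder starting URL -- substitute your own site.
my $url = 'http://www.example.com/';

my $ua       = LWP::UserAgent->new;
my $response = $ua->get($url);
die "Fetch failed: ", $response->status_line, "\n"
    unless $response->is_success;

my $tree = HTML::TreeBuilder->new_from_content($response->content);

# Collect the href of every <a> tag so they can be crawled later.
my @links = grep { defined }
            map  { $_->attr('href') }
            $tree->look_down(_tag => 'a');

# For each form, list its input elements so the user can be
# prompted for values before the form is submitted.
for my $form ($tree->look_down(_tag => 'form')) {
    print "Form posting to: ", ($form->attr('action') || $url), "\n";
    for my $input ($form->look_down(_tag => qr/^(input|select|textarea)$/)) {
        printf "  %s element named '%s'\n",
            $input->tag, ($input->attr('name') || '(unnamed)');
    }
}

$tree->delete;   # free the parse tree
```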
Thanks a lot
I didn't know these modules existed, so you really saved me a lot of time and work.
If I come up with a general-use tool, I will post it in PerlMonks.
Once again thanks.
Re: Dumping dynamic CGIs
by fuzzyping (Chaplain) on Mar 08, 2002 at 22:38 UTC
I don't know of any easy-peasy Perl hacks to do this (à la wget), but I think LWP::UserAgent definitely has the functionality you'll need to accomplish the task. Good luck!
-fuzzyping