You could look at
to write your own robot.
They use a program called a spider that looks for a file called "robots.txt" in the root directory of the specified (registered) domain. So they try to get "http://www.perlmonks.org/robots.txt". This file specifies what the spider may read and which directories it may search. But even then, access must still be granted.
You can read about this in the help sections of these search engines, which describe how such a file must (and can) look and how it works.
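For illustration, a minimal robots.txt could look like this (the paths and the bot name below are made up):

    # every spider must stay out of these directories
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /private/

    # this particular spider is banned from the whole site
    User-agent: EvilBot
    Disallow: /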
CPAN has modules for it.
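For example, LWP::RobotUA is a drop-in replacement for LWP::UserAgent that fetches and honors robots.txt for you; a minimal sketch (the bot name and contact address are placeholders):

    use strict;
    use warnings;
    use LWP::RobotUA;

    # agent name and contact address are required; both are made up here
    my $ua = LWP::RobotUA->new('MyBot/1.0', 'mybot@example.com');
    $ua->delay(1);    # wait at least 1 minute between requests to a host

    my $res = $ua->get('http://www.perlmonks.org/');
    if ($res->is_success) {
        print "allowed and fetched OK\n";
    } else {
        # URLs forbidden by robots.txt come back as 403 errors
        print "request failed: ", $res->status_line, "\n";
    }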
Have a nice day
The decision is left to your taste.
... and what a spider does is look at a specified page and find all the links on it. It then follows all of those links (at least the ones local to the site) and downloads them as well. You keep repeating this process until you have all the pages (mind you, you have to keep track of which pages you have already visited to prevent an infinite loop). But there are already modules for that (though sometimes it's nice to know what your modules are doing).
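A bare-bones version of that loop, using LWP::UserAgent and HTML::LinkExtor (the start URL and agent name are placeholders, and a %seen hash does the loop prevention):

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::LinkExtor;
    use URI;

    my $start = 'http://www.example.com/';    # made-up start page
    my $host  = URI->new($start)->host;
    my $ua    = LWP::UserAgent->new(agent => 'MySpider/0.1');

    my %seen;                  # URLs already fetched -- prevents infinite loops
    my @queue = ($start);

    while (my $url = shift @queue) {
        next if $seen{$url}++;
        my $res = $ua->get($url);
        next unless $res->is_success and $res->content_type eq 'text/html';

        # collect every <a href="..."> on the page
        my @links;
        my $p = HTML::LinkExtor->new(sub {
            my ($tag, %attr) = @_;
            push @links, $attr{href} if $tag eq 'a' and $attr{href};
        });
        $p->parse($res->decoded_content);

        for my $link (@links) {
            my $abs = URI->new_abs($link, $url);
            $abs->fragment(undef);    # drop #fragments so pages aren't re-queued
            # follow only links local to the site
            push @queue, $abs->as_string
                if $abs->scheme =~ /^https?$/ and $abs->host eq $host;
        }
        print "fetched $url\n";
    }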
- Ant