PerlMonks  

Foreign directory search with LWP part 2?

by Anonymous Monk
on Apr 27, 2001 at 13:00 UTC ( [id://76024]=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

So how do Yahoo or AltaVista do it?

Replies are listed 'Best First'.
Re: Foreign directory search with LWP part 2?
by larsen (Parson) on Apr 27, 2001 at 13:51 UTC
Re: Foreign directory search with LWP part 2?
by little (Curate) on Apr 27, 2001 at 13:25 UTC
    They use a program called a spider, which looks for a file called "robots.txt" in the root directory of the specified (registered) domain. So for this site it would try to fetch "http://www.perlmonks.org/robots.txt". This file specifies what the spider may read and which directories it may search. Even then, access must still be granted.
    You can read more about this in the help sections of these search engines, which describe what such a file must and can look like and how it works.
    CPAN has it

    Have a nice day
    All decision is left to your taste
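
To make the robots.txt check described above concrete, here is a minimal sketch. It is written in Python with only the standard library, not with LWP, and it parses a hypothetical robots.txt body so that it runs without network access. The spider name "ExampleSpider", the sample rules, and the helper names robots_url and build_parser are assumptions made for illustration; they are not taken from the thread or from any real search engine.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the robots.txt URL in the root of the site hosting page_url."""
    return urljoin(page_url, "/robots.txt")


# Hypothetical robots.txt content (an assumption, so the example runs offline).
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser that can answer queries."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# Where a spider would look for the rules of this site.
print(robots_url("http://www.perlmonks.org/index.pl?node_id=76024"))

# Ask whether the (hypothetical) spider may fetch two different paths.
print(parser.can_fetch("ExampleSpider", "http://www.perlmonks.org/index.pl"))
print(parser.can_fetch("ExampleSpider", "http://www.perlmonks.org/private/x"))
```

A real spider would download the file from the URL returned by robots_url instead of using a fixed string, and would skip any page for which can_fetch returns False.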
Re: Foreign directory search with LWP part 2?
by suaveant (Parson) on Apr 27, 2001 at 16:37 UTC
    ... and what a spider does is look at a specified page and find all the links on it. It then follows all of those links (at least the ones local to the site) and downloads those pages as well. You keep repeating this process until you have all the pages (mind you, you have to keep track of which pages you already have, to prevent an infinite loop). But there are already modules for that (though sometimes it's nice to know what your modules are doing).
                    - Ant
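
The crawl loop described above can be sketched as follows. This is an illustrative Python version using only the standard library, not the modules mentioned in the thread. To keep it runnable offline, pages come from a small in-memory dictionary (FAKE_SITE) instead of real HTTP requests. The names LinkExtractor, extract_links, crawl, and the example.com URLs are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute URLs (fragments stripped) for all links in html."""
    parser = LinkExtractor()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch):
    """Breadth-first crawl of one site; fetch(url) returns HTML or None."""
    host = urlparse(start_url).netloc
    seen = {start_url}          # pages already queued: prevents infinite loops
    queue = deque([start_url])
    pages = {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # page missing or not retrievable
            continue
        pages[url] = html
        for link in extract_links(url, html):
            # Follow only links local to the site, and each page only once.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Hypothetical three-page site with a cycle and one off-site link.
FAKE_SITE = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="http://other.example/">off-site</a>',
    "http://example.com/a.html": '<a href="/">home</a> <a href="b.html">B</a>',
    "http://example.com/b.html": '<a href="a.html">back to A</a>',
}

pages = crawl("http://example.com/", FAKE_SITE.get)
print(sorted(pages))
```

The seen set is what stops the loop: the pages link back to each other, but each URL is queued at most once. A real spider would replace FAKE_SITE.get with a function that performs the HTTP request and would also consult robots.txt before fetching.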
