Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I am trying to write a program that searches the documents on someone else's website, simply because nobody puts a search engine on their site themselves. I get how to search for words in documents on my own site, but how do I do it on someone else's? My dream is to be able to just put in: search:www.whatever.com for:perl, and have it search all the documents in that directory for the word "perl" and send back links. IS THAT TOO MUCH TO ASK?!?!?! : )
Re: search a foreign directory
by suaveant (Parson) on Apr 24, 2001 at 21:04 UTC
Well... the easy way would be to use something like wget, which
supports recursive downloads, then search the copy locally.
Or you could write a web spider of your own in Perl using LWP
and search the pages each time (or make a local copy, as with wget).
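
A minimal sketch of the LWP spider idea, not a finished tool. It assumes the non-core modules LWP::UserAgent, URI and HTML::LinkExtor are installed. The names page_matches, lwp_fetch and search_site, the 50-page cap and the agent string are all made up for illustration.

```perl
use strict;
use warnings;

# True if the visible text of $html contains $word as a whole word
# (case-insensitive). Tag stripping is crude, good enough for a sketch.
sub page_matches {
    my ($html, $word) = @_;
    (my $text = $html) =~ s/<[^>]*>/ /g;
    return $text =~ /\b\Q$word\E\b/i ? 1 : 0;
}

# Default fetcher: returns the page HTML, or undef on failure or non-HTML.
sub lwp_fetch {
    my ($url) = @_;
    require LWP::UserAgent;
    my $ua  = LWP::UserAgent->new(timeout => 10, agent => 'site-search-sketch/0.1');
    my $res = $ua->get($url);
    return undef unless $res->is_success && $res->content_type eq 'text/html';
    return $res->decoded_content;
}

# Breadth-first crawl from $start, staying on the same host.
# Returns the URLs of pages that contain $word.
sub search_site {
    my ($start, $word, $max_pages, $fetch) = @_;
    require URI;
    require HTML::LinkExtor;
    $max_pages ||= 50;              # assumed cap so the crawl always ends
    $fetch     ||= \&lwp_fetch;     # a different fetcher can be passed in

    my $start_uri = URI->new($start)->canonical;
    my $host      = $start_uri->host;
    my @queue     = ($start_uri->as_string);
    my %seen      = ($queue[0] => 1);
    my @hits;
    my $fetched = 0;

    while (@queue && $fetched < $max_pages) {
        my $url = shift @queue;
        $fetched++;
        my $html = $fetch->($url);
        next unless defined $html;
        push @hits, $url if page_matches($html, $word);

        # Collect <a href> links; the base URL makes them absolute.
        my $extor = HTML::LinkExtor->new(undef, $url);
        $extor->parse($html);
        $extor->eof;
        for my $link ($extor->links) {
            my ($tag, %attr) = @$link;
            next unless $tag eq 'a' && defined $attr{href};
            my $abs = URI->new("$attr{href}");
            next unless ($abs->scheme // '') =~ /^https?$/;
            next unless lc($abs->host) eq lc($host);
            $abs->fragment(undef);  # page.html#top is the same page
            my $key = $abs->canonical->as_string;
            push @queue, $key unless $seen{$key}++;
        }
    }
    return @hits;
}

# Usage (hits the network):
#   print "$_\n" for search_site('http://www.whatever.com/', 'perl');
```

If you run this against a site you don't own, LWP::RobotUA is a drop-in replacement for LWP::UserAgent that honours robots.txt and waits between requests.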
It would probably be a good idea to cache the pages locally for a while and search
them locally, then rebuild each link so that it points to the actual site.
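
Searching the cached copy and rebuilding the links could look something like the sketch below. It uses only core modules. The name search_mirror is made up, and it assumes the local directory mirrors the site's paths one-to-one, so that docs/a.html under the mirror directory came from docs/a.html under the site's base URL.

```perl
use strict;
use warnings;
use File::Find;

# Search every .html/.htm file under $mirror_dir for $word and map each
# hit back to a URL on the live site (assumes identical path layout).
sub search_mirror {
    my ($mirror_dir, $base_url, $word) = @_;
    $mirror_dir =~ s{/+$}{};    # drop trailing slashes so joins are clean
    $base_url   =~ s{/+$}{};
    my @hits;

    find({
        no_chdir => 1,
        wanted   => sub {
            my $path = $File::Find::name;
            return unless -f $path && $path =~ /\.html?$/i;
            open my $fh, '<', $path or return;    # skip unreadable files
            local $/;                             # slurp the whole file
            my $html = <$fh>;
            close $fh;
            $html =~ s/<[^>]*>/ /g;               # crude tag stripping
            return unless $html =~ /\b\Q$word\E\b/i;

            # Rebuild the link: swap the local prefix for the site URL.
            (my $rel = $path) =~ s{^\Q$mirror_dir\E/}{};
            push @hits, "$base_url/$rel";
        },
    }, $mirror_dir);

    my @sorted = sort @hits;
    return @sorted;
}

# Usage, with a hypothetical mirror directory:
#   print "$_\n" for search_mirror('mirror/www.whatever.com',
#                                  'http://www.whatever.com', 'perl');
```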
Of course, Google and AltaVista have an option to search within
a specific domain, so if the site is indexed there you could just use them :)
Update: BTW, to use the domain searching in AltaVista and Google, go to their Advanced Search pages.
- Ant
You don't have to go to the Advanced Search page. AltaVista supports (amongst others) these nice little shortcuts:
+host:domain.com - only search for results on this domain
+link:domain.com - only search for results that link to domain.com
The latter is useful when you want to see who's linked to your site :)
Just include them along with your search term to restrict the results. This means you can easily create a search of the site by auto-populating a search box with the +host:domain.com string and letting users enter their term. Or use JavaScript to hide the string and present an empty search box.
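
Building that restricted query in Perl might look roughly like this. It's a sketch using only core Perl; site_query and search_url are made-up names, and http://search.example/query?q= is a placeholder endpoint, not the real address or parameter name of any search engine.

```perl
use strict;
use warnings;

# Prefix the user's term with the +host: shortcut for one domain.
# Accepts "domain.com" or a full URL and keeps only the host part.
sub site_query {
    my ($domain, $term) = @_;
    $domain =~ s{^https?://}{}i;    # drop the scheme if one was given
    $domain =~ s{/.*$}{};           # drop any path
    return "+host:$domain $term";
}

# Percent-encode the query and append it to a search endpoint.
# The default endpoint is a placeholder (assumption), swap in a real one.
sub search_url {
    my ($domain, $term, $endpoint) = @_;
    $endpoint ||= 'http://search.example/query?q=';
    my $q = site_query($domain, $term);
    utf8::encode($q);               # work on bytes before escaping
    $q =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $endpoint . $q;
}

print site_query('domain.com', 'perl'), "\n";
print search_url('http://www.whatever.com/docs/', 'perl'), "\n";
```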
cLive ;-)