
Looking for news crawler module

by nysus (Parson)
on Mar 06, 2003 at 19:33 UTC (node #240975)

nysus has asked for the wisdom of the Perl Monks concerning the following question:

What modules are people using out there to crawl newspaper sites for articles containing specific keywords? I'm all ears. Thanks.
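For the keyword half of the question, here is a minimal sketch that needs only core modules. `HTTP::Tiny` (core since Perl 5.14) fetches each page, and a single case-insensitive regex built from the keywords decides which ones the page mentions. The sub names (`keyword_regex`, `keywords_found`, `scan_pages`) are hypothetical, not an existing CPAN module, and the tag stripping is deliberately crude. A real crawler would use an HTML parser such as HTML::Parser or HTML::TreeBuilder.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build one case-insensitive regex matching any keyword as a whole word.
# quotemeta keeps characters like "." or "+" in a keyword from acting as regex syntax.
sub keyword_regex {
    my @keywords = @_;
    my $alt = join '|', map { quotemeta } @keywords;
    return qr/\b(?:$alt)\b/i;
}

# Return the keywords (lowercased, deduplicated, sorted) that occur in $text.
# Call it in list context; the result is a list, not a count.
sub keywords_found {
    my ($text, @keywords) = @_;
    my $re = keyword_regex(@keywords);
    my %seen;
    $seen{ lc $1 }++ while $text =~ /($re)/g;
    return sort keys %seen;
}

# Fetch each URL and map it to the keywords it mentions (pages with no hits are skipped).
# The tag stripping below is an assumption made for brevity, not real HTML parsing.
sub scan_pages {
    my ($urls, @keywords) = @_;
    my $http = HTTP::Tiny->new(agent => 'news-keyword-sketch/0.1');
    my %hits;
    for my $url (@$urls) {
        my $res = $http->get($url);
        next unless $res->{success};
        (my $text = $res->{content}) =~ s/<[^>]*>/ /g;
        my @found = keywords_found($text, @keywords);
        $hits{$url} = \@found if @found;
    }
    return \%hits;
}
```

Usage might look like `my $hits = scan_pages(['http://www.example.com/news/'], 'election', 'budget');` followed by `print "$_: @{ $hits->{$_} }\n" for sort keys %$hits;`. HTTPS URLs additionally need IO::Socket::SSL installed for HTTP::Tiny to work.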

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop";
$nysus = $PM . $MCF;

Re: Looking for news crawler module
by Ovid (Cardinal) on Mar 06, 2003 at 19:43 UTC

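    My first thought would be to check out the news-related modules on CPAN. Additionally, check out the RSS modules. RSS (RDF Site Summary) feeds list a site's recent articles, with titles, links and short descriptions, in a machine-readable XML format, so filtering them for keywords is far easier than scraping HTML pages. However, that approach requires that the sites you wish to crawl publish RSS feeds.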
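    If the sites do publish RSS, keyword filtering becomes a simple loop over feed items. A minimal sketch, assuming the CPAN module XML::RSS (its `parse` method and its `items` array of hashrefs with `title`, `link` and `description` keys) plus core HTTP::Tiny for the download. The sub names `feed_items` and `matching_items` are hypothetical. XML::RSS is loaded with `require` only when a feed is actually parsed, so the filtering part works without it.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Download a feed and return its items as hashrefs (title, link, description),
# the shape XML::RSS exposes in $rss->{items}.
sub feed_items {
    my ($feed_url) = @_;
    require XML::RSS;    # CPAN module, loaded only when a feed is parsed
    my $res = HTTP::Tiny->new->get($feed_url);
    die "Could not fetch $feed_url: $res->{status} $res->{reason}\n"
        unless $res->{success};
    my $rss = XML::RSS->new;
    $rss->parse($res->{content});
    return @{ $rss->{items} };
}

# Keep only items whose title or description mentions a keyword (whole word, any case).
sub matching_items {
    my ($items, @keywords) = @_;
    my $alt = join '|', map { quotemeta } @keywords;
    my $re  = qr/\b(?:$alt)\b/i;
    return grep {
        # Some feeds omit the description, so skip undefined fields.
        my $text = join ' ', grep { defined } @{$_}{qw(title description)};
        $text =~ $re;
    } @$items;
}
```

    Usage might look like `print "$_->{title}\n  $_->{link}\n" for matching_items([ feed_items($url) ], 'budget');`, where `$url` is whatever feed address the newspaper publishes.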

    There are also the Google-related modules. I'm particularly curious about the DBD::Google module. It might be perfect for this, but it looks new.



Re: Looking for news crawler module
by cacharbe (Curate) on Mar 07, 2003 at 15:19 UTC

    The only caveat I would put out there for news crawlers is that if you are going to use Google as your source, please, please, please read their usage requirements. They are extremely touchy about automated processes using their data.
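    No code can read a site's usage terms for you, but code can at least keep an automated crawler from hammering a server. A minimal sketch of a throttled fetcher built on core HTTP::Tiny; the sub name `polite_fetch`, its arguments, and the 10-second default delay are all assumptions for illustration. The `get` and `pause` hooks exist only so the logic can be tested without a network. It does not check robots.txt or any terms of service.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Fetch URLs one at a time, pausing between requests, and return url => content
# for the successful ones. Does NOT consult robots.txt or a site's terms of use.
sub polite_fetch {
    my (%args) = @_;
    my @urls  = @{ $args{urls} };
    my $delay = $args{delay} // 10;    # seconds between requests (arbitrary assumption)
    my $get   = $args{get} // do {
        my $http = HTTP::Tiny->new(agent => 'news-crawler-sketch/0.1');
        sub { $http->get(shift) };
    };
    my $pause = $args{pause} // sub { sleep shift };

    my %content;
    for my $i (0 .. $#urls) {
        $pause->($delay) if $i > 0;    # no wait before the very first request
        my $res = $get->($urls[$i]);
        $content{ $urls[$i] } = $res->{content} if $res->{success};
    }
    return \%content;
}
```

    For crawling ordinary sites, LWP::RobotUA from libwww-perl goes further: it fetches and obeys each site's robots.txt and enforces a delay between requests to the same host.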

    And on that note, may I point you at a new book by O'Reilly


    Flex the Geek

Approved by zigdon