http://qs321.pair.com?node_id=240975

nysus has asked for the wisdom of the Perl Monks concerning the following question:

What modules are people using out there to crawl newspaper sites for articles containing specific keywords? I'm all ears. Thanks.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop";
$nysus = $PM . $MCF;

Re: Looking for news crawler module
by Ovid (Cardinal) on Mar 06, 2003 at 19:43 UTC

    My first thought would be to check out the news-related modules on the CPAN. Also look at the RSS modules: RSS (RDF Site Summary) gives you headlines, links and summaries in a structured format, so there is no HTML scraping to do. The catch is that it only works for sites that actually publish RSS feeds.
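To make the RSS suggestion concrete, here is a minimal sketch of filtering a feed's items by keyword. It assumes the XML::RSS and LWP::Simple modules from CPAN are installed; the sub names `matching_items` and `fetch_items` are made up for illustration, and the feed URL is whatever the newspaper happens to publish. Treat it as a starting point rather than a finished crawler.

```perl
use strict;
use warnings;

# Return the items whose title or description mentions any of the
# keywords (case-insensitive, keywords treated as literal text).
sub matching_items {
    my ($items, @keywords) = @_;
    return () unless @keywords;
    my $pattern = join '|', map { quotemeta } @keywords;
    return grep {
        my $item = $_;
        my $text = join ' ', map { $item->{$_} // '' } qw(title description);
        $text =~ /$pattern/i;
    } @$items;
}

# Fetch and parse a feed. The modules are loaded here so that the
# filter above can be used (and tested) without them.
# Note: XML::RSS dies on malformed XML; wrap the call in eval if needed.
sub fetch_items {
    my ($url) = @_;
    require LWP::Simple;
    require XML::RSS;
    my $xml = LWP::Simple::get($url);
    return [] unless defined $xml;      # fetch failed
    my $rss = XML::RSS->new;
    $rss->parse($xml);
    return $rss->{items};               # array ref of item hashes
}

# Command-line use: script.pl FEED_URL KEYWORD [KEYWORD...]
if (@ARGV >= 2) {
    my ($url, @keywords) = @ARGV;
    for my $item (matching_items(fetch_items($url), @keywords)) {
        printf "%s\n  %s\n", $item->{title} // '(no title)', $item->{link} // '';
    }
}
```

Run it with a feed URL followed by one or more keywords. Keeping the filter separate from the fetch means the same `matching_items` can be pointed at items gathered some other way if a site has no feed.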

    There are also the Google-related modules. I'm particularly curious about the DBD::Google module. It might be perfect, but it looks new.

    Cheers,
    Ovid

Re: Looking for news crawler module
by cacharbe (Curate) on Mar 07, 2003 at 15:19 UTC

    The only caveat I would put out there for news crawlers is that if you are going to use Google as your source, please, please, please read their usage requirements. They are extremely touchy about automated processes using their data.

    And on that note, may I point you at a new book by O'Reilly.

    C-.

    ---
    Flex the Geek