PerlMonks  

Parsing multiple RSS files

by perl.j (Pilgrim)
on Oct 23, 2013 at 20:58 UTC ( [id://1059348]=perlquestion )

perl.j has asked for the wisdom of the Perl Monks concerning the following question:

Hey Everyone!

I'm trying to hack together a little tool to help me parse RSS feeds. The code I have so far takes one or more keywords and prints the articles whose titles contain any of them. Here is the code:

    use 5.14.2;
    use strict;
    use warnings;
    use XML::RSSLite;
    use LWP::Simple;

    my @keywords = qw(approach);
    my $URL = 'http://www.theguardian.com/theguardian/mainsection/rss';

    my $content = get($URL);
    my %result;
    parseRSS(\%result, \$content);

    # Match any keyword as a whole word, case-insensitively.
    my $re = join "|", @keywords;
    $re = qr/\b(?:$re)\b/i;

    foreach my $item (@{ $result{items} }) {
        my $title = $item->{title};
        $title =~ s{\s+}{ }g;   # collapse every run of whitespace
        $title =~ s{^\s+}{};    # trim leading whitespace
        $title =~ s{\s+$}{};    # trim trailing whitespace
        if ($title =~ /$re/) {
            print "$title\n\t$item->{link}\n\n";
        }
    }

This gives me the desired effect with one URL, but I need to parse about 20 feeds and print the matching articles from all of them.

I attempted to do this by turning $URL into an array (@URL) and changing that variable throughout the code, but that just gave me several errors.

So, my question is, how can I parse multiple RSS feeds in one script and have all of the output formatted the same way into the same file?

--perl.j

Replies are listed 'Best First'.
Re: Parsing multiple RSS files
by jethro (Monsignor) on Oct 23, 2013 at 21:25 UTC

    The get() function of LWP::Simple does not know how to work with arrays; you need to give it URLs one at a time. This is done with a loop, which would look like this:

    foreach my $URL (@URL) { ... }
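
    One detail the loop has to deal with: LWP::Simple's get() returns undef when a fetch fails, so with about 20 feeds it is worth skipping a dead feed rather than passing undef on to the parser. The following is only a sketch under that assumption. The helper name fetch_each and the idea of passing the fetcher in as a code reference are illustrative choices, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch on each URL in turn and keep only the
# responses that came back defined. $fetch is any code ref that takes a
# URL and returns the body, or undef on failure (as LWP::Simple's get() does).
sub fetch_each {
    my ($fetch, @urls) = @_;
    my %content_for;
    foreach my $url (@urls) {
        my $content = $fetch->($url);
        unless (defined $content) {
            warn "Could not fetch $url, skipping\n";
            next;
        }
        $content_for{$url} = $content;
    }
    return \%content_for;
}

# Real use (needs LWP::Simple and network access):
#   use LWP::Simple;
#   my $pages = fetch_each(\&get, @URL);
```

    Taking the fetcher as a parameter also makes it possible to try the loop with a stub before pointing it at real feeds.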
Re: Parsing multiple RSS files
by atcroft (Abbot) on Oct 23, 2013 at 21:19 UTC

    Would this (untested!) not work?

    use 5.14.2;
    use strict;
    use warnings;
    use XML::RSSLite;
    use LWP::Simple;

    my @keywords = qw(approach);
    my @URLlist = (
        'http://www.theguardian.com/theguardian/mainsection/rss',
        'http://www.theguardian.com/theguardian/mainsection/rss1',
    );

    # The pattern does not depend on the feed, so build it once.
    my $re = join "|", @keywords;
    $re = qr/\b(?:$re)\b/i;

    foreach my $URL ( @URLlist ) {
        my $content = get($URL);
        unless (defined $content) {   # get() returns undef on failure
            warn "Could not fetch $URL\n";
            next;
        }
        my %result;
        parseRSS(\%result, \$content);

        foreach my $item (@{ $result{items} }) {
            my $title = $item->{title};
            $title =~ s{\s+}{ }g;   # collapse every run of whitespace
            $title =~ s{^\s+}{};    # trim leading whitespace
            $title =~ s{\s+$}{};    # trim trailing whitespace
            if ($title =~ /$re/) {
                print "$title\n\t$item->{link}\n\n";
            }
        }
    }
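
    The question also asks for all of the output to land in the same file with the same layout. One way to get there, sketched here rather than taken from the thread, is to open a single filehandle before the loop and route every feed's items through one printing routine. The sub names keyword_regex, clean_title and print_matches are hypothetical, and quoting the keywords with quotemeta is an added precaution that the original code does not take.

```perl
use strict;
use warnings;

# Build one case-insensitive, whole-word pattern from the keyword list.
# quotemeta keeps regex metacharacters in a keyword from being interpreted.
sub keyword_regex {
    my @keywords = @_;
    my $alt = join '|', map { quotemeta } @keywords;
    return qr/\b(?:$alt)\b/i;
}

# Collapse whitespace runs to single spaces and trim both ends.
sub clean_title {
    my ($title) = @_;
    $title = '' unless defined $title;
    $title =~ s/\s+/ /g;
    $title =~ s/^\s+|\s+$//g;
    return $title;
}

# Print every item whose title matches $re to $fh, using the same
# "title, tab-indented link, blank line" layout as the original script.
# Returns the number of items printed.
sub print_matches {
    my ($fh, $re, $items) = @_;
    my $count = 0;
    foreach my $item (@{ $items || [] }) {
        my $title = clean_title($item->{title});
        next unless $title =~ $re;
        print {$fh} "$title\n\t$item->{link}\n\n";
        $count++;
    }
    return $count;
}
```

    In the loop above this would mean opening the output once, for example with open my $out, '>', 'matches.txt' or die "Cannot open matches.txt: $!"; (the file name is made up), and then calling print_matches($out, $re, $result{items}) for each feed in place of the inner foreach.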
