Writing a simple RSS feed 'grabber' with XML::Parser.

DigitalKitty has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Writing a simple RSS feed 'grabber' with XML::Parser. by ajt (Prior) on Oct 20, 2004 at 11:02 UTC
I wouldn't start any project with XML::Parser. It's a bit antique, and XML::LibXML is a more feature-rich and much faster parser to start with.

Parsing RSS is a real pain, as it's often not well formed, so anything using a proper XML parser will die. XML::RSS and XML::RSS::Tools get round this with a pre-filter that cleans up well-known bad code before passing the file on to the XML parser. The XML::RSS::Tools module (which I wrote) uses XML::RSS for parsing RSS, one of several HTTP clients for getting RSS feeds, and the XML::LibXSLT module for converting the feed into something else.

Some useful nodes: Converting RSS file to HTML XML::RSS Using RSS How do I clean RSS feeds to make them usable? Parsing badly formed RSS or XML Parsing XML At All Costs

-- ajt
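A minimal sketch of the kind of pre-filter described above, assuming the most common problem is a bare `&` in element text. The helper name `escape_bare_ampersands` is hypothetical, and this is not the actual clean-up code used by XML::RSS or XML::RSS::Tools; it only illustrates the idea of repairing a feed before a strict parser sees it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::RSS or XML::RSS::Tools):
# escape every "&" that does not already start a named or numeric
# entity reference such as &amp;, &#38; or &#x26;.
sub escape_bare_ampersands {
    my ($xml) = @_;
    $xml =~ s/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;
    return $xml;
}

my $dirty = '<title>Tom & Jerry &amp; friends</title>';
print escape_bare_ampersands($dirty), "\n";
```

Note that a bare `&` is legal inside a CDATA section, so a filter this simple would alter such content; a real pre-filter needs to be more careful than this.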
Re^2: Writing a simple RSS feed 'grabber' with XML::Parser. by Anonymous Monk on Apr 14, 2007 at 23:22 UTC
rss feed grabber
Re: Writing a simple RSS feed 'grabber' with XML::Parser. by demerphq (Chancellor) on Oct 20, 2004 at 10:04 UTC
Hi DK. I'm trying to figure out what your objective is here. Are you trying to learn how XML::Parser works? Or are you trying to do something with RSS? I mean, if it's the latter then I would do it like this:

```perl
#!/usr/bin/perl -w
use strict;
use XML::Simple;
use LWP::Simple;
use Data::Dump::Streamer;
$|++;
my $ticker=['http://perlmonks.org/index.pl?node_id=30175&xmlstyle=rss',
            "http://rss.news.yahoo.com/rss/science"]->[rand 2];
print "Getting RSS from $ticker\n";
my $feed = get($ticker);
print "Parsing RSS...\n";
my $ref = XMLin($feed);
print "Dumping Parse Tree...\n";
Dump $ref;
```

If it's the former then I can't really help much beyond pointing out that what you are doing with the lexical var "$feed" in there scares the willies out of me.

--- demerphq
First they ignore you, then they laugh at you, then they fight you, then you win. -- Gandhi
Flux8
•Re^2: Writing a simple RSS feed 'grabber' with XML::Parser. by merlyn (Sage) on Oct 20, 2004 at 10:11 UTC
And in today's "making Perl work a lot harder than you need to do", let's nominate the following entry:

```perl
my $ticker=['http://perlmonks.org/index.pl?node_id=30175&xmlstyle=rss',
            "http://rss.news.yahoo.com/rss/science"]->[rand 2];
```

So, we've asked Perl to construct an array, take a reference to it, then dereference that reference to pick out one of the items, then discard the reference, which then garbage-collects the array. All when we could have written that this way:

```perl
my $ticker=('http://perlmonks.org/index.pl?node_id=30175&xmlstyle=rss',
            "http://rss.news.yahoo.com/rss/science")[rand 2];
```

saving two characters of typing, and all that mess of creating the new array and reference and garbage collecting. We're simply constructing a list, then picking out an element of that list with a literal slice (a construct I suggested for Perl 3, by the way {grin}).

To optimize this further, I'd go with a qw for that first list:

```perl
my $ticker=(qw(http://perlmonks.org/index.pl?node_id=30175&xmlstyle=rss
               http://rss.news.yahoo.com/rss/science))[rand 2];
```

And in recent versions of Perl, you can even drop that outer set of parens:

```perl
my $ticker=qw(http://perlmonks.org/index.pl?node_id=30175&xmlstyle=rss
              http://rss.news.yahoo.com/rss/science)[rand 2];
```

I saw bracket-arrow-bracket as a "cute syntax" once. I'm trying to stomp it out, because there's an equivalent construct (as I showed) that is a lot less work for Perl. Please don't propagate "cute syntax" that is more expensive.

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
Re^3: Writing a simple RSS feed 'grabber' with XML::Parser. by Juerd (Abbot) on Oct 20, 2004 at 10:47 UTC
```
perl -MBenchmark=cmpthese -e'cmpthese -1, { cute => sub { [0,1]->[rand 2] }, list => sub { qw(0 1)[rand 2] } }'
```

```
          Rate cute list
cute  768000/s   -- -74%
list 2899719/s 278%   --
```

Yes, the list slice is much faster. But for something that probably runs only once in every 5 minutes, isn't 768000 per second fast enough? Optimizing seems premature here. For things like this, I am against choosing a particular language or syntax for its speed.

I'm not saying that your reply is useless. It's important to know what code does and this information will certainly help some of the readers when they do have to optimize. But the code is written now and not much is gained by changing it, so I'd just let it be. Programmer time is still much more expensive than computer time.

Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }
•Re^4: Writing a simple RSS feed 'grabber' with XML::Parser. by merlyn (Sage) on Oct 20, 2004 at 11:04 UTC
Re^3: Writing a simple RSS feed 'grabber' with XML::Parser. by demerphq (Chancellor) on Oct 20, 2004 at 10:51 UTC
Yep, guilty as charged. But lighten up a little. It was just a snippet to make things a little more interesting.

--- demerphq
Re^3: Writing a simple RSS feed 'grabber' with XML::Parser. by Anonymous Monk on Oct 20, 2004 at 11:27 UTC
at least you have your disclaimer to guard against anyone thinking that you're an ass...
Re: Writing a simple RSS feed 'grabber' with XML::Parser. by gellyfish (Monsignor) on Oct 20, 2004 at 08:33 UTC
The XML is not properly formed - probably a missing closing tag of an element. It would probably help if you could post the source of your RSS.

/J\
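One way to see where a feed stops being well formed is to wrap the parse in `eval` and print the parser's own error message. This is only a sketch: the helper name `xml_error` and the sample feed are made up for illustration, and the original poster's feed was never shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns undef if $xml is well formed,
# otherwise the error message XML::Parser died with.
sub xml_error {
    my ($xml) = @_;
    # ErrorContext => 2 asks the parser to include two lines of
    # surrounding input in its error message.
    my $parser = XML::Parser->new( ErrorContext => 2 );
    return eval { $parser->parse($xml); 1 } ? undef : $@;
}

# Made-up sample with the closing </title> tag missing.
my $broken = '<rss><channel><title>Example</channel></rss>';
my $error  = xml_error($broken);
print defined $error ? "Not well formed:\n$error\n" : "Well formed\n";
```

The message includes the line and column where the parser gave up, which is usually close to the offending tag.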
Re: Writing a simple RSS feed 'grabber' with XML::Parser. (detailed review) by demerphq (Chancellor) on Oct 20, 2004 at 18:40 UTC
bobf asked me to expand on my scary comment in my original reply. Here goes. Read more... (6 kB)

--- demerphq
Re: Writing a simple RSS feed 'grabber' with XML::Parser. by inman (Curate) on Oct 20, 2004 at 10:33 UTC
I have used XML::RSSLite with some success. The following is a simple CGI that displays an RSS feed as a web page. Read more... (2 kB)
Re: Writing a simple RSS feed 'grabber' with XML::Parser. by Anonymous Monk on Apr 15, 2007 at 01:31 UTC
Try XML::RAI

```perl
#!/usr/bin/perl -w
use strict;
use LWP::Simple 'get';
use XML::RAI;

my $rss = XML::RAI->parse(get(shift||die"please enter rss uri"));
my $title = $rss->channel->title;
my $link = $rss->channel->link;
print "$title\n$link\n\n";
for my $item (@{$rss->items}) {
    $title = $item->title;
    $link = $item->link;
    print "$title\n$link\n";
}
```

Also see XML::RSS::SimpleGen
Re: Writing a simple RSS feed 'grabber' with XML::Parser. by Your Mother (Archbishop) on Jan 25, 2009 at 05:51 UTC
Check out XML::Feed if you haven't. I've been really happy with it.
Re: Writing a simple RSS feed 'grabber' with XML::Parser. by Plankton (Vicar) on Jan 25, 2009 at 05:46 UTC
I tried XML::RSS too, but found it to be overkill for what I wanted, so I tried XML::RSS::Parser::Lite and was happy with it until I hit "CDATA". As many other monks have pointed out, RSS feeds are not always "well formed", so I just ended up doing something like this:

```perl
# ...
use WWW::Mechanize;

my $url  = shift;    # any .xml RSS feed url
my $mech = WWW::Mechanize->new();
$mech->get( $url );
my @content = split /\n/, $mech->content;

# Non-greedy captures of the text between the tags
my $title_pattern       = "<title>(.*?)</title>";
my $description_pattern = "<description>(.*?)</description>";

# grep with s/// keeps only the lines where the substitution succeeded
my @titletags       = grep s/$title_pattern/$1/i, @content;
my @descriptiontags = grep s/$description_pattern/$1/i, @content;

my $thetitle = $titletags[0];
$thetitle =~ s/<!\[CDATA\[//g;    # strip the CDATA wrapper
$thetitle =~ s/Librivox://g;      # prefix specific to my feed
$thetitle =~ s/]]>//g;
print "$thetitle\n";

my $thedescription = $descriptiontags[0];
$thedescription =~ s/<!\[CDATA\[//g;
$thedescription =~ s/]]>//g;
print "$thedescription\n";
```

Not the best general solution but it worked for me in my particular case.

