PerlMonks  

Re: Writing a simple RSS aggregator.

by Zaxo (Archbishop)
on Dec 06, 2003 at 02:43 UTC ( [id://312714]=note )


in reply to Writing a simple RSS aggregator.

The XML in the feed is spread over several lines, but you're reading only one line at a time. No single line can match your whole regex.

Try setting local $/ = '</item>'; before reading, so that each read returns one whole item. The alternative is to forget the intermediate file, rely on the linebreaks, and do global matching on the whole feed, a la,

# qr// compiles the pattern; a bare /.../ here would match against $_ instead
my $regex = qr/<title>(.*?)<\/title>\n<link>(.*?)<\/link>\n<description> +(.*?)\n<\/description>/;
while ($data =~ /$regex/g) {
    #...
}

That is pretty fragile, however. I suspect you're doing this as a favor, and it seems odd that you would have to reimplement the good XML modules to do it.
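
For comparison, here is a minimal sketch of the first suggestion, changing $/ so that each read returns one item. It runs on a made-up sample feed held in a string rather than on the real data; the sample, the helper name items_from, and the example.com links are all hypothetical. It needs Perl 5.8 or later for the in-memory filehandle, and it assumes the chunk before the first <item> carries no channel-level <title>, <link> and <description> of its own, which a real feed usually does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample feed; the real data would come from the fetched RSS.
my $data = <<'XML';
<rss><channel>
<item>
<title>First post</title>
<link>http://example.com/1</link>
<description>
 Hello
</description>
</item>
<item>
<title>Second post</title>
<link>http://example.com/2</link>
<description>
 World
</description>
</item>
</channel></rss>
XML

# Hypothetical helper: split the feed on </item> and pull three fields
# out of each chunk. Chunks missing any field are skipped.
sub items_from {
    my ($xml) = @_;
    my @items;
    local $/ = '</item>';    # each read now ends at </item>, not at newline
    open my $fh, '<', \$xml or die "Cannot open in-memory handle: $!";
    while (my $chunk = <$fh>) {
        my ($title) = $chunk =~ m{<title>(.*?)</title>}s or next;
        my ($link)  = $chunk =~ m{<link>(.*?)</link>}s   or next;
        my ($desc)  = $chunk =~ m{<description>\s*(.*?)\s*</description>}s
            or next;
        push @items, { title => $title, link => $link, description => $desc };
    }
    close $fh;
    return @items;
}

printf "%s <%s>: %s\n", @{$_}{qw(title link description)}
    for items_from($data);
```

The /s modifier lets . match newlines, so a description that spans several lines is still captured. This is still regex-on-XML and shares the fragility noted above: CDATA sections, attributes on the tags, or entities will trip it up.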

LWP::Simple is no more a core module than the XML modules are, so if you can use it, you should be able to use them too. There is even one for RSS.
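
As a rough sketch of the module route, assuming the RSS module meant here is XML::RSS from CPAN (the post does not name it), parsing could look something like this. The feed text is hypothetical sample data standing in for whatever was fetched, and XML::RSS has to be installed separately since it is not part of core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::RSS;    # CPAN module, not core; assumed to be installed

# Hypothetical sample feed standing in for the fetched data.
my $feed = <<'XML';
<?xml version="1.0"?>
<rss version="2.0">
<channel>
<title>Example feed</title>
<link>http://example.com/</link>
<description>Hypothetical sample data</description>
<item>
<title>First post</title>
<link>http://example.com/1</link>
<description>Hello</description>
</item>
</channel>
</rss>
XML

my $rss = XML::RSS->new;
$rss->parse($feed);    # dies if the XML is not well-formed

# Parsed items are available as an array of hashes under {items}.
for my $item (@{ $rss->{items} }) {
    print "Title: $item->{title}\n",
          "Link: $item->{link}\n",
          "Description: $item->{description}\n\n";
}
```

The parser handles line breaks, entities and channel-level elements itself, so none of the line-by-line matching problems come up.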

After Compline,
Zaxo

Re: Re: Writing a simple RSS aggregator.
by duff (Parson) on Dec 06, 2003 at 03:40 UTC

    Here are a couple of other methods that were inspired by Zaxo:

    Method #1

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use CGI qw( :standard );
    require 5.8.0;    # in-memory filehandles (open on \$RSS) need 5.8

    print "Content-type: text/html\n\n";
    print start_html;

    my $RSS = get("http://thraxil.org/rss");

    {
        local $/ = "</item>";    # read one <item> chunk per iteration
        open my $rss, "<", \$RSS or die "Aaiiigh - $!";
        while (<$rss>) {
            my ($title) = m!<title>(.*?)</title>!is;
            my ($link)  = m!<link>(.*?)</link>!is;
            my ($desc)  = m!<description>(.*?)</description>!is;
            next unless $title && $link && $desc;
            print "Title: $title\nLink: $link\nDescription: $desc\n\n";
        }
        close $rss;
    }

    Method #2

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use CGI qw( :standard );

    print "Content-type: text/html\n\n";
    print start_html;

    my $RSS = get("http://thraxil.org/rss");

    # Grab the body of every <item> in one global match over the whole feed
    my @items = $RSS =~ m!<item.*?>(.*?)</item>!gis;
    for (@items) {
        my ($title) = m!<title>(.*?)</title>!is;
        my ($link)  = m!<link>(.*?)</link>!is;
        my ($desc)  = m!<description>(.*?)</description>!is;
        next unless $title && $link && $desc;
        print "Title: $title\nLink: $link\nDescription: $desc\n\n";
    }

    Each of these has its own merits, but if you want to do it right, use a real parser from CPAN. :-)
