PerlMonks  

Re: Writing a simple RSS feed 'grabber' with XML::Parser.

by Plankton (Vicar)
on Jan 25, 2009 at 05:46 UTC


in reply to Writing a simple RSS feed 'grabber' with XML::Parser.

I tried XML::RSS too, but found it to be overkill for what I wanted, so I tried XML::RSS::Parser::Lite. I was happy with it until I hit "CDATA" sections. As many other monks have pointed out, RSS feeds are not always "well formed", so I just ended up doing something like this:
```perl
use strict;
use warnings;
use WWW::Mechanize;

my $url  = shift or die "usage: $0 RSS_FEED_URL\n";    # any .xml RSS feed URL
my $mech = WWW::Mechanize->new();
$mech->get($url);

# Work line by line: this assumes each <title> and <description> sits on one line.
my @content = split /\n/, $mech->content;

my $title_pattern       = qr{<title>(.*?)</title>}i;
my $description_pattern = qr{<description>(.*?)</description>}i;

# Keep only the captured text of each matching line (@content is left untouched).
my @titletags       = map { /$title_pattern/       ? $1 : () } @content;
my @descriptiontags = map { /$description_pattern/ ? $1 : () } @content;

my $thetitle = $titletags[0] // '';
$thetitle =~ s/<!\[CDATA\[//g;
$thetitle =~ s/Librivox://g;    # feed-specific prefix in my case
$thetitle =~ s/\]\]>//g;
print "$thetitle\n";

my $thedescription = $descriptiontags[0] // '';
$thedescription =~ s/<!\[CDATA\[//g;
$thedescription =~ s/\]\]>//g;
print "$thedescription\n";
```
Not the best general solution, but it worked for me in my particular case.
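The line-based approach misses any `<title>` or `<description>` that spans several lines, and it strips `<![CDATA[` and `]]>` independently rather than as a pair. A minimal regex sketch that works on the whole document instead might look like the following, assuming the feed is available as one string (for example `$mech->content`) and that the tags carry no attributes. `tag_texts` and `decode_xml_entities` are hypothetical helpers, not part of any CPAN module, and the sample feed is made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): decode the five predefined XML
# entities plus numeric character references, in a single pass so that
# "&amp;lt;" becomes "&lt;" rather than "<".
sub decode_xml_entities {
    my ($s) = @_;
    my %named = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");
    $s =~ s{&(?:(lt|gt|amp|quot|apos)|#x([0-9A-Fa-f]+)|#([0-9]+));}
           {defined $1 ? $named{$1} : defined $2 ? chr(hex $2) : chr($3)}ge;
    return $s;
}

# Hypothetical helper: text of every <$tag>...</$tag> element in $xml.
# CDATA sections are unwrapped verbatim; entities are decoded only outside
# them. Regex-based, so it ignores attributes, nesting and namespaces.
sub tag_texts {
    my ($xml, $tag) = @_;
    my @texts;
    # /s lets . match newlines, so elements spanning several lines are found
    while ($xml =~ m{<\Q$tag\E>(.*?)</\Q$tag\E>}gs) {
        my $inner = $1;
        my $text  = '';
        # The capturing group makes split return the CDATA sections too
        for my $part (split /(<!\[CDATA\[.*?\]\]>)/s, $inner) {
            if ($part =~ /\A<!\[CDATA\[(.*)\]\]>\z/s) {
                $text .= $1;                          # literal CDATA content
            } else {
                $text .= decode_xml_entities($part);  # ordinary character data
            }
        }
        $text =~ s/\A\s+|\s+\z//g;    # trim surrounding whitespace
        push @texts, $text;
    }
    return @texts;
}

# Made-up sample feed; in the script above this would be $mech->content.
my $feed = <<'XML';
<rss version="2.0">
<channel>
  <title>Example Feed</title>
  <item>
    <title><![CDATA[Librivox: Tom &amp; Jerry]]></title>
    <description>
      Fish &amp; chips &lt;b&gt;bold&lt;/b&gt;
    </description>
  </item>
</channel>
</rss>
XML

my @titles       = tag_texts($feed, 'title');
my @descriptions = tag_texts($feed, 'description');
print "$_\n" for @titles, @descriptions;
```

Because entities are decoded only outside CDATA sections, the `&amp;` inside `<![CDATA[...]]>` is kept literally, which matches how XML treats CDATA content, while the escaped HTML in the plain `<description>` comes out as `<b>bold</b>`.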
