PerlMonks  

Re: XML::RSS

by ajt (Prior)
on Mar 24, 2003 at 21:40 UTC


in reply to XML::RSS

alexg,

Malformed XML is the bane of RSS. According to Mark Pilgrim, about 10% of typical RSS feeds are malformed*; indeed, the UK IT publication The Register has usable XML on only a few days in any given month.

You will find a wide range of problems that will cause XML::Parser, the core of XML::RSS, to explode:

  • Data encoded in one character encoding but declared as another (or not declared at all, so it defaults to UTF-8).
  • Junk before the opening XML declaration. The Vignette CMS tends to do this, and it's popular with big companies.
  • Badly nested tags: if the CMS is sloppy about checking well-formedness, broken markup goes straight into the RSS feed.
  • Improperly escaped ampersands and entities are a very common problem too.
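A minimal sketch of patching three of these problems before the feed reaches the parser, in plain Perl using only the core Encode module. `clean_feed` is a hypothetical helper, not part of XML::RSS or XML::RSS::Tools. It assumes the feed arrives as raw bytes in an ASCII-compatible encoding, and that bytes which are not valid UTF-8 are Windows-1252; that is a guess, not real detection. It does nothing for badly nested tags, which need a proper tidying step, and it leaves undefined HTML entities such as `&nbsp;` alone.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# clean_feed() is a hypothetical helper for illustration; it is not part of
# XML::RSS or XML::RSS::Tools. Input and output are raw byte strings.
sub clean_feed {
    my ($bytes) = @_;

    # Junk before the XML declaration: drop everything before "<?xml",
    # or before the first tag if there is no declaration at all.
    # (Assumes an ASCII-compatible encoding; UTF-16 feeds are not handled.)
    if ($bytes =~ /<\?xml/) {
        $bytes =~ s/\A.*?(?=<\?xml)//s;
    }
    else {
        $bytes =~ s/\A[^<]*//;
    }

    # Wrong or missing encoding: keep the data as UTF-8 if it is valid
    # UTF-8, otherwise assume Windows-1252 (a guess, not a detection).
    # LEAVE_SRC stops decode() from consuming $bytes when it croaks.
    my $text = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    $text = decode('cp1252', $bytes) unless defined $text;

    # Bare ampersands: escape any "&" that does not begin an entity or
    # character reference. Undefined HTML entities such as &nbsp; are
    # left untouched and can still trip the parser.
    $text =~ s/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/&amp;/g;

    # The output is UTF-8, so make the declaration say so.
    $text =~ s/\A(<\?xml[^>]*?encoding=)(["'])[^"']*\2/${1}${2}UTF-8${2}/;

    return encode('UTF-8', $text);
}
```

Re-encoding to UTF-8 and rewriting the declaration keeps the bytes and the declared encoding in agreement, which is exactly the mismatch XML::Parser objects to. Passing the result to XML::RSS's parse() then shows whether anything (typically bad nesting) is still broken.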

In this node "How do I clean RSS feeds to make them usable?", Matts suggested his rssmirror, the guts of which are now included in both XML::RSS and XML::RSS::Tools.

I became so annoyed with bad XML in RSS feeds that I wrote XML::RSS::Tools to deal with the problems I found, which led to brian d foy taking over XML::RSS, fixing a lot of its problems and, in time, designing a whole new version.

See also:

Good Luck!

* Mark Pilgrim, "Parsing RSS At All Costs"


--
ajt
