PYX (the Pyxie notation) is an alternative way of representing
XML data. The data is laid out very simply, one piece of
information per line.
The nice thing about PYX is how easy the output is to parse.
On the other hand, a lot of features found in the XML format
can't be represented in PYX (CDATA sections, entities, ...).
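
To give an idea of how little work reading PYX takes, here is a minimal sketch in Python (not the Perl code this post describes). It assumes the usual PYX line prefixes: "(" for a start tag, "A" for an attribute, "-" for text, ")" for an end tag and "?" for a processing instruction, with newlines, tabs and backslashes in text escaped as \n, \t and \\. The names parse_pyx and pyx_unescape are made up for this example.

```python
import re

# Assumed PYX escapes; any other backslash sequence is left untouched.
_UNESCAPES = {"n": "\n", "t": "\t", "\\": "\\"}


def pyx_unescape(text):
    """Turn the escaped form of a PYX value back into plain text."""
    return re.sub(r"\\(.)", lambda m: _UNESCAPES.get(m.group(1), m.group(0)), text)


def parse_pyx(lines):
    """Turn PYX lines into a list of event tuples (hypothetical helper)."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # The first character says what kind of item the line holds.
        kind, rest = line[0], line[1:]
        if kind == "(":
            events.append(("start", rest))
        elif kind == ")":
            events.append(("end", rest))
        elif kind == "A":
            # Attribute lines are "Aname value": split on the first space.
            name, _, value = rest.partition(" ")
            events.append(("attr", name, pyx_unescape(value)))
        elif kind == "-":
            events.append(("text", pyx_unescape(rest)))
        elif kind == "?":
            events.append(("pi", rest))
    return events
```

Each line can be handled on its own, so the same loop works just as well on a file handle or on the output of a pipe.
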
Now, I know the module XML::PYX exists, and it
even comes with a script called pyxhtml, which does pretty
much what this code does. But XML::PYX per se
isn't really flexible if you want finer control over what
is kept and what is dropped from the HTML file.
Hopefully, this code can be easily customized to suit your
needs, provided you know how to use HTML::Parser (which is
really fun to use, especially version 3).
And the really cool thing is that your HTML doesn't have
to be a valid XML file! (I wouldn't try to feed it Word 2000
pseudo-HTML though...)
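
As an illustration of the idea (and not the Perl code itself), here is a rough sketch of an HTML to PYX converter using Python's standard html.parser module, which is event-driven in much the same spirit as HTML::Parser. The class PyxWriter and the functions html_to_pyx and pyx_escape are hypothetical names, and the escaping rules (\\, \t, \n) are an assumption about the PYX notation.

```python
from html.parser import HTMLParser


def pyx_escape(text):
    """Keep each item on one line by escaping backslash, tab and newline."""
    return text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


class PyxWriter(HTMLParser):
    """Collects one PYX line per tag, attribute or text chunk."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []

    def handle_starttag(self, tag, attrs):
        self.lines.append("(" + tag)
        for name, value in attrs:
            # Attributes without a value (e.g. <input disabled>) arrive as None.
            self.lines.append("A" + name + " " + pyx_escape(value or ""))

    def handle_endtag(self, tag):
        self.lines.append(")" + tag)

    def handle_data(self, data):
        self.lines.append("-" + pyx_escape(data))


def html_to_pyx(html):
    """Return the PYX lines for an HTML string (hypothetical helper)."""
    writer = PyxWriter()
    writer.feed(html)
    writer.close()  # flush any text still buffered by the parser
    return writer.lines


if __name__ == "__main__":
    # The <li> elements are never closed, so this is not well-formed XML.
    print("\n".join(html_to_pyx('<ul class="menu"><li>one<li>two</ul>')))
```

To control what is kept, filter inside the handlers: for example, return early from handle_starttag for tags you don't want, or drop attributes by name before appending them.
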
More info on PYX