http://qs321.pair.com?node_id=416013

I was looking at the POD in XML/Fling.pm and thinking to myself… that really isn't such a great solution.

There's a new module on CPAN called XML::Genx, which is based on Genx, Tim Bray's library for generating XML.

Genx is a library, written in the C language, for generating XML. Its goals are high performance, a simple and intuitive API, and output that is guaranteed to be well-formed; […] Latest news: In production, carrying hundreds of thousands of subtitles per day; thinking of taking off the “beta” stamp.

Of course, the question is, how many places in the PM codebase use XML::Fling currently? If there are many, migrating directly will be a pain. However, it should be possible to rewrite XML::Fling as a thin wrapper around XML::Genx. Another simpler option might be to just start using them in parallel, going back through old tickers as tuits permit.

I'm making this proposition because I'm pretty sure that Genx will be both safer to use and faster than even Fling. I haven't benchmarked anything yet, but I wanted to know whether there's any interest before I sink any time into that.

Makeshifts last the longest.

Re: XML::Fling begone?
by Corion (Patriarch) on Dec 19, 2004 at 17:21 UTC

    One smaller thing is that XML::Genx is XS-based, which means we'll need a C compiler for all the platforms we intend to deploy on. Luckily, there are only two platforms, Linux/SuSE for pmdev and *BSD for the production machines. There will be problems if Pair upgrades the Perl version of the servers, as then XML::Genx will have to be (re)compiled in lockstep with the upgrade. But maybe we can somehow arrive at an arrangement with Pair regarding that.

    Adding XS code also means that we could run afoul of (more?) spurious segfaults, but if we restrict ourselves to one new XS module/one XS upgrade every month, the source of the segfaults should be fairly traceable.

      Well, by user reports, Genx itself is very stable, while the XS wrapper is much less tested. All this could be a consideration, but these seem to me like minor issues rather than anything close to a showstopper.

      Makeshifts last the longest.

Re: XML::Fling begone? (ctrl, utf-8)
by tye (Sage) on Dec 19, 2004 at 19:24 UTC

    When you benchmark, be sure to time the building of a string to output as Genx won't have a handle to write to.

    One problem with XML 1.0 is that they made some stupid decisions with regard to control characters. This is likely fixed in the next version of the XML spec (which I assume is still not finished).

    In my experience, the majority of XML parsers are actually non-compliant on this point (perhaps a form of civil disobedience or a subconscious revolt against a design misfeature?) so producing non-compliant XML has a practical advantage for me. If Genx is compliant on this point, then that will probably be too much thrash to be worth the minor benefit.

    When XML 1.1 becomes available, then the stupid design decision is restricted to nul characters, which is an acceptable compromise. Which means that using Genx and letting the user select which version of XML they want output would be great.

    Only being able to produce UTF-8 may have some interesting consequences. We have a hard time getting people to deal with encodings in XML correctly. The change will likely cause some disruption. It may ease some problems. For example, cbhistory still produces UTF-8 output but claims it is Latin-1 (because it feeds Latin-1 to its XML parser but the parser insists on producing UTF-8 output and the author didn't appreciate this fact). So such a change might fix this problem and/or may cause it to appear in more places. I just mention this in hopes that this somewhat minor point will be properly addressed if a change is made.

    - tye        
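The mislabeling described above (UTF-8 bytes shipped under a Latin-1 declaration) can be reproduced in a few lines. This is only a sketch in Python using its bundled expat-based parser; the element name `msg` and the sample text are made up for illustration and have nothing to do with the actual cbhistory output.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: the text "café" encoded as UTF-8 bytes.
payload = "café".encode("utf-8")

# Same bytes, two different encoding declarations.
mislabeled_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><msg>' + payload + b"</msg>"
honest_doc = b'<?xml version="1.0" encoding="UTF-8"?><msg>' + payload + b"</msg>"

# The parser trusts the declaration, so the mislabeled document decodes
# each UTF-8 byte as a separate Latin-1 character.
mislabeled = ET.fromstring(mislabeled_doc).text
honest = ET.fromstring(honest_doc).text

print(repr(mislabeled))
print(repr(honest))
```

Both documents are well-formed, so no parser will complain; the damage only shows up as garbled text on the client side, which is why this kind of bug tends to go unnoticed.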

      Please elaborate on control characters. I have a vague recollection of hearing something like that before but I can't pull out the specifics. And, handwaving the issue before I actually know what it is, is this something CDATA sections or entitification cannot fix in a generally compatible fashion?

      Makeshifts last the longest.

        No, the XML 1.0 spec declares that non-whitespace control characters (with or without the eighth bit set) are illegal in XML and entities for illegal characters are illegal.

        I don't know if CDATA removes this restriction. I'd think it would but after being surprised by the control-character stupidity and seeing many XML near-experts also boggled by it, I won't speculate w/o reading the spec first.

        Of course, if you prefer the attribute-heavy style of XML, then CDATA won't be any help (I say w/o verifying this assumption but I'd nearly bet money on it).

        I feel PerlMonks' XML should be nearly or completely attribute-free. But that isn't much help since we already have a heavy base of ticker clients that don't handle CDATA.

        So when I said that control characters are a problem, I wasn't so XML-naive as to not have considered entities and CDATA.

        - tye        
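One way to probe the control-character question without reading the spec is to feed a few small documents to a strict parser and see which ones it accepts. The sketch below uses Python's bundled expat-based parser as a stand-in for "a compliant XML 1.0 parser"; that is an assumption, and it says nothing about how Genx or any ticker client behaves. The helper name `is_well_formed` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the expat-based parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# U+0001 stands in for "a non-whitespace control character".
samples = {
    "plain text": "<a>ok</a>",
    "tab (whitespace control)": "<a>\t</a>",
    "raw control character": "<a>\x01</a>",
    "character reference": "<a>&#1;</a>",
    "inside CDATA": "<a><![CDATA[\x01]]></a>",
}

for label, doc in samples.items():
    print(f"{label}: {is_well_formed(doc)}")
```

With this parser, neither the character reference nor the CDATA section gets the control character through, which is consistent with the XML 1.0 grammar defining CDATA content in terms of the same legal character set.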

Re: XML::Fling begone?
by demerphq (Chancellor) on Dec 19, 2004 at 17:38 UTC

    As an on-and-off effort I've been replacing all direct uses of XML::Fling with calls to new_xml_fling. Hypothetically, if the interfaces were compatible, you could modify that and replace XML::Fling. Not sure how valuable that would be, however. I think the speed gain would have to be quite significant to make it worth the effort.

    ---
    demerphq

      Note that I'm proposing this not solely for the speed gains, but also for the added safety afforded by Genx. It offers more of what Fling does well and less of its drawbacks.

      Makeshifts last the longest.

        It's just that XML::Fling is fairly pervasive, is all. Also, its interface quirks are definitely exploited in the code base. Converting won't be straightforward, unfortunately.

        ---
        demerphq

Re: XML::Fling begone?
by BUU (Prior) on Jan 01, 2005 at 07:13 UTC
    Excuse me for interrupting this fairly serious conversation, but what the heck is XML::Fling? And where can I find it to read the documentation?

      See XML/Fling.pm and XML/Fling/Array.pm. It's the ultra-barebones XML generator module currently in use by PM (and E2?). Apparently it never made it out of the site, though it appears to have been intended for CPAN (at least the docs sound like it).

      Makeshifts last the longest.

        Permission Denied ( #2294=superdoc: print w/ replies, xml ) Need Help?? You don't have access to that node. Tough beans.
        :-( Got a link that we *can* view?

        --
        Linux, sci-fi, and Nat Torkington, all at Penguicon 3.0
        perl -e 'print(map(chr,(0x4a,0x41,0x50,0x48,0xa)))'