
Re: binary data in XML (semantics)

by tye (Sage)
on Feb 28, 2008 at 18:33 UTC

in reply to binary data in XML

The designers of XML have saved you from yourself. You aren't even allowed to send a form feed in XML, so you are just crazy to think XML would allow something so insane as sending binary data! Be glad the XML designers had your best interests in mind! If not for their keen insight and concern, you'd be sending binary data already, and boy would you soon regret it!

As I note in Re: Funny characters in nodes (exactly zero), Tim Bray declared "XML dislikes [...] form-feed[s] [etc.] which have exactly zero shared semantics from system to system". Yes, you'll never find two systems in the world that both use "form feed" to represent a page break.

So you need to either invent your own proprietary encoding for the binary data, encode the binary data into XML-approved characters (to ensure "shared semantics", oh the irony), and then teach every party involved this new proprietary encoding; or you could just find one of the many "XML parsers" (the scare quotes are required by the XML standard) that have the good sense to at least optionally ignore the requirement that they complain about characters that Tim Bray dislikes (something that XML 1.1 will also likely mostly do).
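To make the first option concrete, here is a minimal sketch that uses Base64 as the "XML-approved characters" encoding. Base64 is my choice for illustration, not something named above, and the helper names binary_to_element and element_text_to_binary are hypothetical. Both parties still have to agree that the element holds Base64.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: wrap raw bytes in an element as Base64 text.
# The Base64 alphabet (A-Z, a-z, 0-9, +, /, =) is safe in XML character data.
sub binary_to_element {
    my ($name, $bytes) = @_;
    # The second argument '' suppresses the line breaks that
    # encode_base64 would otherwise insert.
    return sprintf '<%s>%s</%s>', $name, encode_base64($bytes, ''), $name;
}

# Hypothetical helper: recover the raw bytes from the element's text
# content after the parser has handed it over.
sub element_text_to_binary {
    my ($text) = @_;
    return decode_base64($text);
}

my $raw = "\x00\x0C\xFF";    # NUL, form feed, and a high byte
print binary_to_element('blob', $raw), "\n";
```

The cost is roughly a one-third increase in size, and the receiver has to know to decode the text after parsing.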

If you can't find such an "XML parser", then you could also just use a simplistic scheme to transform the "not well-formed 'XML'" into XML and then transform all parsed-out values to recover the original binary data. For example, replace any control characters (or other XML-hated characters) and any backslashes with \xx where "xx" is the hex value of the byte (I don't think there are any Unicode characters that XML hates that won't fit in one byte) and then perform the reverse translation on the extracted values.
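A rough sketch of that escaping scheme, under some assumptions of mine: the input is treated as a byte string, only the C0 control characters that XML 1.0 forbids (everything below 0x20 except tab, LF, and CR) plus the backslash are escaped, and the function names escape_xml_hostile and unescape_xml_hostile are made up for illustration.

```perl
use strict;
use warnings;

# Replace each XML 1.0-forbidden control byte, and each backslash,
# with a backslash followed by the byte's two-digit hex value.
sub escape_xml_hostile {
    my ($bytes) = @_;
    (my $out = $bytes)
        =~ s/([\x00-\x08\x0B\x0C\x0E-\x1F\\])/sprintf('\\%02X', ord($1))/ge;
    return $out;
}

# Reverse translation, to be applied to each value extracted by the parser.
# Every backslash in escaped text starts an escape, because literal
# backslashes were themselves escaped as \5C.
sub unescape_xml_hostile {
    my ($text) = @_;
    (my $out = $text) =~ s/\\([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $out;
}

my $escaped = escape_xml_hostile("page1\x0Cpage2\\n");
print "$escaped\n";    # prints: page1\0Cpage2\5Cn
```

This sketch leaves bytes at 0x80 and above alone, so the document's declared encoding has to be able to carry them, and it does nothing about the parser's line-ending normalization (a CR in the data may come back as LF). Escape those bytes as well if they matter to you.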

- tye        

Re^2: binary data in XML (semantics)
by sailortailorson (Scribe) on Feb 28, 2008 at 19:32 UTC
    Perhaps I am crazy, but I am merely asking about a practice that is already in place where I have just started a new job.

    Perhaps you mean "they" are crazy. Actually, I might be able to agree with that statement, but it would be out of the scope of any engineering approach to answering this question. Perhaps, by 'you are ... crazy', you simply mean 'not me' (from your point of view, of course). If, by 'you' you simply mean "someone other than myself", then OK, that's, well, odd, and random, but we can go on with part of this discussion that actually addresses the real problem.

    Anyway, will I still be crazy (by your definition) if I gather the following from what you say?

    To wit:

    That the practice in place here of using binary data inline in XML is unusual and deprecated or at least that my new coworkers have probably implemented a system that offends Tim Bray.

    That it is not likely remedied by use of <![CDATA[...]]>.

    That in effect, hex encoding is the only way to use XML::Twig.


    That maybe I ought to use HTML::TreeBuilder::XPath, or some other such library that does not kill itself when it sees XML that does not strictly conform to the standard?

      HTML::TreeBuilder::XPath might be what you're looking for. Be aware of two things, though. First, it loads the entire document into memory. Second, its XML export method (as_XML, inherited from HTML::TreeBuilder) does not care about encoding at all, so it might very well produce non-well-formed XML. Which is probably what you want, come to think of it.

        Thanks Mirod.

        Thanks for the XML tools you wrote/improved too. They have made my life orders of magnitude easier (especially being able to use XPath where I was working just before now: I got to see my kids and wife more often and for longer).

        I am no expert in XML (you can probably tell), but I have been able to find my way along.

        Thanks all.
