
Re^2: binary data in XML

by sailortailorson (Scribe)
on Feb 28, 2008 at 18:19 UTC (#670963)

in reply to Re: binary data in XML
in thread binary data in XML

There is a vanishingly small chance that the data contains ']]>'. That could still be a real problem, but I have only been working here for two days, and I am sure the first answer I get will be that it is so unlikely as to be considered impossible. This data has a high sorrow factor if it is mishandled, though, so I will eventually bring it up.

I think base64 encoding is out of the question because it would be too slow.

In practical terms, getting the binary data wrapped in '<![CDATA[...]]>' is probably the best option. For now, maybe I can do that myself in preprocessing and get some traction on parsing these that way.
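For the preprocessing idea, here is a minimal Python sketch of wrapping text in CDATA while coping with an embedded ']]>'. The helper name `wrap_cdata` and the `<payload>` element are made up for illustration, and the trick only helps with text made of characters XML already allows; it does nothing for arbitrary octets.

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' across two adjacent sections.

    Hypothetical helper: the first section ends after ']]' and the next one
    starts with '>', so the terminator never appears inside a section.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# Sample text that contains both markup characters and the ']]>' sequence.
sample = "if ($a[$b[0]]> 1) { print '<ok>' }"

doc = "<payload>" + wrap_cdata(sample) + "</payload>"

# The parser joins the adjacent CDATA sections back into one string.
recovered = ET.fromstring(doc).text
```

After parsing, `recovered` should be identical to `sample`, so the split is invisible to whatever reads the document.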

Thank you.

Replies are listed 'Best First'.
Re^3: binary data in XML (CDATA, ha!)
by tye (Sage) on Feb 28, 2008 at 18:40 UTC

    CDATA doesn't, in fact, make the least bit of difference as to what characters you can include in the data. You can't put (unencoded) binary data into XML using CDATA. The only difference between non-CDATA and CDATA is that one requires you to encode some single-character items while the other requires you to encode one 3-character sequence. This makes CDATA quite silly, IMHO.

    And, no, you can't even use &#12; to get "binary" characters into XML.

    - tye        
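To see the point about CDATA and character references in practice, here is a small Python sketch that feeds three documents to the standard library's expat-based parser. The helper `is_well_formed` and the element name `d` are assumptions for the example; other conforming XML 1.0 parsers should reject the same inputs, but only this one is exercised here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc, False if it raises ParseError."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

# Raw control characters inside CDATA: still not legal XML characters.
raw_control_in_cdata = "<d><![CDATA[\x01\x02]]></d>"

# A character reference to form feed (decimal 12): also not allowed in XML 1.0.
char_reference = "<d>&#12;</d>"

# Ordinary text full of markup characters: this is what CDATA is for.
plain_text = "<d><![CDATA[a < b && c > d]]></d>"

results = {
    "raw_control_in_cdata": is_well_formed(raw_control_in_cdata),
    "char_reference": is_well_formed(char_reference),
    "plain_text": is_well_formed(plain_text),
}
```

Only the last document should parse; CDATA changes what has to be escaped, not which characters are allowed.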

      Perhaps I was a little unclear, so for the benefit of the slow, I was offering two alternatives:

      • Use CDATA if you've got character data that would otherwise be interpreted as markup, and you don't mind the onerous task of encoding the commonly appearing sequence "]]>" instead of encoding every other offending character individually (say you had Perl code that was otherwise free of verboten characters but chock full of > and <, etc.)
      • Use an out-of-band encoding such as base64, which is handled at the application layer rather than by the XML parser itself, for what would otherwise be invalid character data (a sequence of octets that would fall outside XML's allowed range when interpreted as characters)

      Given that there was no example of what exactly the offending "binary data" was, I thought it best to offer both options: the simple one for more vanilla ASCII-y data, and encoding for arbitrary octet streams.
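The second alternative could look something like the following Python sketch. The element name `blob` and its `encoding` attribute are invented for the example; the XML parser sees only ASCII text, and the application decodes it afterwards.

```python
import base64
import xml.etree.ElementTree as ET

# Every possible octet value, including ones XML forbids as characters.
payload = bytes(range(256))

# Encode at the application layer; the attribute is just a hint to readers.
elem = ET.Element("blob", encoding="base64")
elem.text = base64.b64encode(payload).decode("ascii")
doc = ET.tostring(elem, encoding="unicode")

# The receiving side parses ordinary XML text, then decodes it back to octets.
decoded = base64.b64decode(ET.fromstring(doc).text)
```

The cost is roughly a one-third increase in size plus the encode and decode passes; whether that is too slow depends on the data volume, which the thread does not state.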

      Update: Further clarified what types of data suggest which alternative.

      The cake is a lie.