http://qs321.pair.com?node_id=671058


in reply to binary data in XML

CDATA sections originate from a related SGML concept and are used for verbatim text (like code).

They are simply a way to stop any special entity-like processing and are certainly *not* meant for binary data. There are many schemes for binary XML.

If the document is valid UTF-8, then base64-encoding the values is useful and needs no escaping, because the base64 alphabet (A-Z, a-z, 0-9, +, / and =) contains none of XML's special characters such as <, & or quotes. Obviously you could also put the encoded data directly in a CDATA section, but that seems less useful than:

<data local_enc="base64"> <value>c3RlcGhhbgo=</value> </data>
The whole arsenal of MIME/PEM conversions can be used. But be careful, there is one pitfall: the scheme *breaks* if the XML document uses a multi-byte encoding (at least 2 bytes per character) like UTF-16, because the single-byte ASCII output of base64 is not valid UTF-16. The solution is an extra conversion such as iconv -f ISO-8859-1 -t UTF-16 on the way in, and its inverse on the way out.
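
As a rough illustration of the round trip, here is a minimal Python sketch. The helper names wrap_binary and unwrap_binary are made up for this example, and the element layout simply mirrors the snippet above. It assumes the standard library's xml.etree.ElementTree serializer is allowed to do the transcoding, which plays the same role as the iconv step: the base64 text is handled as characters and only turned into UTF-16 bytes when the document is written out.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(payload: bytes, xml_encoding: str = "utf-8") -> bytes:
    """Return an XML document (as bytes) carrying payload as base64 text.

    Hypothetical helper: the <data>/<value> layout follows the example above.
    """
    data = ET.Element("data", {"local_enc": "base64"})
    value = ET.SubElement(data, "value")
    # b64encode returns ASCII bytes; decode them to str so the serializer
    # can transcode the characters into the document's own encoding.
    value.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(data, encoding=xml_encoding)


def unwrap_binary(document: bytes) -> bytes:
    """Parse the document and decode the base64 text back to raw bytes."""
    root = ET.fromstring(document)
    return base64.b64decode(root.findtext("value"))


if __name__ == "__main__":
    doc = wrap_binary(b"stephan\n", "utf-16")
    print(unwrap_binary(doc))
```

Pasting the raw ASCII base64 bytes into a UTF-16 file by hand would skip exactly the transcoding step that the serializer (or iconv) performs here.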

Finally, one way to encode ]]> is to close the CDATA section after outputting ]] and then open another one starting with >, so that ]]> ends up written as ]]]]><![CDATA[>.
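
The splitting trick can be sketched in a few lines of Python. The function name cdata is just an illustrative choice, and the check at the end assumes the parser hands back adjacent CDATA sections as one run of text, which is what xml.etree.ElementTree does.

```python
import xml.etree.ElementTree as ET


def cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' across two sections."""
    # ']]>' becomes: ']]' + close section + open new section + '>'
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


if __name__ == "__main__":
    source = "if (a[b[0]]> 1) { print '<&>' }"
    doc = "<code>" + cdata(source) + "</code>"
    # The parser concatenates the adjacent sections back into the original.
    print(ET.fromstring(doc).text == source)
```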

cheers --stephan