

in reply to Re^2: passing data structures from java to perl
in thread passing data structures from java to perl

Agreed. As long as no UTF-16 characters slip in, which is really easy to do by accident in Java since all strings are UTF-16, it's all gravy.

Java strings are UTF-16

A good XML writer that prevents you from going outside of the declared format will protect you from future mistakes. The downside is the extra effort of using some XML APIs compared with others. JSON is usually gravy in all languages.

Gravy.. mmmm...

Re^4: passing data structures from java to perl
by almut (Canon) on Jul 28, 2010 at 21:05 UTC
    Java strings are UTF-16

    The way strings are stored internally doesn't really matter.

    While Perl stores unicode strings internally as UTF-8 (or something very close to it), it can encode those strings to many other encodings for output.  The same holds for Java: while it stores strings internally as UTF-16, there's no problem creating UTF-8 output, for example.

    Writer utf8out = new BufferedWriter(
        new OutputStreamWriter(
            new FileOutputStream("outfile"), "UTF-8"));
    utf8out.write("some unicode data");
    utf8out.close();  // flush the buffer so the data actually reaches the file
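To make the internal-storage-versus-output point concrete, here is a minimal sketch. The class name `InternalVsOutput` and the helper `utf8Length` are made up for illustration; only `String.getBytes(Charset)` and `StandardCharsets.UTF_8` come from the standard library. It compares how many UTF-16 code units a string occupies in memory with how many bytes it takes once encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original post.
class InternalVsOutput {
    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9";        // U+00E9: one UTF-16 code unit in memory
        String emoji  = "\uD83D\uDE00";  // U+1F600: a surrogate pair in memory

        System.out.println(eAcute.length() + " -> " + utf8Length(eAcute)); // 1 -> 2
        System.out.println(emoji.length() + " -> " + utf8Length(emoji));   // 2 -> 4

        // Decoding the UTF-8 bytes gives back the same string.
        String back = new String(emoji.getBytes(StandardCharsets.UTF_8),
                                 StandardCharsets.UTF_8);
        System.out.println(back.equals(emoji)); // true
    }
}
```

The in-memory form and the output form differ in size, but the text survives the round trip unchanged.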
      That's really really bad advice. You can get encoding errors like that on the Java side. Yes, the bytes in memory are all that matter and you're free to interpret them, but on the way in and out you're transforming them.

      It's the same thing as binmode: you're affecting the data as the I/O occurs, on its way into or out of memory. See...

      http://www.docjar.com/docs/api/java/nio/charset/UnmappableCharacterException.html

      I've run into this exact problem using the XML feeds for perlmonks while working in java.

        Not sure what you're talking about.  UTF-8 is a variable-width (multi-byte-if-needed) encoding that can encode the full unicode character set, so why should there be encoding errors, or an UnmappableCharacterException?  An UnmappableCharacterException is thrown if a certain character can't be represented in the specified target encoding, but as all unicode characters can be encoded in UTF-8, this exception cannot occur.

        UnmappableCharacterExceptions may happen if you try to encode unicode data to Latin-1, for example, but not with UTF-8.
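The difference between the two target encodings can be sketched with a strict `CharsetEncoder`. The class `StrictEncodeDemo` and its method `tryEncode` are hypothetical names chosen for this example. The encoder is explicitly set to `CodingErrorAction.REPORT`, so a character the charset cannot represent raises an exception rather than being silently replaced.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

// Hypothetical demo class, not part of the original post.
class StrictEncodeDemo {
    // Tries to encode text in the given charset and names the outcome.
    static String tryEncode(Charset cs, String text) {
        CharsetEncoder encoder = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return "ok";
        } catch (UnmappableCharacterException e) {
            return "unmappable";   // character does not exist in the target charset
        } catch (CharacterCodingException e) {
            return "other coding error";
        }
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN, not part of Latin-1
        System.out.println(tryEncode(StandardCharsets.ISO_8859_1, euro)); // unmappable
        System.out.println(tryEncode(StandardCharsets.UTF_8, euro));      // ok
    }
}
```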

        That's really really bad advice. You can get encoding errors like that on the java side.

        How do you figure, please explain?

      Just an OT question ... do all these writers in writers in writers look as crazy to you as they do to me? I mean I have no experience with Java (nor do I want any), but it seems IO in Java is just as overcomplicated as in C#/.Net. There probably is a reason that I, being OO unclean, do not see, but still. Looks like someone went heavily overboard when designing the libraries.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

        Nope, they're building blocks you can use to create any kind of API you can imagine, even Perl-style layers like ":raw:crlf:utf-8".
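As a rough sketch of the building-block idea, a newline-translating writer can be stacked on top of an encoding writer, loosely in the spirit of combining a CRLF layer with a UTF-8 layer in Perl. `CrlfWriter` and `LayerDemo` are invented for this example and are not standard classes; `FilterWriter`, `OutputStreamWriter` and `ByteArrayOutputStream` are the stock pieces being combined.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical "layer": rewrites every "\n" as "\r\n" before passing it on.
class CrlfWriter extends FilterWriter {
    CrlfWriter(Writer out) { super(out); }

    @Override public void write(int c) throws IOException {
        if (c == '\n') out.write('\r');
        out.write(c);
    }
    @Override public void write(char[] buf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(buf[i]);
    }
    @Override public void write(String s, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(s.charAt(i));
    }
}

// Hypothetical demo class that stacks the layers: chars -> CRLF -> UTF-8 -> bytes.
class LayerDemo {
    static byte[] encode(String text) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try (Writer w = new CrlfWriter(
                new OutputStreamWriter(raw, StandardCharsets.UTF_8))) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.toByteArray();
    }

    public static void main(String[] args) {
        // "a", newline, e-acute becomes 61 0D 0A C3 A9
        for (byte b : encode("a\n\u00e9")) System.out.printf("%02X ", b);
        System.out.println();
    }
}
```

Each wrapper does one job, so swapping the charset or dropping the newline translation means changing a single constructor call.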