http://qs321.pair.com?node_id=851806


in reply to Re^3: passing data structures from java to perl
in thread passing data structures from java to perl

Java strings are UTF-16

The way strings are stored internally doesn't really matter.

While Perl stores Unicode strings internally as UTF-8 (or something very close to it), it can encode those strings to many other encodings for output.  The same holds for Java: while it stores strings internally as UTF-16, there's no problem creating UTF-8 output, for example.

Writer utf8out = new BufferedWriter(
    new OutputStreamWriter(
        new FileOutputStream("outfile"), "UTF-8" ) );
utf8out.write("some unicode data");
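To illustrate that the internal representation doesn't leak into the output, here is a minimal sketch that encodes one string to two different encodings and decodes both back. The class name RoundTrip and the sample text are made up for the example; the calls themselves are plain java.lang.String and java.nio.charset.StandardCharsets (Java 7 or later assumed).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTrip {
    // Encode the string to bytes in the given charset.
    static byte[] encode(String s, Charset cs) {
        return s.getBytes(cs);
    }

    // Decode bytes that were produced with the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        // 7 characters: "naive" with a diaeresis on the i, a space, and a snowman (U+2603)
        String text = "na\u00efve \u2603";

        byte[] utf8  = encode(text, StandardCharsets.UTF_8);
        byte[] utf16 = encode(text, StandardCharsets.UTF_16BE);

        // Different byte counts on the wire...
        System.out.println("UTF-8 bytes:    " + utf8.length);   // 10
        System.out.println("UTF-16BE bytes: " + utf16.length);  // 14

        // ...but the same string after decoding with the matching charset.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(text));
        System.out.println(decode(utf16, StandardCharsets.UTF_16BE).equals(text));
    }
}
```

The only thing the two sides have to agree on is the encoding used on the wire, not how either language keeps strings in memory.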

Re^5: passing data structures from java to perl
by exussum0 (Vicar) on Jul 28, 2010 at 23:25 UTC
    That's really, really bad advice. You can get encoding errors like that on the Java side. Yes, the bytes in memory are all that matter and you're free to interpret them however you like, but on the way in and out you're transforming the data.

    It's the same thing as binmode. You're affecting the data as the IO occurs, on its way into or out of memory. See...

    http://www.docjar.com/docs/api/java/nio/charset/UnmappableCharacterException.html

    I've run into this exact problem using the XML feeds for PerlMonks while working in Java.

      Not sure what you're talking about.  UTF-8 is a variable-width (multi-byte-if-needed) encoding that can encode the full Unicode character set, so why should there be encoding errors, or an UnmappableCharacterException?  An UnmappableCharacterException is thrown if a certain character can't be represented in the specified target encoding, but as all Unicode characters can be encoded in UTF-8, this exception cannot occur.

      UnmappableCharacterExceptions may happen if you try to encode Unicode data to Latin-1, for example, but not with UTF-8.
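The difference between the two target encodings can be checked with a strict CharsetEncoder. This is only a sketch: the class EncodeCheck and the method encodesCleanly are names invented for the example, and the snowman character (U+2603) is just a convenient code point that Latin-1 has no mapping for.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

class EncodeCheck {
    // Returns true if every character in text has a mapping in cs.
    // The encoder is set to REPORT so it throws instead of substituting '?'.
    static boolean encodesCleanly(String text, Charset cs) {
        CharsetEncoder enc = cs.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            enc.encode(CharBuffer.wrap(text));
            return true;
        } catch (UnmappableCharacterException e) {
            // The target charset has no representation for some character.
            return false;
        } catch (CharacterCodingException e) {
            // Any other coding problem (e.g. malformed input) is a separate issue.
            throw new IllegalArgumentException("input could not be encoded", e);
        }
    }

    public static void main(String[] args) {
        String text = "snowman: \u2603";
        System.out.println(encodesCleanly(text, StandardCharsets.UTF_8));      // true
        System.out.println(encodesCleanly(text, StandardCharsets.ISO_8859_1)); // false
    }
}
```

Note that an OutputStreamWriter built from a charset name does not throw in this situation; as far as I know it silently writes the charset's replacement sequence, so a strict encoder is needed to see the exception at all.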

      That's really really bad advice. You can get encoding errors like that on the java side.

      How do you figure, please explain?

Re^5: passing data structures from java to perl
by Jenda (Abbot) on Jul 31, 2010 at 22:47 UTC

    Just an OT question ... do all these writers in writers in writers look as crazy to you as they do to me? I mean I have no experience with Java (nor do I want any), but it seems IO in Java is just as overcomplicated as in C#/.Net. There probably is a reason that I, being OO unclean, do not see, but still. Looks like someone went heavily overboard when designing the libraries.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      Nope, they're building blocks you can use to create any kind of API you can imagine, even Perl-style layers ":raw:crlf:utf-8"
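As a rough sketch of what "building blocks" means in practice, the three nested writers can be wrapped once in a small helper, and callers then see a single call. Utf8Files and openUtf8Writer are hypothetical names made up for this example, not part of the Java library; the try-with-resources block assumes Java 7 or later.

```java
import java.io.BufferedWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8Files {
    // Hypothetical convenience wrapper: stacks the three layers in one place.
    //   FileOutputStream   -> raw bytes to the file
    //   OutputStreamWriter -> characters encoded to UTF-8 bytes
    //   BufferedWriter     -> buffering, so small writes are cheap
    static Writer openUtf8Writer(String path) throws IOException {
        return new BufferedWriter(
                new OutputStreamWriter(
                        new FileOutputStream(path), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // try-with-resources closes (and therefore flushes) the writer.
        try (Writer out = openUtf8Writer("outfile")) {
            out.write("some unicode data: \u2603");
        }
    }
}
```

Swapping the innermost block for, say, a socket's output stream reuses the same encoding and buffering layers unchanged, which is roughly the argument for keeping them separate.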

        OK, building blocks. I buy that. But why the hell are we supposed to use building blocks instead of a sane API?

        Sorry for going further OT. Though ... this is not really a Java- or C#-specific question. It's a question of library interfaces, and now, with the Perl 6 object system, I am afraid we run the risk of going too puristically object oriented. Probably not with file system IO, but with other libraries. Building blocks (= implementation detail) leaking into the interface.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.