Re: simple file question, extra spaces, win32

by SavannahLion (Pilgrim)
on Dec 29, 2003 at 18:51 UTC ([id://317497])

in reply to simple file question, extra spaces, win32

The file is probably encoded as Unicode (on Windows, "Unicode" usually means UTF-16LE). I get similar behavior when I read from Unicode files and display their contents in the command prompt window.

There's a Unicode module on CPAN you can examine, though I have no direct experience with it.
The Camel book also covers turning on Unicode and UTF-8 support, but I don't think I understand how it works in Perl yet, since I haven't had much luck getting it to work.
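One quick way to test the "it's Unicode" guess is to look at the first few bytes of the file. This is a minimal sketch, assuming the file was saved by a Windows tool that writes a byte-order mark (Notepad's "Unicode" option writes FF FE); the sub name guess_encoding is hypothetical, and a file without a BOM could still be UTF-16.

```perl
use strict;
use warnings;

# Hypothetical helper: guess a file's encoding from its byte-order mark.
# Returns 'UTF-8', 'UTF-16LE' or 'UTF-16BE', or nothing if no BOM is found.
sub guess_encoding {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my $head = '';
    defined read($fh, $head, 3) or die "Can't read $path: $!";
    close $fh;
    return 'UTF-8'    if $head =~ /\A\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /\A\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /\A\xFE\xFF/;
    return;    # no BOM: the encoding can't be told this way
}

# Usage: perl guess.pl somefile.txt
if (@ARGV) {
    my $enc = guess_encoding($ARGV[0]) // 'no BOM found';
    print "$enc\n";
}
```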

Is it fair to stick a link to my site here?

Thanks for your patience.

Re: Re: simple file question, extra spaces, win32
by hardburn (Abbot) on Dec 29, 2003 at 18:55 UTC

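    In the perl 5.6 series, you had to be more explicit about your use of Unicode. In the 5.8 series, perl does a much better job of handling Unicode automagically, so a use utf8; should only be necessary in very specific circumstances: it only tells perl that the script's own source code is written in UTF-8, and does nothing for data read from files.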
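    For file data, the 5.8-style approach is to name the encoding on the filehandle with a PerlIO layer. A minimal sketch, assuming the file is UTF-16LE as Notepad writes it; the sub name read_utf16le_lines is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a UTF-16LE text file (as written by Notepad's
# "Unicode" option) and return its lines as Perl character strings.
sub read_utf16le_lines {
    my ($path) = @_;
    # :raw removes any CRLF translation so it can't mangle the 2-byte units;
    # :encoding(UTF-16LE) then decodes them into characters.
    open my $fh, '<:raw:encoding(UTF-16LE)', $path
        or die "Can't open $path: $!";
    my @lines = <$fh>;
    close $fh;
    # With an explicit LE/BE encoding name the BOM is kept, so strip U+FEFF.
    $lines[0] =~ s/\A\x{FEFF}// if @lines;
    # The "\r\n" line endings were decoded too; remove them.
    s/\r?\n\z// for @lines;
    return @lines;
}
```

    To print the decoded lines, give STDOUT a matching output layer first, e.g. binmode STDOUT, ':encoding(UTF-8)'; otherwise perl warns about wide characters.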

    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    : () { :|:& };:

    Note: All code is untested, unless otherwise stated

      Except this kind of file is not in UTF-8. Instead it uses two bytes per character, most likely in little-endian byte order. I think the official name of this encoding is UCS-2 (UTF-16LE is the superset that adds surrogate pairs).
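      To see what that layout does to plain ASCII text, here is a small sketch that encodes a string two bytes per character, little-endian. The sub name to_ucs2le is hypothetical, and it deliberately refuses characters above U+FFFF, which UCS-2 cannot represent.

```perl
use strict;
use warnings;

# Hypothetical helper: encode a string as UCS-2LE, one 16-bit
# little-endian unit ('v') per character.
sub to_ucs2le {
    my ($text) = @_;
    my @units = map { ord($_) } split //, $text;
    die "character outside UCS-2 range\n" if grep { $_ > 0xFFFF } @units;
    return pack 'v*', @units;
}

my $bytes = to_ucs2le("Hi!");
# Each ASCII character becomes its own byte followed by a zero byte.
print join(' ', map { sprintf('%02X', ord($_)) } split(//, $bytes)), "\n";
# prints: 48 00 69 00 21 00
```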

      At worst, this can be converted to UTF-8 by doing:

      my $utf8 = pack 'U*', unpack 'v*', $unicode;    # one character per 16-bit LE unit
      Not fast, but it'll do the trick. If all you want is ISO-Latin-1, try
      my $latin1 = pack 'C*', unpack 'v*', $unicode;  # values above 255 don't fit and get wrapped
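      On perl 5.8 and later, the core Encode module can do the same conversions, and it also decodes surrogate pairs, which unpack 'v*' would split into two separate values. A minimal sketch; $unicode stands for the raw bytes read from the file in :raw mode, and the sample data is made up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Raw bytes as read from the file with binmode/:raw ("H" and "é" in UTF-16LE).
my $unicode = "H\0\xE9\0";

# Decode the UTF-16LE bytes into a Perl character string...
my $chars = decode('UTF-16LE', $unicode);

# ...then encode for whatever the output needs.
my $utf8   = encode('UTF-8', $chars);        # UTF-8 bytes
my $latin1 = encode('iso-8859-1', $chars);   # unmappable characters become '?'

# A surrogate pair (U+1F600 here) decodes to one character.
my $emoji = decode('UTF-16LE', "\x3D\xD8\x00\xDE");
```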

      p.s. Those aren't spaces; most of the extra bytes will be chr(0).
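      If the file happens to contain only ASCII, deleting those chr(0) bytes is a quick-and-dirty workaround. This is a rough hack under that ASCII-only assumption, not a real decoder; the sub name strip_nuls is hypothetical, and the FF FE byte-order mark, if present, is left in place.

```perl
use strict;
use warnings;

# Hypothetical quick fix: drop every chr(0) byte.
# Only gives sensible text when the UTF-16LE data is pure ASCII.
sub strip_nuls {
    my ($bytes) = @_;
    (my $copy = $bytes) =~ tr/\0//d;
    return $copy;
}

my $raw = "h\0e\0l\0l\0o\0";
print strip_nuls($raw), "\n";    # prints: hello
```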