http://qs321.pair.com?node_id=269283

John M. Dlugosz has asked for the wisdom of the Perl Monks concerning the following question:

In perlrun, it states that "The value 0777 will cause Perl to slurp files whole because there is no legal character with that value."

Well, with the advent of Unicode, there is indeed a character (octal)777, U+01FF, latin small letter o with stroke and acute.

So, does the -0 flag on the command line behave like \x or \x{} in strings? If the former, it doesn't let me set the input record separator to a sequence of bytes, which would be needed for a multi-byte character.

Files are opened in 8-bit mode by default, for compatibility with older versions of Perl. But when using one-liners, where files are opened automatically by while(<>) or piped into standard input, how do I specify a different (e.g. UTF-8) encoding?

—John

Replies are listed 'Best First'.
Re: perlrun -0777 option
by fglock (Vicar) on Jun 26, 2003 at 19:08 UTC

    There is something in this thread:

    http://nntp.x.perl.org/group/perl.perl5.changes/7155

    Change 19185 by jhi@kosh on 2003/04/10 19:06:02

    Noted by Nat: -0 didn't work that well with Unicode.
    ...
    +If you want to specify any Unicode character, use the hexadecimal
    +format: C<-0xHHH...>, where the C<H> are valid hexadecimal digits.
    +(This means that you cannot use the C<-x> with a directory name that
    +consists of hexadecimal digits.)
      That's very good to know; thanks for finding it.

      How do I submit a change to the perlrun.pod documentation, or tell the proper person about it?

      —John

        Actually, that text is from a perlrun.pod patch:

        ==== //depot/perl/pod/perlrun.pod#87 (text) ====
        Index: perl/pod/perlrun.pod
        --- perl/pod/perlrun.pod#86~19181~ Thu Apr 10 01:02:10 2003
        +++ perl/pod/perlrun.pod Thu Apr 10 12:06:02 2003
        @@ -7,7 +7,7 @@
Re: perlrun -0777 option
by crazyinsomniac (Prior) on Jun 27, 2003 at 03:55 UTC
      perl -Mopen
      Thanks, that is nice and elegant (no new option necessary).

      —John

Re: perlrun -0777 option
by Anonymous Monk on Jun 26, 2003 at 16:28 UTC
    I've never dealt with Unicode so I can't help you there, but if you're looking to slurp a file into an array or scalar (something most people consider bad, and which I do a lot), why are you not using local?
    my $var = '';    # or perhaps my @var = ();
    {
        local $/ = undef;
        open(FH, "<myfilepath/andname") or die "A horrible death $!";
        $var = <FH>;
        close(FH);
    }
    Does Unicode somehow thwart an undef'ed input record separator ( $/ )?
      Because I want to do it with perl -pe. That is, the open, read, and print are all done in the stock set-up.
        So just do...
        perl -pe 'BEGIN { $/ = undef; } whatever'
        
      FYI... that was me; I forgot I had logged out...