PerlMonks  

A "binary" file for us:

C:\>perl -e "print qq(\xB5)" > data.bin

And:

use strict;
use warnings;
use feature 'say';
use Encode qw/ _utf8_off _utf8_on is_utf8 /;
use utf8;
use Devel::Peek;

my $s1 = ' ';      # a space (anything)
_utf8_on( $s1 );   # or assign not-ascii above, instead
my $s2 = $s1;

open my $fh, '<', 'data.bin';
binmode $fh;

sysread $fh, $s1, 1;
Dump $s1;

seek $fh, 0, 0;
$s2 = do { local $/; <$fh> };
Dump $s2;

SV = PVMG(0xc149ec) at 0xc20dec
  REFCNT = 1
  FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
  IV = 0
  NV = 0
  PV = 0xc15a1c "\302\265"\0 [UTF8 "\x{b5}"]
  CUR = 2
  LEN = 10
  MAGIC = 0xc13ffc
    MG_VIRTUAL = &PL_vtbl_utf8
    MG_TYPE = PERL_MAGIC_utf8(w)
    MG_LEN = -1
SV = PV(0x3f9f6c) at 0xc20f0c
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK)
  PV = 0xc2e6a4 "\265"\0
  CUR = 1
  LEN = 10

Not sure if it's a bug or not.

perldoc -f sysread says: "Note that if the filehandle has been marked as :utf8, Unicode characters are read instead of bytes (the LENGTH, OFFSET, and the return value of sysread are in Unicode characters)."

Does this imply that if FH has not been marked, OFFSET is counted in bytes? In that case, could the utf8 already in the buffer become invalid?

I think that if OFFSET is 0, the string's utf8-ness should match the file's IO encoding layer, i.e. the read should produce the same result as the slurp above, regardless of what the original scalar contained. And if OFFSET is not zero, then what? Perhaps this should be documented more clearly, including the combinations that should never be used.
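Until that is settled, one defensive option is to never hand sysread a buffer whose history you don't control. The sketch below is only an illustration: read_bytes is a hypothetical helper of mine, and the temp file stands in for data.bin. It reads into a fresh scalar, then calls utf8::downgrade so the result is stored as plain bytes. It also compares that result with a read into a buffer that was flagged beforehand, as $s1 was above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Recreate the one-byte "binary" file in a temp location (stand-in for data.bin).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xB5";
close $out or die "close failed: $!";

# Hypothetical helper: sysread into a fresh buffer, then force byte storage.
sub read_bytes {
    my ($fh, $len) = @_;
    my $buf = '';                      # fresh scalar: no UTF8 flag to inherit
    my $n = sysread $fh, $buf, $len;
    die "sysread failed: $!" unless defined $n;
    utf8::downgrade($buf);             # dies if a character above 0xFF is present
    return $buf;
}

open my $fh, '<', $path or die "open failed: $!";
binmode $fh;

# A buffer that starts out UTF8-flagged, like $s1 above.
my $flagged = ' ';
utf8::upgrade($flagged);
sysread $fh, $flagged, 1;

# Rewind with sysseek, since we are mixing it with sysread.
sysseek $fh, 0, 0;
my $clean = read_bytes($fh, 1);

printf "same characters: %s\n", ($flagged eq $clean ? 'yes' : 'no');
printf "clean buffer flagged: %s\n", (utf8::is_utf8($clean) ? 'yes' : 'no');
```

At the Perl level the two strings should compare equal, since eq works on characters. The difference in internal storage only matters to code that looks at the raw PV, such as XS code.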

BTW, it looks like it's about this bug. Tk passes the file name as utf8, and this parameter is (rather recklessly) re-used (!) to receive the file content.


In reply to Read (sysread) binary data into utf8 string by vr
