PerlMonks  

Re^2: Unicode perlio error (when multibyte UTF-8 characters are split across block boundaries-- is it a perl bug or an I'm stupid bug?

by DrWhy (Chaplain)
on Dec 11, 2013 at 20:56 UTC


in reply to Re: Unicode perlio error (when multibyte UTF-8 characters are split across block boundaries-- is it a perl bug or an I'm stupid bug?
in thread Unicode perlio error (when multibyte UTF-8 characters are split across block boundaries-- is it a perl bug or an I'm stupid bug?

I believe all that does is disable the error detection and reporting. It does nothing to fix the underlying issue; it just hides it.

--DrWhy

"If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."

Re^3: Unicode perlio error (when multibyte UTF-8 characters are split across block boundaries-- is it a perl bug or an I'm stupid bug? (source)
by tye (Sage) on Dec 11, 2013 at 22:03 UTC

    It indeed appears to be a bug in PerlIO::encoding. See http://cpansearch.perl.org/src/RJBS/perl-5.18.1/ext/PerlIO-encoding/encoding.xs and note that the "encode" and "decode" methods are simply called with a buffer full of bytes, with no attempt to handle incomplete multi-byte characters across buffer boundaries.
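
    As a rough illustration of that failure mode, here is a minimal sketch that decodes two buffers independently with plain Encode::decode() and the default CHECK. It is not a reproduction of what PerlIO::encoding does internally; the sample string and the split position are assumptions chosen so that a two-byte character straddles the boundary.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# U+00E9 is two bytes in UTF-8 (0xC3 0xA9); cut the byte string between them.
my $original = "caf\x{e9} au lait";
my $bytes    = encode('UTF-8', $original);
my @buffers  = (substr($bytes, 0, 4), substr($bytes, 4));

# Decode each buffer on its own, with the default CHECK (substitute on error).
my $joined = join '', map { decode('UTF-8', $_) } @buffers;

printf "round trip intact: %s\n", ($joined eq $original   ? 'yes' : 'no');
printf "contains U+FFFD:   %s\n", ($joined =~ /\x{FFFD}/ ? 'yes' : 'no');
```

    Each half of the split character is malformed when seen alone, so the joined result no longer matches the original text.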

    To fix this efficiently, you'd want Encode's encode() and decode() methods (or similar) to support "tell me how many bytes on the end to save for later as they could be incomplete multi-byte characters in the desired encoding". Ah, I see FB_QUIET is already there for just that purpose. Unfortunately, using that completely defeats the purpose of allowing options like FB_WARN and FB_CROAK. Plus I don't see how that code makes it reasonable to detect invalid characters instead of just ending up in an endless loop of converting 0 bytes over and over.
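
    A minimal sketch of the FB_QUIET behaviour mentioned above, assuming a caller that keeps its own carry-over buffer (the variable names and the sample data are made up for illustration). With FB_QUIET, decode() returns what it could convert and overwrites its data argument with the unprocessed remainder, so the incomplete bytes can be prepended to the next chunk.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Split the two-byte UTF-8 encoding of U+00E9 across two chunks.
my $bytes  = encode('UTF-8', "caf\x{e9} au lait");
my @chunks = (substr($bytes, 0, 4), substr($bytes, 4));

my $pending = '';   # bytes not yet decoded, carried to the next round
my $text    = '';
for my $chunk (@chunks) {
    $pending .= $chunk;
    # decode() stops quietly at the first byte it cannot convert and
    # leaves everything from that byte onward in $pending.
    $text .= decode('UTF-8', $pending, Encode::FB_QUIET);
}

printf "decoded %d characters, %d bytes still pending\n",
    length $text, length $pending;
```

    Note that this sketch has exactly the weakness described above: if the pending bytes are genuinely invalid rather than merely incomplete, they stay in $pending forever and nothing reports them.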

    It would be helpful for something similar to FB_QUIET to be defined as a bit that can be combined with FB_WARN or FB_CROAK such that failing at the first byte or (better) too far before the end of the buffer triggers the warn/croak but failing with a single, incomplete fragment of a multi-byte character on the end of the buffer acts like FB_QUIET would.
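
    One way to approximate that combined behaviour in user code today is sketched below. decode_chunk() is a hypothetical helper, not part of Encode or PerlIO::encoding, and the three-byte threshold is an assumption that holds only for UTF-8, where a character is at most four bytes long. A short undecoded tail is saved for later; a longer one, or any tail at end of input, is treated as a real error.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: decodes what it can from $$buf_ref and leaves any
# remainder there for the next call.
sub decode_chunk {
    my ($buf_ref, $at_eof) = @_;
    my $text = decode('UTF-8', $$buf_ref, Encode::FB_QUIET);
    my $left = length $$buf_ref;
    # An incomplete trailing UTF-8 character is at most 3 bytes. More than
    # that, or anything left at end of input, did not fail "at the end".
    die "malformed UTF-8 ($left bytes left undecoded)\n"
        if $left > 3 || ($at_eof && $left);
    return $text;
}

my $bytes = encode('UTF-8', "caf\x{e9} au lait");
my ($buf, $text) = ('', '');
for my $chunk (substr($bytes, 0, 4), substr($bytes, 4)) {
    $buf  .= $chunk;
    $text .= decode_chunk(\$buf, 0);
}
$text .= decode_chunk(\$buf, 1);   # final call: nothing may remain
```

    Because invalid bytes never decode, the leftover keeps growing until it crosses the threshold, which avoids the endless loop of converting 0 bytes.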

    But surely that's already plenty of information for you to file the bug report, eh?

    - tye        

Re^3: Unicode perlio error (when multibyte UTF-8 characters are split across block boundaries-- is it a perl bug or an I'm stupid bug?
by choroba (Cardinal) on Dec 11, 2013 at 21:38 UTC
    Please, demonstrate the error. The output of the two versions of the script (with PerlIO::encoding and without it) is exactly the same.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      The error is already demonstrated in the OP. The error is not in the output, but in the fact that in the second case an error (warning) message is emitted when there is no error to report. That there is no error to report is shown by the fact that the output of the two cases is identical (nearly; the second output has an additional character at the beginning) and neither is garbled or erroneous in any way.

      --DrWhy

      "If God had meant for us to think for ourselves he would have given us brains. Oh, wait..."
