PerlMonks  

Re^3: incorrect length of strings with diphthongs

by LanX (Saint) on Aug 30, 2022 at 17:34 UTC [id://11146505]


in reply to Re^2: incorrect length of strings with diphthongs
in thread incorrect length of strings with diphthongs

Yes, I'd say it's similar to the "ethnic" (skin-tone) modifiers of face emojis.

But my expectation is that those modifiers don't count as characters and have length 0, i.e. "Hütte" should have length 5 in both forms (composed and decomposed).
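A minimal core-Perl sketch of the emoji analogy (not from the thread; the code points U+1F44D THUMBS UP SIGN and U+1F3FD EMOJI MODIFIER FITZPATRICK TYPE-4 are an arbitrary choice). `length` counts code points, while counting `\X` matches counts extended grapheme clusters. On perl 5.26 or later (Unicode 9+ grapheme rules) this should report 2 code points but 1 grapheme.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };

binmode *STDOUT, ':encoding(UTF-8)';

# THUMBS UP SIGN followed by EMOJI MODIFIER FITZPATRICK TYPE-4
my $thumb = "\x{1F44D}\x{1F3FD}";

# length() counts code points, so the skin-tone modifier is counted too
my $codepoints = length $thumb;

# \X matches one extended grapheme cluster (a user-perceived character);
# assigning the match list to () and that to a scalar yields the count
my $graphemes = () = $thumb =~ /\X/g;

say "code points: $codepoints, graphemes: $graphemes";
```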

> how length() is implemented?

I may be wrong tho...

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery


Re^4: incorrect length of strings with diphthongs
by choroba (Cardinal) on Aug 30, 2022 at 17:48 UTC
    #!/usr/bin/perl
    use strict;
    use feature qw{ say };
    use warnings;

    use Unicode::Normalize qw{ normalize };
    use Unicode::GCString;

    # A single "ü" (U+00FC), normalized below to NFD ("u" + U+0308) and NFC
    my $char = "\N{LATIN SMALL LETTER U WITH DIAERESIS}";
    binmode *STDOUT, ':encoding(UTF-8)';

    for (qw( D C )) {
        my $n   = normalize($_, $char);
        my $gcs = 'Unicode::GCString'->new($n);

        # Columns: code points, number of \X matches, the last grapheme,
        # GCString chars, display columns, grapheme clusters
        say join ' ', length($n), $n =~ s/(\X)/$1/g, $1,
                      $gcs->chars, $gcs->columns, $gcs->length;
    }
    2 1 ü 2 1 1
    1 1 ü 1 1 1
    

    Update: Added the output.
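A core-only variation of the example above (my own sketch, not choroba's code): it skips the non-core Unicode::GCString and counts grapheme clusters with `\X` on the whole word "Hütte", in decomposed and NFC form. The helper name `grapheme_count` is hypothetical. It should print length 6 vs 5 but 5 graphemes for both forms; note that NFC only shortens the string because a precomposed "ü" (U+00FC) exists.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
use Unicode::Normalize qw{ NFC NFD };

binmode *STDOUT, ':encoding(UTF-8)';

# Decomposed form: "u" followed by COMBINING DIAERESIS (U+0308)
my $decomposed = "Hu\x{308}tte";

# Composed form: uses U+00FC LATIN SMALL LETTER U WITH DIAERESIS
my $composed = NFC($decomposed);

# Hypothetical helper: counts extended grapheme clusters via \X
sub grapheme_count {
    my ($str) = @_;
    my $count = () = $str =~ /\X/g;
    return $count;
}

for my $s ($decomposed, $composed) {
    say join ' ', $s,
        'length:',    length($s),
        'graphemes:', grapheme_count($s);
}

# Both forms compare equal once normalized to the same form
say 'equal after NFD: ', (NFD($composed) eq $decomposed ? 'yes' : 'no');
```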

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
      Interesting, looks like code.

      I might even be able to install those modules and try to understand the output you didn't provide (yet)!

      ;-P

      Cheers Rolf

        Check the update :-)

Re^4: incorrect length of strings with diphthongs
by LanX (Saint) on Aug 30, 2022 at 20:04 UTC
    > I may be wrong tho...

    I certainly am...

    #!/usr/bin/perl
    use v5.12;
    use strict;
    use utf8;
    use Devel::Peek;

    my $trema = "\N{COMBINING DIAERESIS}";
    binmode *STDOUT, ':encoding(UTF-8)';

    my $huette = "Hu${trema}tte";
    Dump $huette;
    say "$huette\'s length: " . length($huette);

    SV = PV(0x25f4a58) at 0x25266b8
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x28da368 "Hu\314\210tte"\0 [UTF8 "Hu\x{308}tte"]
      CUR = 7
      LEN = 10
    Hütte's length: 6

    That's how it looks without code tags:

    Hütte's length: 6
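The CUR = 7 in the Dump output counts bytes in perl's internal UTF-8 buffer, while length() returns 6 code points. A minimal sketch (my own, using the core Encode module, not part of the original post) that reproduces both numbers and the octal escapes \314\210 shown for U+0308. It should print 6, 7 and "314 210".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw{ say };
use Encode qw{ encode };

# "u" followed by COMBINING DIAERESIS, as in the Devel::Peek example
my $huette = "Hu\x{308}tte";

# Encode to a UTF-8 byte string; its length is a byte count
my $bytes = encode('UTF-8', $huette);

# Octal escapes for the two bytes of U+0308 (byte offsets 2 and 3)
my $octal = join ' ',
    map { sprintf('%03o', ord($_)) } split //, substr($bytes, 2, 2);

say 'code points: ', length $huette;
say 'UTF-8 bytes: ', length $bytes;
say "U+0308 bytes (octal): $octal";
```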

    Cheers Rolf
