Re: truncate string to byte count

in reply to truncate string to byte count

Well, every UTF-8-encoded character takes up 16 bits, so you simply divide by 2 and make sure the result is an even number. If it is not, subtract one, and then you have an index where it is safe to split the string. I don't understand why this is such a huge problem?


Re^2: truncate string to byte count by Your Mother (Archbishop) on Feb 28, 2019 at 01:55 UTC
Because that's not remotely right…? UTF-8 is a variable-length encoding with a minimum of 8 bits per character. Characters with higher code points take up to 32 bits.
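To make the variable width concrete, here is a minimal sketch that assumes the core Encode module. `utf8_byte_length` is a hypothetical helper name used only for illustration, not something from this thread.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: how many bytes one character occupies in UTF-8.
sub utf8_byte_length {
    my ($char) = @_;
    return length(encode('UTF-8', $char));
}

# U+0041 (Latin A), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji)
for my $cp (0x41, 0xE9, 0x20AC, 0x1F600) {
    printf "U+%04X: %d byte(s) in UTF-8\n", $cp, utf8_byte_length(chr $cp);
}
# The four code points take 1, 2, 3 and 4 bytes respectively.
```

So "divide by 2" cannot give a safe split point: the number of bytes per character depends on the character.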
Re^2: truncate string to byte count by LanX (Saint) on Feb 28, 2019 at 11:50 UTC
> I don't understand why this is such a huge problem?

The (text-)string commands in Perl operate on a character*, not a byte, basis. A string carries an internal utf8 flag which determines how it's handled.

That said, some commands like `unpack` or `vec` are supposed to operate on raw bit vectors and might be useful here.

*) i.e. variable byte length character

Cheers Rolf
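As a rough illustration of the character/byte distinction: `length` counts characters on a decoded string, while only the encoded form has a byte count. The sketch below is one possible way to truncate to a byte limit without splitting a character. `truncate_utf8_bytes` is a hypothetical name, and the code assumes the input is a decoded character string, the target encoding is UTF-8, and the limit is non-negative.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: return the longest prefix of $str whose UTF-8
# encoding fits in $max_bytes, never cutting inside a character.
sub truncate_utf8_bytes {
    my ($str, $max_bytes) = @_;
    my $bytes = encode('UTF-8', $str);
    return $str if length($bytes) <= $max_bytes;

    my $cut = $max_bytes;
    # If the first dropped byte is a continuation byte (10xxxxxx), the cut
    # falls inside a multi-byte character, so back up to its lead byte.
    $cut-- while $cut > 0 && (ord(substr($bytes, $cut, 1)) & 0xC0) == 0x80;

    return decode('UTF-8', substr($bytes, 0, $cut));
}

my $s = "a\x{20AC}b";    # 'a', euro sign, 'b'
printf "%d characters, %d bytes\n", length($s), length(encode('UTF-8', $s));
printf "limit 3 keeps %d character(s)\n", length(truncate_utf8_bytes($s, 3));
printf "limit 4 keeps %d character(s)\n", length(truncate_utf8_bytes($s, 4));
```

With a limit of 3 bytes the euro sign does not fit, so only the `a` survives; with a limit of 4 it does. Note that this only protects code point boundaries: a base character and a following combining mark can still end up separated.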
Re^2: truncate string to byte count by ikegami (Patriarch) on Feb 28, 2019 at 20:18 UTC
You might be thinking of UTF-16, but that's also wrong. A character encoded using UTF-16 results in 2 or 4 bytes depending on the character.
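A quick way to check the 2-or-4-byte behaviour, sketched with the core Encode module. `utf16_byte_length` is a hypothetical helper name, and `UTF-16LE` is chosen here only because it does not prepend a byte order mark to the output.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: how many bytes one character occupies in UTF-16.
sub utf16_byte_length {
    my ($char) = @_;
    return length(encode('UTF-16LE', $char));
}

# U+0041 and U+20AC are in the Basic Multilingual Plane; U+1F600 is not
# and therefore needs a surrogate pair.
for my $cp (0x41, 0x20AC, 0x1F600) {
    printf "U+%04X: %d byte(s) in UTF-16\n", $cp, utf16_byte_length(chr $cp);
}
# The three code points take 2, 2 and 4 bytes respectively.
```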
