Re^2: Unicode vulgar fraction composition

Re^3: Unicode vulgar fraction composition by ikegami (Patriarch) on Sep 28, 2020 at 02:16 UTC
Good analogy (though you really want `fc` instead of `lc` to perform a case-insensitive comparison).
Re^4: Unicode vulgar fraction composition by tobyink (Canon) on Sep 28, 2020 at 16:07 UTC
For ASCII, `fc` does the same thing as `lc` though. And I specified ASCII for that reason.
toby döt ink
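A minimal sketch of the `fc` versus `lc` point, assuming Perl 5.16 or later (where `fc` is available as a feature). The sample strings are only illustrations: for ASCII input the two functions agree, while a string containing LATIN SMALL LETTER SHARP S shows where they diverge.

```perl
use strict;
use warnings;
use charnames ':full';
use feature qw(fc say unicode_strings);   # fc needs Perl 5.16+

# Pure ASCII: lc and fc return the same string.
my $ascii      = 'Boaty McBoat';
my $ascii_same = lc($ascii) eq fc($ascii);

# Outside ASCII they can differ: fc folds U+00DF (sharp s) to "ss",
# while lc leaves it unchanged.
my $upper = 'STRASSE';
my $lower = "stra\N{LATIN SMALL LETTER SHARP S}e";

my $lc_match = lc($upper) eq lc($lower);
my $fc_match = fc($upper) eq fc($lower);

say 'ASCII, lc eq fc:      ', $ascii_same ? 'yes' : 'no';
say 'sharp s, lc compares: ', $lc_match   ? 'equal' : 'different';
say 'sharp s, fc compares: ', $fc_match   ? 'equal' : 'different';
```

So both statements hold: the ASCII-only analogy is unaffected, and `fc` is still the safer habit for case-insensitive comparison in general.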
Re^5: Unicode vulgar fraction composition by ikegami (Patriarch) on Sep 30, 2020 at 06:15 UTC
I know. That doesn't change anything.
Re^3: Unicode vulgar fraction composition by raygun (Scribe) on Oct 05, 2020 at 07:57 UTC
Sure, I think it's intuitive why `lc('Boaty McBoat')` is conceptually a "lossy" transformation (in terms of being able to restore the original string). But `NFKC("\N{VULGAR FRACTION THREE EIGHTHS}")` is conceptually "lossless": there is only one Unicode character the resultant string "3\N{FRACTION SLASH}8" could be "composed" into. As I wrote, I get now why NFKC is conceptually lossy in general. But, unlike with `lc`, some specific decompositions are exceptions.
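For reference, a small sketch of the behaviour under discussion, using the core `Unicode::Normalize` module. It only checks what `NFKC` returns for this one character and that a second pass does not bring the single character back; the variable names are illustrative.

```perl
use strict;
use warnings;
use charnames ':full';
use Unicode::Normalize qw(NFKC);

my $fraction   = "\N{VULGAR FRACTION THREE EIGHTHS}";   # U+215C
my $normalized = NFKC($fraction);

# The compatibility decomposition is DIGIT THREE, FRACTION SLASH, DIGIT EIGHT.
my $decomposes = $normalized eq "3\N{FRACTION SLASH}8";

# The composition step of NFKC only recombines canonical mappings, so
# normalizing the three-character string again leaves it as it is.
my $stays_decomposed = NFKC($normalized) eq $normalized;

# Show the code points of the normalized string.
print join(' ', map { sprintf 'U+%04X', ord } split //, $normalized), "\n";
```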
Re^4: Unicode vulgar fraction composition by soonix (Canon) on Oct 05, 2020 at 08:43 UTC
Consider `123\N{FRACTION SLASH}8` and `12\N{VULGAR FRACTION THREE EIGHTHS}`. I would read the former as "one hundred twenty-three eighths", but the latter as "twelve (plus) three eighths", so it's not completely a one-to-one relationship.
Re^5: Unicode vulgar fraction composition by raygun (Scribe) on Oct 05, 2020 at 19:02 UTC
Yes, my understanding is that's how Unicode would have you interpret each of those. So the problem then becomes that running `NFKC` on the latter produces the former: a nonequivalent string, therefore erroneous output. The correctly decomposed form of "12\N{VULGAR FRACTION THREE EIGHTHS}" would be, I presume, "12\N{ZERO WIDTH NON-JOINER}3\N{FRACTION SLASH}8". (Whether this is a bug or merely a "gotcha" in `NFKC` I suppose is a matter of interpretation.) But point taken that context matters when composing vulgar fractions.
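The collision can be checked directly. This is a sketch using `Unicode::Normalize`; the ZERO WIDTH NON-JOINER variant is only the form presumed above, inserted by hand here, not something `NFKC` produces.

```perl
use strict;
use warnings;
use charnames ':full';
use Unicode::Normalize qw(NFKC);

my $mixed    = "12\N{VULGAR FRACTION THREE EIGHTHS}";   # twelve and three eighths
my $improper = "123\N{FRACTION SLASH}8";                # 123 over 8

# After NFKC the two strings are identical, so the distinction is gone.
my $collide = NFKC($mixed) eq NFKC($improper);

# Hand-built alternative (an assumption, not NFKC output): a ZWNJ between
# the whole number and the decomposed fraction. ZWNJ has no decomposition,
# so NFKC leaves it in place and the string stays distinct from "123/8".
my $separated  = "12\N{ZERO WIDTH NON-JOINER}3\N{FRACTION SLASH}8";
my $kept_apart = NFKC($separated) ne NFKC($improper);

print 'NFKC collides:       ', ($collide    ? 'yes' : 'no'), "\n";
print 'ZWNJ form distinct:  ', ($kept_apart ? 'yes' : 'no'), "\n";
```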
Re^4: Unicode vulgar fraction composition by ikegami (Patriarch) on Oct 06, 2020 at 04:14 UTC
There's no way to know that `3/8` means three-eighths. For example, it could mean March 8th. As such, there are two possible compositions for `3/8`: VULGAR FRACTION THREE EIGHTHS and `3/8`.
Re^5: Unicode vulgar fraction composition by raygun (Scribe) on Oct 06, 2020 at 08:19 UTC
Absolutely true if (as you wrote) a U+002F SOLIDUS appears between the 3 and the 8. This is why I've been limiting my scope to the case where a U+2044 FRACTION SLASH appears between them, i.e., the specific sequence that `NFKC` or `NFKD` decomposes a Unicode vulgar fraction into.
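A rough sketch of what composition limited to that scope could look like. `compose_vulgar_fractions` and `%compose` are hypothetical names, not part of `Unicode::Normalize`; the reverse map is built from `NFKD` itself, and the code point ranges (U+00BC..U+00BE, U+2150..U+215E) are my assumption about which characters to cover. It ignores U+002F SOLIDUS entirely and refuses to compose when further digits are adjacent, which sidesteps the `123\N{FRACTION SLASH}8` case but does not settle whether composing is semantically right.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Reverse map: decomposed "digits FRACTION SLASH digits" => single character.
my %compose;
for my $cp (0x00BC .. 0x00BE, 0x2150 .. 0x215E) {
    my $char       = chr $cp;
    my $decomposed = NFKD($char);
    # Keep only entries of the form digits, U+2044, digits.
    $compose{$decomposed} = $char
        if $decomposed =~ /\A[0-9]+\x{2044}[0-9]+\z/;
}

# Hypothetical helper: compose only around U+2044 FRACTION SLASH, and only
# when the digit runs are not part of a longer number on either side.
sub compose_vulgar_fractions {
    my ($text) = @_;
    $text =~ s/(?<![0-9])([0-9]+\x{2044}[0-9]+)(?![0-9])/exists $compose{$1} ? $compose{$1} : $1/ge;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print compose_vulgar_fractions("3\x{2044}8"), "\n";     # FRACTION SLASH: composed
print compose_vulgar_fractions("3/8"), "\n";            # SOLIDUS: left alone
print compose_vulgar_fractions("123\x{2044}8"), "\n";   # longer numerator: left alone
```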
Re^6: Unicode vulgar fraction composition by ikegami (Patriarch) on Oct 06, 2020 at 20:17 UTC
Re^7: Unicode vulgar fraction composition by raygun (Scribe) on Oct 09, 2020 at 05:18 UTC

