Unicode vulgar fraction composition

raygun has asked for the wisdom of the Perl Monks concerning the following question:

Re: Unicode vulgar fraction composition by kcott (Archbishop) on Sep 24, 2020 at 08:22 UTC
G'day raygun,
As ++ikegami has explained, and you have accepted, there is no compatibility composition.
You have asked about modules. There are some available, but I can't say whether they are suitable for your purposes (as you haven't explained that part). Here are a couple; if these aren't suitable, search MetaCPAN using terms reflecting your use case.
HTML::Fraction may do what you want if you're working with HTML.
Unicode::Fraction will render any vulgar fraction into something that's intended to look like a Unicode fraction: 12345/67890 becomes something like ¹²³⁴⁵/₆₇₈₉₀ (that's just a rough approximation).
"... write my own function to handle these fractions (there are only a dozen or so), ..."
There are actually 18 in the two main ranges. Three have the codepoints U+00BC to U+00BE and can be found in the PDF Code Chart "C1 Controls and Latin-1 Supplement". The other 15 have the codepoints U+2150 to U+215E and can be found in the PDF Code Chart "Number Forms". (Number Forms also contains U+2189 VULGAR FRACTION ZERO THIRDS, added in Unicode 5.2, which brings the total to 19.)
Writing your own function is pretty easy. I wrote one just for the fun of it. I've put it in a spoiler so as not to spoil your fun if you wanted to do this, but do feel free to look and take any code or ideas you want.
Ken
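For anyone who would rather not open the spoiler, here is a minimal sketch of such a function. It is not Ken's code (which isn't reproduced here). It assumes the input uses U+2044 FRACTION SLASH between ASCII digits, which is what NFKC produces, and it builds its lookup table by running `NFKD` from Unicode::Normalize over the fraction code points listed above, plus U+2189. The name `compose_fractions` is hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Vulgar fraction code points: U+00BC..U+00BE (Latin-1 Supplement),
# U+2150..U+215E and U+2189 (Number Forms).
my @fraction_cps = (0x00BC .. 0x00BE, 0x2150 .. 0x215E, 0x2189);

# Reverse lookup: compatibility decomposition (e.g. "3\x{2044}8") => fraction.
my %compose;
$compose{ NFKD(chr $_) } = chr $_ for @fraction_cps;

# Hypothetical helper: replace each digits + FRACTION SLASH + digits run
# that exactly matches a known decomposition; leave everything else alone.
sub compose_fractions {
    my ($str) = @_;
    $str =~ s{([0-9]+\x{2044}[0-9]+)}{ $compose{$1} // $1 }ge;
    return $str;
}

binmode STDOUT, ':encoding(UTF-8)';
print compose_fractions("Add 3\x{2044}8 cup, then 1\x{2044}10 more"), "\n";
```

Requiring U+2044 rather than an ASCII "/" is a deliberate choice in this sketch, so that ordinary text such as "1/2/2020" is left untouched. Because the digit runs are matched greedily, "13⁄8" is also left as-is rather than being turned into "1⅜".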
Re^2: Unicode vulgar fraction composition by raygun (Scribe) on Sep 24, 2020 at 17:28 UTC
Thanks much, ikegami and Ken, for the additional explanations and ideas. Super helpful!
Expanding on ikegami's explanation of why a "compatibility composition" might be ambiguous: I also see that, for instance, U+2168 ROMAN NUMERAL NINE has a compatibility decomposition into the capital letters "I" and "X". But even if no other Unicode character has that particular decomposition, that certainly doesn't mean that any "I" followed by an "X" represents the Roman numeral and should therefore be "compatibility composed" into it. So yes, I now see why the concept is fraught with peril, in general.
But a string like "3\N{FRACTION SLASH}8" seems to have an unambiguous meaning that is always equivalent to U+215C VULGAR FRACTION THREE EIGHTHS. So it seems a `compatibility_compose_where_it_makes_sense()` function could be written. It would, however, require a judgment call for every possible "compatibility composition", not all of which would be clear-cut, so I can see why no one is rushing to implement it.
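Whether a compatibility decomposition is unique can at least be checked mechanically. The sketch below uses a hypothetical `compat_sources` helper that scans the Basic Multilingual Plane (skipping surrogates and noncharacters, a simplifying assumption) for every code point whose NFKD form equals a given string. "3⁄8" maps back to U+215C alone, and "IX" maps back to U+2168 ROMAN NUMERAL NINE, which illustrates the point above: uniqueness by itself doesn't say whether composing is appropriate, so a function like `compatibility_compose_where_it_makes_sense()` would still need a hand-picked allow-list.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFKD);

# Hypothetical helper: every BMP code point whose compatibility
# decomposition (NFKD) is exactly $target. Surrogates and
# noncharacters are skipped to keep the scan simple.
sub compat_sources {
    my ($target) = @_;
    my @hits;
    for my $cp (0 .. 0xFFFD) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;    # surrogates
        next if $cp >= 0xFDD0 && $cp <= 0xFDEF;    # noncharacters
        push @hits, $cp if NFKD(chr $cp) eq $target;
    }
    return @hits;
}

# Print each target's code points and the code points that decompose to it.
for my $target ("IX", "3\x{2044}8") {
    my @src = compat_sources($target);
    printf "%s <- %s\n",
        join(' ', map { sprintf 'U+%04X', ord } split //, $target),
        join(', ', map { sprintf 'U+%04X', $_ } @src);
}
```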
Re: Unicode vulgar fraction composition by ikegami (Patriarch) on Sep 24, 2020 at 02:14 UTC
The "C" and "D" transformations are inverses of each other (up to canonical equivalence), but there's no inverse to "K". It's a destructive transformation. For example, the ANGSTROM SIGN (Å) and the LATIN CAPITAL LETTER A WITH RING ABOVE (Å) are independent symbols with distinct meanings, but they have the same KC form and the same KD form. There's no way to know how to reverse the transformation to restore the original meaning.
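A small sketch of that information loss with Unicode::Normalize: once two different inputs normalize to the same string, the output alone can't tell you which one you started with. Note that the Å pair already merges under NFC, because U+212B has a canonical (singleton) decomposition to U+00C5; SUPERSCRIPT TWO versus DIGIT TWO is added here as an assumed extra example of a pair that only the K forms merge.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFKC);

my @pairs = (
    [ "\x{212B}", "\x{00C5}" ],   # ANGSTROM SIGN vs A WITH RING ABOVE
    [ "\x{00B2}", "2" ],          # SUPERSCRIPT TWO vs DIGIT TWO
);

# Report whether each pair becomes indistinguishable under NFC and NFKC.
for my $pair (@pairs) {
    my ($x, $y) = @$pair;
    printf "U+%04X vs U+%04X: same NFC? %s; same NFKC? %s\n",
        ord $x, ord $y,
        (NFC($x)  eq NFC($y)  ? 'yes' : 'no'),
        (NFKC($x) eq NFKC($y) ? 'yes' : 'no');
}
```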
Re^2: Unicode vulgar fraction composition by tobyink (Canon) on Sep 26, 2020 at 09:55 UTC
One way of thinking about it, in a simplified ASCII world, would be if you lowercased words to do a case-insensitive comparison: `chomp( my $name = lc <$fh> ); if ( $name eq 'bob jones' ) { die 'rejecting annoying person'; } # Now I want to restore $name to its original mixture of upper and lower case` toby döt ink
Re^3: Unicode vulgar fraction composition by ikegami (Patriarch) on Sep 28, 2020 at 02:16 UTC
Good analogy (though you really want `fc` instead of `lc` to perform a case-insensitive comparison).
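To make the `fc` point concrete, here is a minimal sketch (the sample strings are illustrative assumptions) in which `lc` and `fc` disagree: case folding maps "ß" to "ss", so the two spellings compare equal only under `fc`. `fc` needs Perl 5.16 or later; `use v5.16` also turns on the `unicode_strings` feature, without which case changes on a non-UTF-8 string follow ASCII rules.

```perl
use v5.16;     # enables strict plus the 'fc' and 'unicode_strings' features
use warnings;

my $typed  = "Stra\x{DF}e";   # "Straße" with LATIN SMALL LETTER SHARP S
my $stored = "STRASSE";

# lc leaves the sharp s alone, so the lowercased strings differ.
say 'lc: ', (lc($typed) eq lc($stored) ? 'match' : 'no match');
# fc folds the sharp s to "ss", so the folded strings are equal.
say 'fc: ', (fc($typed) eq fc($stored) ? 'match' : 'no match');
```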
Re^4: Unicode vulgar fraction composition by tobyink (Canon) on Sep 28, 2020 at 16:07 UTC
Re^5: Unicode vulgar fraction composition by ikegami (Patriarch) on Sep 30, 2020 at 06:15 UTC
Re^3: Unicode vulgar fraction composition by raygun (Scribe) on Oct 05, 2020 at 07:57 UTC
Sure, I think it's intuitive why `lc('Boaty McBoat')` is conceptually a "lossy" transformation (in terms of being able to restore the original string). But `NFKC("\N{VULGAR FRACTION THREE EIGHTHS}")` is conceptually "lossless": there is only one Unicode character that the resulting string "3\N{FRACTION SLASH}8" could be "composed" into. As I wrote, I now get why NFKC is conceptually lossy in general. But, unlike with lc, some specific decompositions are exceptions.
Re^4: Unicode vulgar fraction composition by soonix (Canon) on Oct 05, 2020 at 08:43 UTC
Re^5: Unicode vulgar fraction composition by raygun (Scribe) on Oct 05, 2020 at 19:02 UTC
Re^4: Unicode vulgar fraction composition by ikegami (Patriarch) on Oct 06, 2020 at 04:14 UTC
Re^5: Unicode vulgar fraction composition by raygun (Scribe) on Oct 06, 2020 at 08:19 UTC
Re: Unicode vulgar fraction composition by Anonymous Monk on Sep 23, 2020 at 12:34 UTC
I think you'll have to DIY. The module implements the unicode.org algorithm; see the references in its docs. In the context of Unicode, character composition is the process of replacing a base letter followed by one or more combining characters with a single precomposed character, and character decomposition is the opposite process.
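A minimal illustration of that definition using Unicode::Normalize (which provides the `NFKD` and `NFKC` functions discussed in this thread): NFC composes a base letter plus a combining mark into the precomposed character, and NFD splits it back apart. `codepoints` is a small hypothetical display helper.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Render a string as space-separated U+XXXX code points.
sub codepoints { join ' ', map { sprintf 'U+%04X', ord } split //, shift }

my $decomposed = "e\x{0301}";         # e + COMBINING ACUTE ACCENT
my $composed   = NFC($decomposed);    # canonical composition

print 'NFC: ', codepoints($composed), "\n";
print 'NFD: ', codepoints(NFD($composed)), "\n";
```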
Re^2: Unicode vulgar fraction composition by raygun (Scribe) on Sep 23, 2020 at 15:37 UTC
Right, which is why, on the surface, it's curious that both `NFKD` and `NFKC` return decomposed forms. Some illumination is provided in the documentation: NFKD performs "compatibility decomposition," while NFKC performs "compatibility decomposition followed by canonical composition." So apparently what I want is "compatibility composition," which it seems nothing in that module performs. Thus my question amounts to: does anything else in Perl do compatibility composition?
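The behaviour described above is easy to reproduce. This sketch, assuming only the four normalization functions exported by Unicode::Normalize, prints the code points each form produces for U+215C. The canonical forms leave the character alone, while both K forms yield DIGIT THREE, U+2044 FRACTION SLASH, DIGIT EIGHT: the canonical composition step in NFKC has nothing to recompose that sequence into, because ⅜ is not a canonical composite.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC NFKD NFKC);

# Render a string as space-separated U+XXXX code points.
sub codepoints { join ' ', map { sprintf 'U+%04X', ord } split //, shift }

my $frac = "\x{215C}";    # VULGAR FRACTION THREE EIGHTHS
my %form = (NFD => \&NFD, NFC => \&NFC, NFKD => \&NFKD, NFKC => \&NFKC);

# Apply each normalization form and show the resulting code points.
for my $name (qw(NFD NFC NFKD NFKC)) {
    printf "%-4s => %s\n", $name, codepoints($form{$name}->($frac));
}
```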
Re^2: Unicode vulgar fraction composition by LanX (Saint) on Sep 23, 2020 at 14:20 UTC
> I think ...
And I think you are just restating the obvious from the OP.
Cheers Rolf
(addicted to the Perl Programming Language :) Wikisyntax for the Monastery


	PerlMonks