http://qs321.pair.com?node_id=1213034


in reply to Curious about Perl's strengths in 2018

Hi Crosis,

I'm going to write provocatively about a series of topics, one comment per topic. My thesis for each will be that Perl is strong in some particular area in which Python is weak.

This one's about text processing that involves text segmentation (i.e. character or substring processing) of Unicode text.

In a nutshell, Perl is a world leader in getting this right. The Perl 5 community has trailblazed ways to help devs deal with all the fiddly details, as practically as it could manage given its existing runtime and standard library functions. Perl 6 has trailblazed a new runtime and standard library that make it easy for mere mortals to get the right results without needing a degree in Emoji data science.

In the meantime, the Python language, its string type, its standard library, and its docs all entirely ignore the pieces needed to get text segmentation right per Unicode Standard Annex #29 ("Unicode Text Segmentation"), so it is all but impossible for an ordinary dev to correctly segment arbitrary Unicode text in Python 3.7.
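To make the complaint concrete, here is a small Python sketch of what "segmentation" trips over. Python's str type indexes, slices, and counts code points, not the user-perceived characters (extended grapheme clusters) that Annex #29 defines. For comparison, Perl 5's regex escape \X matches one extended grapheme cluster, and Perl 6's .chars counts graphemes. The naive_clusters helper below is a hypothetical illustration of what an ordinary dev might try; it is deliberately NOT a UAX #29 implementation.

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT (U+0301): one visible character,
# two code points.
decomposed = "e\u0301"

# MAN, ZWJ, WOMAN, ZWJ, GIRL: usually rendered as a single family glyph,
# but five code points.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

# len() and slicing operate on code points, not grapheme clusters.
print(len(decomposed))                 # 2
print(len(family))                     # 5
print(decomposed[::-1] == "\u0301e")   # True: reversing detaches the accent

def naive_clusters(text):
    """Hypothetical helper: attach combining marks to the preceding code point.

    Assumption-laden sketch, not UAX #29: it ignores ZWJ emoji sequences,
    regional-indicator flags, Hangul jamo, prepend characters, and more.
    """
    clusters = []
    for ch in text:
        # unicodedata.combining() returns the canonical combining class;
        # a nonzero value means ch is a combining mark.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

print(len(naive_clusters(decomposed)))  # 1, which is right
print(len(naive_clusters(family)))      # 5, still wrong: should be 1
```

The naive approach fixes the accent case but not the emoji case, and each further rule from the annex has to be hand-coded on top of data the standard library doesn't expose in segmentation-ready form.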

Feel free to ask what the heck I'm talking about if it's not obvious from what I've written and from Annex #29.

If you follow up on this comment I'll post another topic so we can keep things rolling. And if you comment on that, I'll post on another topic. I think I've got maybe 10 if you've got the stamina...
