PerlMonks  

Pronouncible TLA's?

by BrowserUk (Patriarch)
on Apr 28, 2016 at 21:36 UTC ( [id://1161811]=perlquestion )

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a metric by which the pronounceability of a combination of letters can be judged? And is there anything that Perl can do to measure it?

Pronounceable by whom is a variable: xkz might be pronounceable in Czech or Polish; but since I am chronically monolingual, let's stick to 'by native English speakers'.

That still leaves plenty of scope for sounds made by native English speakers from some parts of the world that are totally unreproducible by those from others; but it's a place to start.

To be useful, the pronunciations would need to be distinct; and changing the pronunciation of a known word, so that a less common spelling can (re)use that sound, defeats the purpose.

I.e. deciding to pronounce 'are' as ah-rey, so that 'aar' can be pronounced as ah, doesn't work.

Also, I think that the Dutch could intone a difference between 'six' and 'syx' and hear the difference; but I'm pretty sure I can't do either.

For the 'Perl content' and 'what have you got' criteria, this produces all 17,576 3-letter combinations. Can you add code to eliminate the unpronounceable?:

use Algorithm::Combinatorics qw[ variations_with_repetition ];
$i = variations_with_repetition( [ 'a'..'z' ], 3 );
$" = ''; print "@$_" while $_ = $i->next;
aaa aab aac aad ...

I thought perhaps that any combination that contained a vowel or y might work:

$" = ''; $_ = join( '', @$_ ), m[[aeiouy]] and print while $_ = $i->next;

But I don't think any of these would qualify: xxa xxe xxi xxo xxu xxy xza xze xzi xzo xzu xzy.

And there is a distinguishability problem with things like gga, gja, jga & jja. Again the Dutch might get something like kh-yah & ye-gah for the middle two, but I probably wouldn't be able to pronounce either to their satisfaction.

The application, albeit a light-hearted speculation rather than yet a serious pursuit, is the Huffman encoded (I really just mean 'short' here) naming of a few (3-7?) thousand items (variables; values) in a phonetically pronounceable way, for clarity of verbal communication, and memorability.

And why not just use numbers? The hope is that at least some of the more common items could be named in some vaguely mnemonic way, to aid the second of the above goals in a way that "Item 237" never will.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority". I knew I was on the right track :)
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: Pronouncible TLA's?
by Marshall (Canon) on Apr 28, 2016 at 22:27 UTC
    I'm not sure that you can come up with a program that can do what the human brain can do. The number of combinations (17K+) is not all that big.

    If you developed an application that allowed native English speakers to "vote" on suggested pseudo-pronunciations, then I'm guessing that 50 volunteers each making 500-1,000 decisions could work it out? Some TLA's would need to be reviewed by more than one person.

    I've written some "voting code" before with complex decisions that my algorithms just couldn't do. About 200 decisions per person wound up being "doable". Instead of improving the algorithms, I spent the time on a fancy GUI to allow the humans to weigh in on the 0.5% of really hard problems. My situation was different than this, but some similarities exist.

    Part of the issue here is that TLA's are application specific. Example: TRS-80, The Radio Shack model 80. TRS80 became known as "Trash80". A human could come up with "trash" for TRS, but I don't see how a computer program could do that? This pronunciation also had to do with the manufacturer's reputation, which is very situation-specific. I'm not sure that any program could do what you are attempting. But it doesn't appear that this needs to be "re-calculated" easily in a completely automated process. My suggestion is to "divide and conquer" with a small army of humans guided by a good application program.

      My suggestion is to "divide and conquer" with a small army of humans guided by a good application program.

      What a cool idea. Maybe stumping up for Amazon's Mechanical Turk (or whatever it is called) is the cheapest solution.

      (Might need clear eligibility rules for the native language/country of origin to ensure some consistency; but that's probably not PC.)


        That is an interesting thought. I hadn't heard of Mechanical Turk before.

        I suspect what will happen is that some folks will emerge who are both a) really good at it and b) enjoy it because it is kind of like a puzzle, and some of those folks will wind up spending a lot of time on it.

        Of course there will be some folks who are really bad at the task, but will want to participate nevertheless.

        I'm not sure that everybody should be "equal". Some kind of review committee could be set up with an invited group of the gurus (first category above). You could view the main crowd as a big idea generation machine. I think reviewing and rejecting bad ideas is an easier task than coming up with the idea in the first place.

        I'm not sure that having alternate pronunciations is a problem? If I see LCK, I immediately come to "lock", but someone else might say oh, that is "luck". In a dictionary, there are often multiple meanings and definitions. I presume that the same thing will happen here.

        An interesting problem.

Re: Pronouncible TLA's?
by kcott (Archbishop) on Apr 29, 2016 at 01:46 UTC

    G'day BrowserUk,

    Another consideration is context. For example, gnu, the antelope, has a silent 'G' while GNU, the software, has a pronounced 'G'. In this instance, capitalisation might be used to differentiate the two but, I suspect, that could be a rather flakey approach.

    There's also a whole range of TLAs which many pronounce as either letters or words, e.g. SQL/sequel, Tcl/tickle, PDL/piddle. Regardless of whether either, or both, of these forms are correct (for these, or any other, examples), an algorithm to determine whether vowel sounds (and which ones) can be inserted to create a pronounceable TLA is probably not possible.

    — Ken

Re: Pronouncible TLA's?
by RichardK (Parson) on Apr 29, 2016 at 06:46 UTC

    Perhaps something like soundex encoding will let you find the similar sounding cases. I'm not sure that soundex itself will work properly on such short words, but it may be a useful starting point.

    soundex retains the first letter so kat & cat will encode differently, and that won't give you what you want directly. But if you encode the first letter as well, then cat & kat would both give 23.
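    A minimal sketch of that variation, for illustration only. It uses the standard Soundex consonant classes but encodes every letter, first letter included, and does no zero-padding. The name sound_key is hypothetical, and unlike standard Soundex this simplified version lets h and w break up a run of identical digits.

```perl
use strict;
use warnings;

# Standard Soundex consonant classes; vowels, h, w and y get no digit.
my %CODE;
@CODE{ qw(b f p v) }         = (1) x 4;
@CODE{ qw(c g j k q s x z) } = (2) x 8;
@CODE{ qw(d t) }             = (3) x 2;
$CODE{l}                     = 4;
@CODE{ qw(m n) }             = (5) x 2;
$CODE{r}                     = 6;

# sound_key() is a hypothetical helper, not part of any Soundex module:
# it encodes every letter (the first included) and collapses adjacent
# letters that share a digit.
sub sound_key {
    my ($word) = @_;
    my $key  = '';
    my $prev = '';
    for my $ch ( split //, lc $word ) {
        my $digit = $CODE{$ch} // '';
        $key .= $digit if $digit ne '' && $digit ne $prev;
        $prev = $digit;
    }
    return $key;
}

print "$_ => ", sound_key($_), "\n" for qw(cat kat gga gja jga jja);
```

    Strings that share a key are candidates for "sounds the same"; with this scheme cat and kat collide, and so do gga, gja, jga and jja, which are the hard-to-distinguish cases raised in the question.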

Re: Pronouncible TLA's?
by Anonymous Monk on Apr 28, 2016 at 22:08 UTC

    How about pulling all three-consecutive-letter sets from /usr/share/dict/words that have a vowel in them?

      It's a start but only a subset. "gux" is perfectly pronouncible and uniquely so but doesn't appear in my version of /usr/share/dict/words.
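      For reference, a rough sketch of that extraction. The helper name trigrams_with_vowel is made up for illustration, 'y' is counted as a vowel as in the original question, and the dictionary path is an assumption that will not hold on every system, so the script falls back to a tiny built-in list.

```perl
use strict;
use warnings;

# trigrams_with_vowel() is a hypothetical helper: it returns a hash ref
# mapping each 3-letter substring that contains a vowel or 'y' to the
# number of times it was seen in the supplied words.
sub trigrams_with_vowel {
    my (@words) = @_;
    my %seen;
    for my $word (@words) {
        my $w = lc $word;
        next unless $w =~ /\A[a-z]+\z/;    # skip apostrophes, accents, etc.
        for my $i ( 0 .. length($w) - 3 ) {
            my $tri = substr $w, $i, 3;
            $seen{$tri}++ if $tri =~ /[aeiouy]/;
        }
    }
    return \%seen;
}

# Assumed path; fall back to a tiny built-in list if the file is absent.
my $dict_path = '/usr/share/dict/words';
my @words     = qw(capsule lock luck);
if ( open my $fh, '<', $dict_path ) {
    chomp( @words = <$fh> );
    close $fh;
}

my $tri = trigrams_with_vowel(@words);
printf "%d distinct trigrams; gux is %s\n", scalar keys %$tri,
    exists $tri->{gux} ? 'present' : 'absent';
```

      The counts could double as a crude familiarity score: a trigram seen in many words is more likely to be read the same way by everyone.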

        Another subset might be, I suspect, all versions of <vowel><consonant><vowel> and <consonant><vowel><consonant>
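        A quick sketch of how those two shapes could be generated. Treating 'y' as a consonant is an assumption made here, and pattern_words is a hypothetical helper name.

```perl
use strict;
use warnings;

# Treating 'y' as a consonant is an assumption; the thread does not say.
my @vowels     = qw(a e i o u);
my @consonants = grep { !/[aeiou]/ } 'a' .. 'z';

# pattern_words() is a hypothetical helper: given a pattern such as 'CVC',
# it returns every letter string matching that consonant/vowel shape.
sub pattern_words {
    my ($pattern) = @_;
    my @out = ('');
    for my $slot ( split //, $pattern ) {
        my @letters = $slot eq 'V' ? @vowels : @consonants;
        @out = map { my $stem = $_; map { $stem . $_ } @letters } @out;
    }
    return @out;
}

my @cvc = pattern_words('CVC');    # 21 * 5 * 21 = 2205
my @vcv = pattern_words('VCV');    # 5 * 21 * 5  = 525
printf "CVC: %d, VCV: %d, total: %d of 17576\n",
    scalar @cvc, scalar @vcv, @cvc + @vcv;
```

        That gives 2,730 candidates, "gux" among them, which is a little short of the 3-7 thousand names the question asks for, and it says nothing yet about which of them sound alike.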

Re: Pronouncible TLA's?
by dsheroh (Monsignor) on Apr 29, 2016 at 10:40 UTC
    Even if you restrict it to only native English speakers, you're going to hit plenty of cases where there won't be a consensus on how it's pronounced, or even if it's pronounceable. For example, there was recently yet another flare-up in the intermittent war over the pronunciation of "GIF".

      That ol' bone. There's no J. It is so obviously "gift" without the T :)


Re: Pronouncible TLA's?
by Marshall (Canon) on Apr 30, 2016 at 17:08 UTC
    Your post brought up some old memories.
    I remember one guy who I met early in my career. He was a black-belt ASM guy. (Oh, see now we have a TLA that most CS folks would be able to pronounce....) Anyway, we had one machine that only allowed 5 capital letters max for the variable names. This guy had an amazing ability to come up with names that were very understandable in the context of the program. On a scale of 1..10, this guy was 15 (off the chart) and the average guy, 5. The difference was that big.

    I was both stunned and inspired by this guy's coding ability. The code was amazing in its clarity of thought and the "names" worked. The point being that writing "good" acronyms is an art form. Some folks are way, way better at this than others. One device driver I worked with only had 2 comments! "Suck it in" and "Blow it out". But yet I could read the program and understand it. Very, very, very unusual for ASM code.

    Instead of eliminating "unpronounceable" words, I would suggest starting with known pronounceable words. Perhaps it is possible to come up with some kind of "certainty scale"?

    Any TLA that appears in the Oxford Dictionary is "pronounceable": bat, cat, rat, ate, tea, tee, etc. Word endings like "-ule" also qualify; as in "capsule", the "ule" is pronounceable. That is 100% on the certainty scale. Note: some of these things might wind up being archaic and not really 100% for all native English speakers.

    Some of these TLA's become pronounceable with the insertion of a vowel, which turns them into a word in the dictionary: LCK, "lock", "luck". These would have a lower score.
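    A rough sketch of such a scale. The score levels (1.0, 0.5, 0) are arbitrary illustrative values, certainty is a hypothetical helper, and the tiny word list stands in for a real dictionary.

```perl
use strict;
use warnings;

# certainty() is a hypothetical scoring helper. The 1.0 / 0.5 / 0 levels
# are illustrative assumptions; the only requirement taken from the post
# is that inserted-vowel matches score lower than dictionary words.
sub certainty {
    my ( $tla, $dict ) = @_;    # $dict: hash ref of known lowercase words
    my $t = lc $tla;
    return ( 1.0, [$t] ) if $dict->{$t};

    # Try one vowel at each interior gap: LCK -> lack, lock, luck, ...
    my @hits;
    for my $pos ( 1 .. length($t) - 1 ) {
        for my $v (qw(a e i o u)) {
            my $cand = $t;
            substr( $cand, $pos, 0, $v );    # insert $v at $pos
            push @hits, $cand if $dict->{$cand};
        }
    }
    return @hits ? ( 0.5, \@hits ) : ( 0, [] );
}

# Stand-in dictionary; a real run would load a full word list.
my %dict = map { $_ => 1 } qw(bat cat rat ate tea tee lock luck);
for my $tla (qw(CAT LCK XZQ)) {
    my ( $score, $words ) = certainty( $tla, \%dict );
    printf "%s: %.1f (%s)\n", $tla, $score, join( ', ', @$words ) || '-';
}
```

    More than one hit, as with "lock" and "luck", could be used to lower the score further, since it signals that readers may not agree on the pronunciation.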

    But I don't think any of these would qualify: xxa xxe xxi xxo xxu xxy xza xze xzi xzo xzu xzy.
    "X" is often pronounced like "Z", so "xxu" is perhaps "zu" or "zoo", à la Xilinx, Xerox.

    Humans would probably be needed to find things like ASM, "Assembly", although that might actually be in the Oxford Dictionary?

    If you do go with my "crowd source" idea, throw some "easy ones" into each work unit. This is both a check on whether the human is paying attention and also a "positive reinforcement" that the task is possible.
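    One way the "easy ones" could be mixed in, as a sketch only. The helper names build_work_unit and attention_score are hypothetical, and the vote format (TLA => 1 for pronounceable, 0 for not) is assumed for the example.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# build_work_unit() is a hypothetical helper. It mixes a few "easy"
# control items into the unknown items and returns the shuffled unit
# plus a lookup of which entries are controls.
sub build_work_unit {
    my ( $unknown, $controls, $n_controls ) = @_;
    $n_controls = @$controls if $n_controls > @$controls;
    my @picked     = ( shuffle @$controls )[ 0 .. $n_controls - 1 ];
    my %is_control = map { $_ => 1 } @picked;
    return ( [ shuffle @$unknown, @picked ], \%is_control );
}

# attention_score() is also hypothetical: the fraction of control items
# the voter marked as pronounceable (assumed vote format: TLA => 1 or 0).
sub attention_score {
    my ( $votes, $is_control ) = @_;
    my @seen = grep { $is_control->{$_} } keys %$votes;
    return unless @seen;
    return scalar( grep { $votes->{$_} } @seen ) / @seen;
}

my @unknown  = qw(gux xza lck gja);
my @controls = qw(cat bat tea);    # answers already known: pronounceable
my ( $unit, $is_control ) = build_work_unit( \@unknown, \@controls, 2 );
print join( ' ', @$unit ), "\n";
```

    A voter whose attention_score is low on the controls could have the rest of that work unit routed to a second reviewer.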
