Perl and Morphology

justinNEE has asked for the wisdom of the Perl Monks concerning the following question:

I'm interested in getting ideas on how to go about writing a program that takes two lists of words and tries to match morphemes. One list would be in English; the other would be in a language that is only known at runtime. The bound morphemes would be predictable (plural, tense, aspect...), but the number of "roots" would not be known until the program has gone through the lists. For example:

Data:
baSlar,heads
baSlarimiz,our heads
baSimda,in my head
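One possible starting point, sketched here under several assumptions: group the pairs whose English glosses share a word, take the longest common prefix of the foreign forms as the root, and split the leftover suffix strings on suffixes that also occur on their own. The helpers `common_prefix` and `segment`, the grouping on the word "head", and the assumption that the language is suffixing (as the sample is) are all illustrative choices of mine, not an established method. The pairs are written with a lowercase "b" throughout.

```perl
use strict;
use warnings;

# Sample pairs (foreign form, English gloss), with a lowercase "b" throughout.
my @pairs = (
    [ 'baSlar',     'heads'      ],
    [ 'baSlarimiz', 'our heads'  ],
    [ 'baSimda',    'in my head' ],
);

# Hypothetical helper: longest common prefix of a list of strings.
sub common_prefix {
    my ($prefix, @rest) = @_;
    for my $word (@rest) {
        chop $prefix while index($word, $prefix) != 0;
    }
    return $prefix;
}

# Hypothetical helper: split a suffix string into known morphemes, trying
# the longest first, e.g. "larimiz" -> ("lar", "imiz") when "lar" is known.
sub segment {
    my ($string, $known) = @_;
    for my $m (sort { length($b) <=> length($a) || $a cmp $b } keys %$known) {
        next if $m eq $string;
        return ($m, segment(substr($string, length $m), $known))
            if index($string, $m) == 0;
    }
    return length $string ? ($string) : ();
}

# Assumed grouping step: every pair whose gloss mentions "head" shares
# one foreign root (a real program would have to discover these groups).
my @group = grep { $_->[1] =~ /\bhead/ } @pairs;
my $root  = common_prefix(map { $_->[0] } @group);

# Everything after the root is a candidate suffix string.
my @suffixes = grep { length } map { substr($_->[0], length $root) } @group;
my %known    = map { $_ => 1 } @suffixes;

# Count each morpheme found after splitting the suffix strings.
my %morphemes;
$morphemes{$_}++ for map { segment($_, \%known) } @suffixes;

print "$root\n";
print "-$_\n" for sort keys %morphemes;
```

On the sample this prints `baS`, `-imda`, `-imiz` and `-lar`, one per line. A longest-common-prefix root is fragile (a single odd form can shorten it), so with 100-200 entries it may be worth requiring a prefix to be shared by several forms before accepting it as a root.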

It would return something like:

baS,head
-lar,inflectional:plural
-imiz,our -
-imda,in my -
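The English side of that output could be approached by replacing the English root with "-" in each gloss ("our heads" becomes "our -s"), learning what each lone suffix contributes, and then subtracting those known contributions from glosses of forms with several suffixes. This is only a sketch: `template`, `%meaning`, and the hard-coded segmentation in `@analyses` are hypothetical, and turning "-s" into a label such as "inflectional:plural" would need an extra table of English affixes that is not shown.

```perl
use strict;
use warnings;

# Hypothetical input: the English root and, for each sample pair, the
# foreign suffixes (after the root) together with the English gloss.
my $english_root = 'head';
my @analyses = (
    { suffixes => ['lar'],         gloss => 'heads'      },
    { suffixes => ['lar', 'imiz'], gloss => 'our heads'  },
    { suffixes => ['imda'],        gloss => 'in my head' },
);

# Replace the English root with "-": "our heads" -> "our -s".
sub template {
    my ($gloss) = @_;
    (my $t = $gloss) =~ s/\b\Q$english_root\E/-/;
    return $t;
}

my %meaning;

# Pass 1: a form with a single suffix gives that suffix's template directly.
for my $an (grep { @{ $_->{suffixes} } == 1 } @analyses) {
    $meaning{ $an->{suffixes}[0] } = template($an->{gloss});
}

# Pass 2: in a form with several suffixes, remove what the known suffixes
# contribute around the "-"; if exactly one suffix is left unexplained,
# it gets whatever remains.
for my $an (grep { @{ $_->{suffixes} } > 1 } @analyses) {
    my $t = template($an->{gloss});
    my @unknown;
    for my $s (@{ $an->{suffixes} }) {
        if (defined(my $m = $meaning{$s})) {
            my ($before, $after) = split /-/, $m, 2;
            $t =~ s/\Q$before\E-/-/;   # drop text the known suffix puts before "-"
            $t =~ s/-\Q$after\E/-/;    # drop text it puts after "-", e.g. plural "s"
        } else {
            push @unknown, $s;
        }
    }
    $meaning{ $unknown[0] } = $t if @unknown == 1;
}

print "-$_,$meaning{$_}\n" for sort keys %meaning;
```

This prints `-imda,in my -`, `-imiz,our -` and `-lar,-s`. The two-pass order matters: "our heads" can only be explained once "-lar" is known to contribute the plural "s".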

(Or, instead of 'in my -', it could return a description.) These observations may not hold for the language as a whole, but they do hold for the data we have. When rules contradict each other, the program might look more closely at the data to see whether the rule is more complex; it might decide that, since the rule occurs only once out of x times, it is an exception; or, if two rules each occur 50% of the time, it might accept both.

The word lists would generally have around 100-200 entries; I'll try to get a bigger sample to play with tomorrow. I read the article in TPJ #17, and while it was interesting, I still don't know where to start...
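The three ways of handling contradicting rules could be approximated by comparing how often each competing variant occurs. In this sketch the thresholds (`$EXCEPTION_SHARE`, `$TIE_MARGIN`), the `classify` function and the counts in the demo loop are all invented for illustration; the question does not fix any of them.

```perl
use strict;
use warnings;

# Hypothetical thresholds; the question leaves these choices open.
my $EXCEPTION_SHARE = 0.2;  # a variant below this share of all cases is an exception
my $TIE_MARGIN      = 0.1;  # variants within this share of the top one are accepted

# Classify competing variants of one rule, given how often each was seen.
# Returns a hash of variant => label.
sub classify {
    my (%count) = @_;
    my $total = 0;
    $total += $_ for values %count;

    # Share of the most frequent variant (ties broken alphabetically).
    my ($top) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    my $top_share = $count{$top} / $total;

    my %label;
    for my $variant (keys %count) {
        my $share = $count{$variant} / $total;
        if ($top_share - $share <= $TIE_MARGIN) {
            $label{$variant} = 'acceptable';          # e.g. a 50/50 split
        } elsif ($share < $EXCEPTION_SHARE) {
            $label{$variant} = 'exception';           # e.g. once out of x times
        } else {
            $label{$variant} = 'needs a closer look'; # maybe a more complex rule
        }
    }
    return %label;
}

# Invented counts, only to show the three outcomes.
for my $counts ([9, 1], [5, 5], [6, 4]) {
    my %label = classify(rule_a => $counts->[0], rule_b => $counts->[1]);
    print join(', ', map { "$_=$label{$_}" } sort keys %label), "\n";
}
```

The "needs a closer look" case is where the program would go back to the data, for example to check whether the choice between two variants depends on the root they attach to.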
