Your skill will accomplish what the force of many cannot |
|
PerlMonks |
Re: minimal superstrings/maximal substringsby jcb (Parson) |
on Aug 20, 2019 at 01:25 UTC ( [id://11104714]=note: print w/replies, xml ) | Need Help?? |
If I read this correctly, you do not want (SOCIOBIOLOGIST, OLOGIST) but do want (SOCIOBIOLOGIST, BIOLOGIST) and (BIOLOGIST, OLOGIST). This leads to a simple iterative solution, where you first find all L > S tuples with the longest L and shortest S, then "split" those tuples for cases where some W exists such that L > W > S until you have no more tuples to split. Each split replaces one entry L > S in the list with a pair of entries L > W and W > S. I am unsure, but suspect that you may be able to get better performance by applying scalar reverse to each string and working with "mirrored" strings, turning suffix matches (hard) into prefix matches (easy). Now that I think about it, sorting those "mirrored" strings should put all possible L > S tuples into neat contiguous groups, with the shortest S at the beginning, longest L at the end, and any W in the middle, if you use the default cmp for the sort. Does the entire lexicon fit into memory or do you need to sort them on disk? Edit to add: At least this stage is simple enough that I would suggest solving this with the shell sort command and awk if you have those tools. I firmly believe that every monk here should learn AWK and use it for these types of simple hobby tasks — doing so will make you a better Perl programmer. (The GNU Awk manual is positively "light" reading compared to the enormous library of POD that comes with Perl.)
In Section
Seekers of Perl Wisdom
|
|