Re: minimal superstrings/maximal substrings

If I read this correctly, you do not want (SOCIOBIOLOGIST, OLOGIST) but do want (SOCIOBIOLOGIST, BIOLOGIST) and (BIOLOGIST, OLOGIST).

This leads to a simple iterative solution, where you first find all L > S tuples with the longest L and shortest S, then "split" those tuples for cases where some W exists such that L > W > S until you have no more tuples to split. Each split replaces one entry L > S in the list with a pair of entries L > W and W > S.

I am unsure, but suspect that you may be able to get better performance by applying scalar reverse to each string and working with "mirrored" strings, turning suffix matches (hard) into prefix matches (easy).

Now that I think about it, sorting those "mirrored" strings should put all possible L > S tuples into neat contiguous groups, with the shortest S at the beginning, longest L at the end, and any W in the middle, if you use the default cmp for the sort.

Does the entire lexicon fit into memory or do you need to sort them on disk?

Edit to add: At least this stage is simple enough that I would suggest solving this with the shell sort command and awk if you have those tools. I firmly believe that every monk here should learn AWK and use it for these types of simple hobby tasks — doing so will make you a better Perl programmer. (The GNU Awk manual is positively "light" reading compared to the enormous library of POD that comes with Perl.)

Comment on Re: minimal superstrings/maximal substrings Select or Download Code

Replies are listed 'Best First'.

Re^2: minimal superstrings/maximal substrings
by Laurent_R (Canon) on Aug 20, 2019 at 08:42 UTC

Does the entire lexicon fit into memory or do you need to sort them on disk?

On your edit: yes, I agree that AWK is a great tool that I have used quite a lot in the past, but, personally, I have almost stopped using AWK when I started to know Perl well enough to make it a more powerful and more flexible replacement.

[reply]


Your skill will accomplish what the force of many cannot
	PerlMonks