Did I understand points 1) and 2) correctly? This script gets through a list of almost 640,000 entries in less than a minute (50 sec). Adding conditions 3) and 4) should be really easy.
#!/usr/bin/perl
use feature 'say';
use warnings;
use strict;

my $file = '/etc/dictionaries-common/words';
open my $IN, '<', $file or die "Cannot open $file: $!";

# Load every word as a hash key; the values are never used.
my %words;
while (my $word = <$IN>) {
    chomp $word;
    undef $words{$word};
}

for my $word (keys %words) {
    my $length = length $word;
    my %found; # report each occurrence just once
    for my $pos (0 .. $length - 1) {
        # True (1) only at position 0, so the word never matches itself.
        my $skip_itself = ! $pos;
        for my $len (1 .. $length - $pos - $skip_itself) {
            my $subword = substr($word, $pos, $len);
            next if exists $found{$subword};
            if (exists $words{$subword}) {
                say "$subword in $word";
                undef $found{$subword};
            }
        }
    }
}
Update: If I omit the "just once" condition, the script finishes in 40 sec on my Mac Mini.
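For illustration, the variant without the "just once" condition could look roughly like the sketch below. It is not the exact code I timed: the function name subwords_with_repeats and the four-word sample list are made up for the example, and the dictionary file is replaced by an in-memory list so the snippet runs anywhere. The only real change is that the %found hash is gone, so a subword is reported once per position at which it occurs.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not part of the original script): returns one
# "subword in word" string per occurrence, repeats included.
sub subwords_with_repeats {
    my @list = @_;

    my %words;
    @words{@list} = ();    # hash slice: create the keys, values stay undef

    my @hits;
    for my $word (sort keys %words) {    # sorted only to make the output stable
        my $length = length $word;
        for my $pos (0 .. $length - 1) {
            # At position 0, stop one character short so the word
            # never matches itself.
            my $max_len = $length - $pos - ($pos == 0 ? 1 : 0);
            for my $len (1 .. $max_len) {
                my $subword = substr $word, $pos, $len;
                push @hits, "$subword in $word" if exists $words{$subword};
            }
        }
    }
    return @hits;
}

# Sample list, assumed purely for the demonstration.
say for subwords_with_repeats(qw(a an banana nan));
```

With this sample list, "a in banana" shows up three times, because "a" occurs at three positions in "banana". The original script would print it only once.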