Try this. It uses ~300MB of memory to store and index 1 million randomly generated words (14MB on disk). It allows full regex searching (with some adaptations), and no search has yet taken more than 2 seconds:
#! perl -slw
use strict;
use Time::HiRes qw[ time ];
use List::MoreUtils qw[ uniq ];
# Build a 4-byte bitmap with one bit set for each distinct letter (a-z) in the word.
sub idx{
my $idx = chr(0) x 4;
vec( $idx, $_, 1 ) = 1 for map ord()-ord('a'), uniq sort split'', $_[ 0 ];
$idx;
}
my %idx;
open DICT, '<', 'junk.words' or die $!;
while( <DICT> ) {
chomp;
push @{ $idx{ idx( $_ ) } }, $_;
}
my @keys = keys %idx;
print scalar @keys;
while( <> ) {
my $start = time;
my @matches;
my $n = 0;
chomp;
( my $pat = $_ ) =~ tr[a-z][]cd;
$pat = idx( $pat );
for my $idx ( grep+(($_ & $pat) eq $pat), @keys ) {
for my $poss ( @{ $idx{ $idx } } ) {
$poss =~ $_ and $matches[ $n++ ] = $poss;
}
}
printf "Found $n matches in %.2f seconds; Display? ", time() - $start;
if( <> =~ /y/i ) {
print for @matches;
}
}
A few examples:
c:\test>742277.pl
857720
z$
Found 38464 matches in 1.86 seconds; Display? n
zz$
Found 1481 matches in 1.80 seconds; Display? n
zzz$
Found 55 matches in 1.78 seconds; Display? n
^a.*zzz$
Found 3 matches in 1.00 seconds; Display? y
afyjhcukywpbzzz
azhmwxjxncbaozzz
atzzz
[aeiou]{6]
Found 0 matches in 0.39 seconds; Display? n
[aeiou]{6}
Found 99 matches in 0.43 seconds; Display? n
[aeiou]{7}
Found 23 matches in 0.45 seconds; Display? n
[aeiou]{8}
Found 2 matches in 0.43 seconds; Display? y
ouaueeie
acxftoeeoeuiofoj
^[aeiou]+$
Found 3 matches in 0.44 seconds; Display? y
ieouea
iuaoea
ouaueeie
^for
Found 54 matches in 0.67 seconds; Display? n
^for.*ness
Found 0 matches in 0.39 seconds; Display?
^for.*n
Found 14 matches in 0.50 seconds; Display? y
foromnfikqfarwgedn
fornsdlluobiqdmacjl
forhkzmalfewhaohknrl
fornfxdprljcckkh
fortgntqbpbnmmtpk
forkqvimulibcfxwyjnce
formslskcoazusn
fornywxhqt
forndbzjfm
forfnmhhvdcntt
forxhbcimsggnhhmbiqze
foruhvpekgtnialyifyi
forcnmamdsx
forcvxnb
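For anyone wondering why the searches are quick: the script files every word under a 26-bit bitmap of the letters it contains, and only runs the regex against words whose bitmap covers every letter that appears in the pattern. Here is a minimal standalone sketch of just that prefilter. The names letter_mask and may_match are made up for this illustration, and it assumes lowercase a-z words, as in the test data.

```perl
#! perl
use strict;
use warnings;

# Same idea as idx() in the script: a 4-byte bitmap, one bit per letter a-z.
# Assumes the input contains only lowercase a-z.
sub letter_mask {
    my $mask = chr(0) x 4;
    vec( $mask, ord($_) - ord('a'), 1 ) = 1 for split //, $_[0];
    return $mask;
}

# A word is worth testing only if its bitmap contains every bit
# set in the bitmap built from the pattern's letters.
sub may_match {
    my( $word, $pattern ) = @_;
    ( my $letters = $pattern ) =~ tr[a-z][]cd;    # keep only the letters a-z
    my $need = letter_mask( $letters );
    return ( letter_mask( $word ) & $need ) eq $need;
}

print may_match( 'atzzz',  '^a.*zzz$' )   ? "candidate\n" : "skipped\n";
print may_match( 'forest', '^a.*zzz$' )   ? "candidate\n" : "skipped\n";
print may_match( 'aeaeae', '[aeiou]{6}' ) ? "candidate\n" : "skipped\n";
```

The third call shows the catch, which is presumably part of the "adaptations" mentioned at the top: every letter in the pattern is treated as required, including letters inside character classes. So [aeiou]{6} only ever tests words that contain all five vowels, which is consistent with the matches shown above.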