
Re: Re: Re: Perl's pearls

by petral (Curate)
on Jan 02, 2002 at 20:17 UTC


in reply to Re: Re: Perl's pearls
in thread Perl's pearls

It seems like the main improvement/optimization would be not looping twice through the list of all words.  Move *all* processing into the main loop:
my (%word, %gram);
while (<>) {
    chomp;
    # $_ = lc $_;
    /[^a-z]/ and next;
    my $sig = pack "C*", sort unpack "C*", $_;
    if (exists $word{$sig}) {
        if (exists $gram{$sig}) {
            next if $gram{$sig} =~ /\b$_\b/;
            $gram{$sig} .= " $_";              # rare
        }
        else {
            next if $word{$sig} eq $_;
            $gram{$sig} = "$word{$sig} $_";    # rare
        }
    }
    else {
        $word{$sig} = $_;                      # mostly
    }
}
print join "\n", (sort values %gram), '';      # just output short list

Only the first word of an anagram set is in both lists.
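To see that on a toy list, here is a minimal sketch of the same single-pass idea. The names signature and collect are mine (hypothetical, not from the script above), the words come from an in-memory list instead of <>, and the signature is built with join/sort/split rather than the pack/unpack idiom, so treat it as an illustration rather than a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the post: the word's letters in sorted order.
sub signature { return join '', sort split //, $_[0] }

# Hypothetical wrapper around the single-pass logic; takes a list, not <>.
sub collect {
    my (%word, %gram);
    for my $w (@_) {
        next if $w =~ /[^a-z]/;              # skip anything but plain lowercase
        my $sig = signature($w);
        if (!exists $word{$sig}) {
            $word{$sig} = $w;                # first word seen for this signature
        }
        elsif (!exists $gram{$sig}) {
            # second distinct word: start the anagram set with the stored first word
            $gram{$sig} = "$word{$sig} $w" if $word{$sig} ne $w;
        }
        elsif ($gram{$sig} !~ /\b$w\b/) {
            $gram{$sig} .= " $w";            # later distinct words are appended
        }
    }
    return (\%word, \%gram);
}

my ($word, $gram) = collect(qw(canoe stable perl ocean tables canoe bleats));

print "$_\n" for sort values %$gram;
# prints:
# canoe ocean
# stable tables bleats

print "first words only: @{[ sort values %$word ]}\n";
# prints: first words only: canoe perl stable
```

%word never holds more than the first word per signature, so the only overlap between the two hashes is that first word ("canoe" and "stable" here); "perl" has no anagram in the list and never reaches %gram.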
Here are some more finds, mostly from the short OED word list (linked from the original node):
ablest bleats stable tables
adroitly dilatory idolatry
angered derange enraged grandee grenade
ascertain cartesian sectarian
asleep elapse please
aspirant partisan
attentive tentative
auctioned cautioned education
canoe ocean
comedian demoniac
compile polemic
covert vector
danger gander garden
deist diets edits idest sited tides
emits items metis mites smite times
emitter termite
lapse leaps pales peals pleas
nastily saintly
obscurantist subtractions
observe obverse verbose
opt pot top
opts post pots spot stop tops
opus soup
oy yo
petrography typographer
peripatetic precipitate
present repents serpent
presume supreme
resin rinse risen siren
salivated validates
slitting stilting tiltings titlings tlingits
views wives
vowels wolves
woodlark workload


  p

Re: Re: Re: Re: Perl's pearls
by gmax (Abbot) on Jan 02, 2002 at 20:55 UTC
    Brilliant! On my computer, your script is 13% faster than mine, using my 100_000-word list. With the one that you suggested (thanks, BTW), which is more than twice as large, the gain is 23%!
    It means that your solution is more scalable and thus better suited to this kind of task.
    Like every "eureka" solution, your improvement looks quite simple, now that I see it! :-)
    Thanks.
     _  _ _  _  
    (_|| | |(_|><
     _|   
    
