Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Re: Re: Finding dictionary words in a string.

by toma (Vicar)
on Mar 14, 2004 at 05:58 UTC ( [id://336461]=note: print w/replies, xml ) Need Help??


in reply to Re: Finding dictionary words in a string.
in thread Finding dictionary words in a string.

This is a great little program!

I couldn't resist speeding it up a bit. A commonly used trick in search engines is to shift everything to upper case.

Avoiding the i flag in the regular expression roughly doubles the speed, on my machine at least.

I also used Storable to create a slightly faster-loading dictionary.

#!/usr/bin/perl -w use strict; use Storable; my $search_word = uc shift or die "search word required\n"; my @words = (); my @dict; my $rdict; my $count = 0; if (not -e 'words.sto') { open WORDS, '/usr/share/dict/words' or die "can't open words file\n"; while (<WORDS>) { chomp; push @dict, uc $_; } close WORDS; $rdict= \@dict; store $rdict, 'words.sto'; } else { $rdict= retrieve('words.sto'); } for (@$rdict) { printf "%3d %s\n", ++$count, $_ if $search_word =~ /$_/; }
It should work perfectly the first time! - toma

Replies are listed 'Best First'.
Re: Re: Re: Finding dictionary words in a string.
by cchampion (Curate) on Mar 14, 2004 at 08:02 UTC

    I can't figure out how you got that speed improvement. The time you save by removing the /i flag is lost when you build a large array in memory (45,000 words in my machine).

    Besides, do you really think an all-uppercase output is better?

    $ time perl wsearch.pl owijfwapplelaskfiwejfcherryalkfwiofwfblossomowiejf
      1 Al
      2 apple
      3 as
      4 ask
      5 blossom
      6 cherry
      7 err
      8 he
      9 her
     10 Herr
     11 Io
     12 Los
     13 loss
     14 mow
     15 of
     16 so
     17 we
    0.30user 0.00system 0:00.30elapsed 99%CPU 
    
    $ time perl wsearch2.pl owijfwapplelaskfiwejfcherryalkfwiofwfblossomowiejf
      1 AL
      2 APPLE
      3 AS
      4 ASK
      5 BLOSSOM
      6 CHERRY
      7 ERR
      8 HE
      9 HER
     10 HERR
     11 IO
     12 LOS
     13 LOSS
     14 MOW
     15 OF
     16 SO
     17 WE
    0.48user 0.00system 0:00.48elapsed 99%CPU 
    

    Of course, I ran both scripts at least 4 times, to make sure that the storable file was created and that both scripts were reading their input from the disk cache.

      Not going to chime in with regards to the array in memory, but with regards to the output difference, just store the word in uppercase, as well. Also, don't forget to Code Smarter. Use index, instead of matching with a regex. That should provide the same functionality with nice speed gains.
      #!/usr/bin/perl use strict; use warnings; my $sw = lc shift or die "search word required\n"; my $count = 0; open WORDS, "/usr/share/dict/words" or die "can't open words file\n"; while (<WORDS>) { chomp; printf "%3d %s\n", ++$count, $_ if index($sw,lc $_) > -1; } close WORDS;
Re: Re: Re: Finding dictionary words in a string.
by toma (Vicar) on Mar 14, 2004 at 18:28 UTC
    Here are the timings for my machine. With this type of program, the results will depend on the relative performance of the hard disk, the CPU, the memory, and the caches.

    Even on a single machine, the results depend on the operating system kernel, and most importantly, with the build of perl. My vendor perl is an unusual build, and my own build is more tuned to my machine.

    The kernel I have been running, 2.4.22-6.ll.rh80, is an experimental low-latency kernel used for real-time applications. The file system is ext3.

    Timing, seconds Kernel 2.4.22-6.ll.rh80 Kernel 2.4.18-14
    Vendor perl, loop 2.938 2.569
    My perl, loop 1.957 1.677
    Vendor perl, Storable, no /i 1.267 0.966
    My perl, Storable, no /i 0.687 0.567

    If you prefer lower case to make the output look better, just use the lc function instead of the uc function.

    I think the lesson is that performance improvements are often difficult to generalize, so you have to try it on your own system and see what works.

    UPDATE: The Code Smarter suggestion is very good. For the Storable code, timings for the vendor perl on 2.4.22-6.ll.rh80 the timings are down to about 0.21 seconds and with my perl they are at about 0.13 seconds - too fast for my primitive benchmark.

    It should work perfectly the first time! - toma

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://336461]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others cooling their heels in the Monastery: (2)
As of 2024-04-19 01:39 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found