Re: Finding all substrings

This solution depends on two factors, how many times does the code run and how much do the length of the strings vary? If the answers are "a lot" and "not much" then here's a faster solution:

{
  my @cache; # a cache of substring-finding subs
  sub substrings {
    my $string = shift;
    my $length = length $string;

    # use cached sub if we have one
    return $cache[$length]->($string)
      if exists $cache[$length];

    # create sub to find substrings for this length
    my $sub = 'sub { $_ = shift; return (';
    foreach my $length (1..length($string)) {
      foreach my $offset (0..length($string)-$length) {
        $sub .= "substr(\$_,$offset,$length),";
      }
    }
    $sub .= ")};";
    $cache[$length] = eval $sub;

    # and use it
    return $cache[$length]->($string);
  }
}
[download]

Using Benchmark.pm and a test case against random words from /usr/dict/words I found this to be 300% faster over 100,000 iterations. Not bad, but I bet we could do better!

-sam

PS: Has anyone noticed that <code> doesn't deal with hard tabs right? Copy-and-paste from Emacs is unpleasant.

Comment on Re: Finding all substrings Download Code

Replies are listed 'Best First'.
Re: Re: Finding all substrings by samtregar (Abbot) on Apr 24, 2002 at 19:13 UTC
Slight improvement, now around 400% faster, at the expense of readability. I got rid of the parameter passing to the cached subs by using $_. Also, I unrolled the last iteration replacing a substr($string,0,length($string)) with just $string. I tried unrolling the first iteration into a split() but that was slower. `{ my @cache; sub substrings { $_ = shift; return &{$cache[length($_)]} if exists $cache[length($_)]; my $sub = 'sub { return ('; foreach my $len (1..length($_)-1) { foreach my $off (0..length($_)-$len) { $sub .= "substr(\$_,$off,$len),"; } } $sub .= "\$_)};"; $cache[length($_)] = eval $sub; return &{$cache[length($_)]}; } }` [download] The only trick I have left is Inline::C... I'll leave that as an exercise for the reader. -sam	[reply] [d/l]
Re: Finding all substrings by Dominus (Parson) on Apr 24, 2002 at 19:34 UTC
Says samtregar: I found this to be 300% faster over 100,000 iterations. Not bad, but I bet we could do better! Supposing that the original version took 10 seconds to run, then yours, 300% faster, completes and delivers the answer 20 seconds before you even enter the command that runs it. That is an impressive achievement indeed. I admire your optimization skills. Perhaps you meant that yours was three times as fast? If so, it was 66.67% faster, not 300%. Hope this helps. -- Mark Dominus Perl Paraphernalia	[reply]
Re: Re: Finding all substrings by samtregar (Abbot) on Apr 24, 2002 at 19:39 UTC
Um, just reading the Benchmark output man: `Rate original new original 7710/s -- -80% new 38610/s 401% --` [download] If that doesn't mean 400% faster then I guess I need more education. -sam	[reply] [d/l]
Re: (3) Finding all substrings (Russ: more != less) by Russ (Deacon) on Apr 24, 2002 at 19:49 UTC
Benchmark is telling you: new is 80% faster than original original is 401% slower than new Don't confuse "this much more" with "this much less." :-) Russ Brainbench 'Most Valuable Professional' for Perl	[reply]
(MeowChow - you're all confused) Re5: Finding all substrings by MeowChow (Vicar) on Apr 25, 2002 at 00:20 UTC
Re: Re: Finding all substrings by BUU (Prior) on Apr 25, 2002 at 02:02 UTC
If it took 10 seconds, and was 300% faster, that would merely mean it was 3 seconds long. I think. Actually i think theyre both interchangeable	[reply]


We don't bite newbies here... much
	PerlMonks