improving speed in ngrams algorithm

by johngg on Jun 12, 2019

in reply to improving speed in ngrams algorithm

A solution using split, array slices and shift. No idea if it is fast or slow as I haven't run any benchmarks.

```
use 5.026;
use warnings;

my $text = q{this is the text to play with};

for ( 1 .. 8 )
{
    say qq{$_-word ngrams of '$text'};
    say for nGramWords( $_, $text );
    say q{-} x 20;
}

# Return every run of $nWords consecutive words in $string,
# each labelled with the index of its first word.
sub nGramWords
{
    my( $nWords, $string ) = @_;

    my @words = split m{\s+}, $string;
    my $start = 0;
    my @nGrams;

    # Take the first $nWords words, then drop the leading word
    # and repeat until too few words remain.
    while ( scalar @words >= $nWords )
    {
        push @nGrams, join q{ },
            qq{START INDEX: @{ [ $start ++ ] } : },
            @words[ 0 .. $nWords - 1 ];
        shift @words;
    }

    return @nGrams;
}
```

The output.

```
1-word ngrams of 'this is the text to play with'
START INDEX: 0 :  this
START INDEX: 1 :  is
START INDEX: 2 :  the
START INDEX: 3 :  text
START INDEX: 4 :  to
START INDEX: 5 :  play
START INDEX: 6 :  with
--------------------
2-word ngrams of 'this is the text to play with'
START INDEX: 0 :  this is
START INDEX: 1 :  is the
START INDEX: 2 :  the text
START INDEX: 3 :  text to
START INDEX: 4 :  to play
START INDEX: 5 :  play with
--------------------
3-word ngrams of 'this is the text to play with'
START INDEX: 0 :  this is the
START INDEX: 1 :  is the text
START INDEX: 2 :  the text to
START INDEX: 3 :  text to play
START INDEX: 4 :  to play with
--------------------
4-word ngrams of 'this is the text to play with'
START INDEX: 0 :  this is the text
START INDEX: 1 :  is the text to
START INDEX: 2 :  the text to play
START INDEX: 3 :  text to play with
--------------------
5-word ngrams of 'this is the text to play with'
START INDEX: 0 :  this is the text to
START INDEX: 1 :  is the text to play
START INDEX: 2 :  the text to play with
--------------------
6-word ngrams of 'this is the text to play with'
START INDEX: 0 :  this is the text to play
START INDEX: 1 :  is the text to play with
--------------------
7-word ngrams of 'this is the text to play with'
START INDEX: 0 :  this is the text to play with
--------------------
8-word ngrams of 'this is the text to play with'
--------------------
```

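As I haven't benchmarked the code above, here is a rough sketch of how the comparison could be made with the core Benchmark module. It pits the shift-based loop against an index-based variant that slides a window over the array without modifying it. The sub names nGramsShift and nGramsIndex, the 200-word test string and the choice of 3-word ngrams are made up for illustration, and the START INDEX label is left out of both to keep the comparison simple. I make no claim about which one comes out ahead.

```perl
use 5.026;
use warnings;
use Benchmark qw{ cmpthese };

# Hypothetical test input: the same four words repeated to give 200 words.
my $text = join q{ }, ( q{alpha beta gamma delta} ) x 50;

# The shift-based approach, without the START INDEX label.
sub nGramsShift
{
    my( $nWords, $string ) = @_;

    my @words = split m{\s+}, $string;
    my @nGrams;

    while ( scalar @words >= $nWords )
    {
        push @nGrams, join q{ }, @words[ 0 .. $nWords - 1 ];
        shift @words;
    }

    return @nGrams;
}

# Index-based variant: the word array is left untouched and the
# slice bounds move along it instead.
sub nGramsIndex
{
    my( $nWords, $string ) = @_;

    my @words = split m{\s+}, $string;

    # The range is empty when there are fewer than $nWords words.
    return map { join q{ }, @words[ $_ .. $_ + $nWords - 1 ] }
        0 .. @words - $nWords;
}

# Make sure both variants agree before timing them.
my @shifted = nGramsShift( 3, $text );
my @indexed = nGramsIndex( 3, $text );
die qq{variants disagree\n}
    unless join( q{|}, @shifted ) eq join( q{|}, @indexed );

# Run each variant for at least one CPU second and print a comparison table.
cmpthese( -1, {
    'shift_based' => sub { my @r = nGramsShift( 3, $text ) },
    'index_based' => sub { my @r = nGramsIndex( 3, $text ) },
} );
```

The timings will depend on the length of the text and the ngram size, so it would be worth trying inputs that resemble the real data.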
I hope this is of interest.

Cheers,

JohnGG

