### extracting all possible n-mers from an array

by gingaloon (Acolyte)
 on Jun 06, 2006 at 23:26 UTC Need Help??

gingaloon has asked for the wisdom of the Perl Monks concerning the following question:

Dear All,

```@letters=qw/A B C D E F/;
my \$len=4;

I would like to pull back all possible combinations of size 4 from this array. The extra bit is to allow resampling.

```So:
AAAA
AAAB
AAAC
----
CCFC
----
FFFF
```

I've come up with a couple of various ugly solutions which are hacks of other Q&As. But they're very slow (my hacks that is).

Could someone provide me with an elegant (or quick) solution. I've done searches but couldn't find this question. Please point me in the direction if it has.

thanks

james

Replies are listed 'Best First'.
Re: extracting all possible n-mers from an array
by runrig (Abbot) on Jun 06, 2006 at 23:41 UTC
Algorithm-Combinatorics was exactly what I was looking for.

thanks runrig

james

Re: extracting all possible n-mers from an array
by jwkrahn (Abbot) on Jun 06, 2006 at 23:49 UTC
```
print "\$_\n" for glob "{@{[ join ',', @letters ]}}" x \$len;
thanks,

could you take a few seconds and explain what is going on here. I've not seen glob in this context before. All those references are also too nice not to use again.

-james

Let me try my hand at it...

Original code:

```print "\$_\n" for glob "{@{[ join ',', @letters ]}}" x \$len;
From the inside out:
```join ',', @letters
creates a comma separated string: A,B,C,D,E,F.

[...] creates an anonymous array of 1 element, which @{[...]} then dereferences. Putting that inside double quotes interpolates that 1 element array into the double-quoted string, surrounded by the outside pair of {}, yielding {A,B,C,D,E,F}.

x \$len repeats the string, in this case 4 times, to give:

```{A,B,C,D,E,F}{A,B,C,D,E,F}{A,B,C,D,E,F}{A,B,C,D,E,F}
glob treats each {...} as a set, and generates all strings with one element from each set.

I've always felt like this was almost undocumented, because the glob entry points to 2 other entries, one of which is completely unhelpful in this case, and the other which backs into it as an afterthought. If you're not paying attention, or don't already know it from some other context, glob doesn't improve the situation.

The reason for the "{@{[...]}}" magic is to take the string from join and sandwich it between the curlies. That's a lot of work, and not always easy to follow. I prefer the following, just because it's easier to explain:

```print "\$_\n" for glob (('{'. join( ',', @letters).'}') x \$len);
I sometimes see code a bit convoluted because someone forgot that join can also take parens to disambiguate it's arguments from the rest of the expression. In this case, glob uses parens for precedence with x \$len, though the outer set of parens could be replaced with a leading +, as in:
```print "\$_\n" for glob +('{'. join( ',', @letters).'}') x \$len;
As has been noted before (see this subthread), glob isn't necessarily good for large list generation.

-QM
--
Quantum Mechanics: The dreams stuff is made of

for some reason i immediately thought of colour pallettes.
not the nicest code, but something i knocked up quickly using your algorithm:
```my \$c;
my \$cols={map{(chr(65+\$c++),sprintf "%02x",\$_)}(0, 51, 102, 153, 204,
+ 255)};
my @letters=qw/A B C D E F/;

print "<html><body><table border=1><tr>";
\$c=0;
for(glob "{@{[ join ',', @letters ]}}" x 3){
s#.#\$cols->{\$&}#eg;

print "<td bgcolor='#\$_'>&nbsp;</td>\n";
print "</tr><tr>" if !(++\$c %9);
};
print "</tr></table></body></html>";
Re: extracting all possible n-mers from an array
by ikegami (Patriarch) on Jun 07, 2006 at 01:17 UTC
Another solution, this one relying on base conversion (base 10 to base 6):
```my @letters = qw( A B C D E F );

for (0..6**4-1) {
use integer;
my \$num = join '', map \$letters[\$_], \$_/6/6/6 % 6,
\$_/6/6   % 6,
\$_/6     % 6,
\$_       % 6;
print("\$num\n");
}

If @letters and \$len are variable, try this generalization:

```my @letters = qw( A B C D E F );
my \$len = 4;

sub convert {
my (\$num) = @_;

use integer;

my \$base = @letters;
my \$rv = '';

for (1..\$len) {
\$rv = \$letters[\$num % \$base] . \$rv;
\$num /= \$base;
}

return \$rv;
}

print(convert(\$_), "\n")
for 0..@letters**\$len-1;
Re: extracting all possible n-mers from an array
by polettix (Vicar) on Jun 06, 2006 at 23:56 UTC
There's a nice example in Algorithm::Loops that may apply, here's a possible usage:
```#!/usr/bin/perl
use strict;
use warnings;
use Algorithm::Loops qw( NestedLoops );

my @list = NestedLoops(
[ ( [ 'A' .. 'F' ] ) x 4 ],
sub { return join '', @_ },
);
print "\$_\n" foreach @list;
The funny side is that I don't quite understand this code ;)

Flavio
perl -ple'\$_=reverse' <<<ti.xittelop@oivalf

Don't fool yourself.
Re: extracting all possible n-mers from an array
by Anonymous Monk on Jun 08, 2006 at 23:07 UTC
That there is a job for either Math::Combinatorics or Algorithm::Combinatorics from CPAN. It seems the latter is more effecient but I suggest a quick benchmark on both.

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://553930]
Approved by chargrill
Front-paged by Old_Gray_Bear
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (6)
As of 2024-04-23 07:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found