 Perl Monk, Perl Meditation PerlMonks

### Puzzle Regex: Letter Frequency Arithmetic Sequence

by QM (Parson)
 on Oct 17, 2017 at 15:03 UTC Need Help??

QM has asked for the wisdom of the Perl Monks concerning the following question:

I ran across a blog post about an interesting word characteristic, and wondered if a regex can be written to match this (e.g., with the plan to search a dictionary file). I suspect the answer is no, without invoking the magic "code in a regex".

Unfortunately, I don't have time now to try my hand at it, but I thought I'd post it here for everyone to have a go.

Update: Possible puzzles:

1) Find the longest words where each letter used has a different frequency.
2) Find the longest words where letter frequencies are sequential (e.g., 3,4,5,6).
3) Find the longest words where letter frequencies are sequential starting from 1.

Use any dictionary you like. If you have a mahvelous dictionary, drop a link here.

-QM
--
Quantum Mechanics: The dreams stuff is made of

• Comment on Puzzle Regex: Letter Frequency Arithmetic Sequence

Replies are listed 'Best First'.
Re: Puzzle Regex: Letter Frequency Arithmetic Sequence
by choroba (Archbishop) on Oct 17, 2017 at 15:31 UTC
Not sure about regexes, but counting frequencies means hashes.
```#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };

my \$dict = '/var/lib/dict/words';  # YMMV

sub each_char_has_different_freq {
my (\$r, \$f) = @_;
keys %\$r == keys %\$f
}

sub frequencies_form_a_sequence {
my (\$r) = @_;
keys %\$r == grep \$r->{\$_}, 1 .. keys %\$r
}

my @linenlessnesses;
open my \$IN, '<', \$dict or die \$!;
while (<\$IN>) {
chomp;
next if /\W/;

my %freq;
\$freq{\$_}++ for split //;

my %r = reverse %freq;
push @linenlessnesses, \$_
if each_char_has_different_freq(\%r, \%freq)
&& frequencies_form_a_sequence(\%r);
}
say for sort { length \$a <=> length \$b } @linenlessnesses;
[download]```

Interesting output (the longest words) on my machine:

```deadheaded
evennesses
keennesses
peppertree
rememberer
restresses
sanenesses
sereneness
sleeveless
sussararas
susurruses
[download]```

```(\$q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+\$1,++print+eval\$q,q,a,
[download]```

"counting frequencies means hashes" - not always :)

```#!/usr/bin/perl

# http://perlmonks.org/?node_id=1201500

use strict;
use warnings;

my \$file = '/usr/share/dict/words'; # YMMV

open my \$fh, '<', \$file or die "\$! opening \$file";
while(<\$fh>)
{
/^[a-z]{6,}\$/i or next;
my \$word = \$_;
my @counts;
\$counts[ s/\$&//gi ] .= \$& while s/.//;
grep(!\$_ || /../, @counts) or print \$word;
}
[download]```

I played a lot with your solution to understand it well, so first of all - thank you, choroba.

Just my two cents: your function frequencies_form_a_sequence assumes that the frequencies start with one. The words with letter sequencies 2,3,4 etc. will not be selected then.

Should it not be in the following form (min, max and all are from List::Util and List::MoreUtils):

```sub frequencies_form_a_sequence
{
my (\$r) = @_;
my \$min = min keys %\$r;
return 0 if \$min == 1; # Just for test.
my \$max = max keys %\$r;
return 0 if \$min == \$max; # Probably not a sequence.
my \$bool = all { defined \$r->{\$_} } \$min .. \$max;
}
[download]```

I find the following words then (cannot pretend to know what they mean though :) ):

```addda
ajaja
alala
anana
arara
cocco
essee
esses
igigi
nanna
pappa
peepe
reree
susus
taata
tatta
ululu
xxiii
deedeed
sasararas
sassarara
[download]```
Why not, before the update, the challenge wasn't exactly specified, so I understood it as 1 .. n.

I'd just use exists instead of defined in the all block.

```(\$q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+\$1,++print+eval\$q,q,a,
[download]```
Re: Puzzle Regex: Letter Frequency Arithmetic Sequence
by LanX (Cardinal) on Oct 17, 2017 at 15:42 UTC
The link you gave shows an example but doesn't phrase criteria, which is unfortunate for a puzzle.

Please see How do I post a question effectively?

Supposing the frequency of letters has to be an ascending sequence. ..

I'm not aware of a possibility to sum the count of matches in pure � m// regexes so probably it's possible in tr// but those can't backtrack.

Probably if you code 1..n matches for a fixed n into your regex and combine n look-aheads you can built an and condition. *

This wouldn't work for arbitrary big n though.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

*) that is find one letter exactly once and find one letter exactly twice and so on.

already the first term is too tricky for me...

�) no Perl embedded

##### update

First step: What is a regex to find one letter exactly once?

##### update

Waiting for tybalt89 ... tick tock tick tock ... ;)

A regex to find if a letter occurs exactly \$n times.

Is this what you were looking for as a first step?

```#!/usr/bin/perl

# http://perlmonks.org/?node_id=1201500

use strict;
use warnings;

my @words = glob '{i,x}' x 6;

print "@words\n";

my \$n = 1; # number of times a letter occurs in a word

my @oneletter = grep
/^(?|
(?=.*?(.)(?!.*\1))
(?:
(?:(?!\1).)*
\1
){\$n}
(?:(?!\1).)*
|
(?=.*(.))
(?:
(?:(?!\1).)*
\1
){\$n}
(?:(?!\1).)*
)\$
/x, @words;

print "\n@oneletter\n\n", scalar @oneletter, "\n";
[download]```

Update: fails for some test cases

Can you share some interesting test cases?

-QM
--
Quantum Mechanics: The dreams stuff is made of

Too much cheating?

```#!/usr/bin/perl

# http://perlmonks.org/?node_id=1201500

use strict;
use warnings;

my \$file = '/usr/share/dict/words';

open my \$fh, '<', \$file or die "\$! opening \$file";
chomp( my @words = grep /^[a-z]{5,}\$/, <\$fh>);

my @good = grep frequencysequence(\$_), @words;

print "@good\n\n" . @good . "\n";

sub frequencysequence
{
for my \$n ( 1 .. ((-1 + sqrt 1 + 8 * length) / 2) =~ s/^(\d+)\..*/\$1
+ + 1/er )
{
"@_\n@_" =~ /
(.).*
\n
(?: (?:(?!\1).)* \1 ){\$n}
(?:(?!\1).)*
\$ /x or return 0;
}
return 1;
}
[download]```

On my system outputs:

```acacia allele assays banana baobab bedded bonobo bowwow cocoon deadhea
+ded deemed doodad eddied eerier effete fesses heeded horror hubbub in
+ning lessee lollop mammal manana messes needed papaya peeped peeper p
+epped pepper pippin powwow reefer revere rococo salsas seeded senses
+sereneness settee sleeveless tattoo teeter teethe wedded weeded xxvii
+i xxxvii

49
[download]```
Nice solution of the "variable length lookbehind" not allowed problem!

But why \n instead of \$ ? *

##### edit

I suppose it's necessary because the following assertion wouldn't be executed?

##### update

*) well, I just noticed the "@_\n@_" part where you are doubling the input, which explains the "\n" and some other confusion about pos.

I agree, "too much cheating!" ;-p

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

...sum the count of matches...

There's always the brute force method of generating every possible regex, assuming arithmetic sequence starting at 1. For a max frequency of 6, this covers 21 letter words. It makes me wonder what the breaking point is of my machine.

Note that in (3), only lengths that are triangular will do.

I'll have to find some time to chase this -- perhaps tomorrow.

-QM
--
Quantum Mechanics: The dreams stuff is made of

I just remembered that Perl supports recursive regexes and relative back-references, so you might not be limited to a hard limit.

Unfortunately I don't have the time to play with it.

FWIW if using s///g was allowed you just needed a pattern which repeatedly deleted one character of each group, iff exactly one of them is unique.

Your criterion was met, if the string is empty afterwards.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Re: Puzzle Regex: Letter Frequency Arithmetic Sequence
by kcott (Bishop) on Oct 18, 2017 at 06:06 UTC

G'day QM,

TMTOWTDI. Here's "pm_1201500_dict_char_ordered_count.pl":

```#!/usr/bin/env perl

use strict;
use warnings;

use constant MIN_LENGTH => 10; # 1+2+3+4

RECORD:
while (<>) {
next RECORD unless MIN_LENGTH < length;
next RECORD unless /^([a-z]+)\$/;
my %char_count_of;
\$char_count_of{substr \$1, \$_, 1}++ for 0 .. length(\$1) - 1;

my \$check = 1;

for (sort { \$a <=> \$b } values %char_count_of) {
next RECORD unless \$_ eq \$check++;
}

print;
}
[download]```

Looking for words with ten characters or more:

```\$ pm_1201500_dict_char_ordered_count.pl /usr/share/dict/words
beerbibber
chachalaca
isoosmosis
kotukutuku
rememberer
sereneness
sleeveless
[download]```

Changing MIN_LENGTH:

```use constant MIN_LENGTH => 6; # 1+2+3
[download]```

I get all of those ten-character words, interspersed amongst over a hundred six-character words, starting with

```\$ pm_1201500_dict_char_ordered_count.pl /usr/share/dict/words | head -
+5
allele
amamau
ananas
ananda
annona
[download]```

and ending with

```\$ pm_1201500_dict_char_ordered_count.pl /usr/share/dict/words | tail -
+5
ubussu
venene
wedded
weeded
weewow
[download]```

— Ken

Re: Puzzle Regex: Letter Frequency Arithmetic Sequence -- oneliner
by Discipulus (Abbot) on Oct 18, 2017 at 08:33 UTC
ThereIsMoreThanOne.. oneliner to do this!

Me too I'd use hash instead of regexes.

The problem is that you'll end with deadheaded rococo messes or with at least with an ugly fruit salad of ananas patata papaya batata acacia pepper

```
perl -MList::Util="all" -lne "next unless/.../;my(%r,\$c);\$r{\$_}++for(/
+./g);print if all{++\$c==\$_}sort values %r" linux.words

# or minimum 10 char words only

perl -MList::Util="all" -lne "next unless/.{10,}/;my(%r,\$c);\$r{\$_}++fo
+r(/./g);print if all{++\$c==\$_}sort values %r" linux.words
[download]```

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: Puzzle Regex: Letter Frequency Arithmetic Sequence
by tybalt89 (Prior) on Oct 20, 2017 at 14:26 UTC

Just a tiny little cheat :)

```#!/usr/bin/perl

# http://perlmonks.org/?node_id=1201500

use strict;
use warnings;
\$| = 1;

my \$file = '/usr/share/dict/words';

open my \$fh, '<', \$file or die "\$! opening \$file";
while(<\$fh>)
{
chomp;
/[^a-z]/ and next;

"\$_\n\$_" =~ /^
(?=              .* (.) .* \n (?: (?:(?!\1).)* \1 ){1} (?:(?!\1).)
+* \$ )
(?=              .* (.) .* \n (?: (?:(?!\2).)* \2 ){2} (?:(?!\2).)
+* \$ )
(?= .{3,3}  \n | .* (.) .* \n (?: (?:(?!\3).)* \3 ){3} (?:(?!\3).)
+* \$ )
(?= .{3,6}  \n | .* (.) .* \n (?: (?:(?!\4).)* \4 ){4} (?:(?!\4).)
+* \$ )
(?= .{3,10} \n | .* (.) .* \n (?: (?:(?!\5).)* \5 ){5} (?:(?!\5).)
+* \$ )
(?= .{3,15} \n | .* (.) .* \n (?: (?:(?!\6).)* \6 ){6} (?:(?!\6).)
+* \$ )
(?= .{3,21} \n | .* (.) .* \n (?: (?:(?!\7).)* \7 ){7} (?:(?!\7).)
+* \$ )
(?= .{3,28} \n | .* (.) .* \n (?: (?:(?!\8).)* \8 ){8} (?:(?!\8).)
+* \$ )
(?= .{3,35} \n )
/x and print "\$_ ";
}
print "\n";
[download]```

I had done that deliberately because I considered them "uninteresting". Easily fixed.

```#!/usr/bin/perl

# http://perlmonks.org/?node_id=1201500

use strict;
use warnings;
\$| = 1;

my \$file = '/usr/share/dict/words';

open my \$fh, '<', \$file or die "\$! opening \$file";
while(<\$fh>)
{
chomp;
/[^a-z]/ and next;

"\$_\n\$_" =~ /^
(?=              .* (.) .* \n (?: (?:(?!\1).)* \1 ){1} (?:(?!\1).)
+* \$ )
(?= .{1,1}  \n | .* (.) .* \n (?: (?:(?!\2).)* \2 ){2} (?:(?!\2).)
+* \$ )
(?= .{1,3}  \n | .* (.) .* \n (?: (?:(?!\3).)* \3 ){3} (?:(?!\3).)
+* \$ )
(?= .{1,6}  \n | .* (.) .* \n (?: (?:(?!\4).)* \4 ){4} (?:(?!\4).)
+* \$ )
(?= .{1,10} \n | .* (.) .* \n (?: (?:(?!\5).)* \5 ){5} (?:(?!\5).)
+* \$ )
(?= .{1,15} \n | .* (.) .* \n (?: (?:(?!\6).)* \6 ){6} (?:(?!\6).)
+* \$ )
(?= .{1,21} \n | .* (.) .* \n (?: (?:(?!\7).)* \7 ){7} (?:(?!\7).)
+* \$ )
(?= .{1,28} \n | .* (.) .* \n (?: (?:(?!\8).)* \8 ){8} (?:(?!\8).)
+* \$ )
(?= .{1,35} \n )
/x and print "\$_ ";
}
print "\n";
[download]```

On my machine, this outputs:

```a aah acacia add aha aka all allele app arr ass assays b baa banana ba
+obab bbl bedded bee bib bob bonobo boo bowwow brr bub c cocoon coo d
+dad dds deadheaded deemed did doodad dud e ebb eddied eek eel eerier
+eff effete egg eke ell ere err eve ewe eye f fee fesses foo g gag gee
+ gig goo h heeded hmm horror hubbub huh i ill inn inning j k l lee le
+ssee lii lollop loo m mam mammal manana messes mom moo mum n nee need
+ed non nun o odd off oho ooh opp p pap papaya pee peeped peeper pep p
+epped pepper pip pippin poo pop powwow ppm ppr pup q r reefer revere
+rococo s salsas see seeded senses sereneness settee shh sis sleeveles
+s sqq ssh t tat tattoo tee teeter teethe tit too tot tut u usu v vii
+w wedded wee weeded woo wow x xii xix xxi xxv xxviii xxxvii y z zoo
[download]```
Re: Puzzle Regex: Letter Frequency Arithmetic Sequence
by LanX (Cardinal) on Oct 20, 2017 at 14:38 UTC
Newer Perls have some fancy features in the regex engine (apart from embedding Perl), so probably it could be solved.

But be warned, I wouldn't be surprised if a solution to this "puzzle" would proof the engine to be Turing complete.

Cheers Rolf
(addicted to the Perl Programming Language and ☆☆☆☆ :)
Je suis Charlie!

Log In?
 Username: Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1201500]
Approved by Corion
Front-paged by kcott
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others studying the Monastery: (5)
As of 2021-04-10 22:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found

Notices?