Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Combinatorics in Perl

by wouldbewarrior (Acolyte)
on Apr 30, 2004 at 05:44 UTC ( [id://349341]=perlquestion: print w/replies, xml ) Need Help??

wouldbewarrior has asked for the wisdom of the Perl Monks concerning the following question:

I have a line of text that looks like this:
ch i_DEL ts a k u_DEL s i_DEL #

I for each letter marked by the _DEL suffix, I want to print two entries, one with just the vowel (not the _DEL tag) and the other as nothing (i.e., delete the vowel and the _DEL tag). I want an end result that looks like this:
ch i ts a k u s i # ch i ts a k u s # ch i ts a k s # ch i ts a k s i # ch ts a k u s i # ch ts a k u s # ch ts a k s # ch ts a k s i #

i.e., all possible variations. I have thought of a very convoluted way to do this, but I would like something nice and elegant - is there a way to make a regexp, perhaps?

Replies are listed 'Best First'.
Re: Combinatorics in Perl
by saintmike (Vicar) on Apr 30, 2004 at 08:06 UTC
    If you're willing to give up the blanks, here's a rarely known feature of Perl's globbing permutation functionality:
    for(<{ch}{i,}{ts}{a}{k}{u,}{s}{i,}#>) { print "$_\n"; }
    It prints:
    chitsakusi# chitsakus# chitsaksi# chitsaks# chtsakusi# chtsakus# chtsaksi# chtsaks#
Re: Combinatorics in Perl
by tkil (Monk) on Apr 30, 2004 at 06:19 UTC

    Here's one regex (plus while loop) for it:

    while ( s{ ^ (.*?) (\s* \S+)_DEL (.*) $ } {$1$3\n$1$2$3}xmg ) {}

    I am trying to cook up one using (?{ ... }) constructs, but not having as much luck.

    Update: Oops, left off the empty loop body...

    Update (at 2004-04-29 23:49 -0700): Here is a more procedural solution that uses the same basic idea as above. It might be a bit more useful, as it returns a list of variants instead of one big string. (Would be interesting to benchmark them...)

    sub del_selective { my $n = shift; if ( $n =~ m/(.*?)(\s*\S+)_DEL(.*)/s ) { my $s1 = $1.$3; my $s2 = $1.$2.$3; return ( del_selective( $s1 ), del_selective( $s2 ) ); } else { return ( $n ); } }
      Wow, I like it! That was very well done. I have this great big Japanese transcription database that some other engineers and I have been trying to use to cook up a speech recognizer for the language, but the transcription database has a bunch of crap in it. One way to fix this, for the recognizer is just to give it all available options, wherever there is ambiguous 'crap'. We were having a lot of trouble with this silly problem today though. So thanks a lot!
      argh, i worked for 2 hours and came up with nothing particularly useful. very nice job.

      perl -e'$_="nwdd\x7F^n\x7Flm{{llql0}qs\x14";s/./chr(ord$&^30)/ge;print'

Re: Combinatorics in Perl
by TimToady (Parson) on Apr 30, 2004 at 22:20 UTC
    Well, you can get all fancy with recursive solutions, but if I were your teacher, I'd give the most credit to a solution that merely counts from 0 to 7 and uses the bit pattern to decide which things to include and which things to leave out. That's an algorithm you could probably get someone else to understand and maintain.
      I find your lack of faith (in recursion) ...disturbing.

      With a bit more documentation, recursion or regular expressions are perfectly maintainable. I'll plead guilty to under-documenting my original response; one of the reasons I tucked it into a /x construct was to be able to add comments later.

      I'll note that you didn't post any code, though. I'd be curious to see your solution. Here are various of mine; which do you yourself think is most maintainable?

      #!/usr/bin/perl use strict; use warnings; sub int_1 ( $ ) { my $template = shift; # split template on the characters to switch in and out my @chunks = split /(\s*\S+)_DEL/, $template; # make sure the odd elements are the switched chars. unshift @chunks, '' if $template =~ /^\S+_DEL\b/; # check for boundary conditions my $n_toggles = int( @chunks / 2 ); if ( !$n_toggles ) { return $template; } elsif ( $n_toggles > 10 ) { die "max of 10 toggles (found $n_toggles) in '$template'"; } my @rv; my $n_bits = ( 1 << $n_toggles ) - 1; foreach my $bits ( 0 .. $n_bits ) { my $s = ''; for ( my $i = 0; $i < @chunks; $i += 2 ) { $s .= $chunks[$i]; $s .= $chunks[$i+1] if $bits & 1; $bits >>= 1; } push @rv, $s; } return @rv; } my $MAX_BIT = 0x10000; my $ALL_BITS = $MAX_BIT - 1; sub int_2 ( $ ) { my $template = shift; # split template on the phonemes to switch in and out my @chunks = split /(\s*\S+_DEL)/, $template; # mark which are toggles my @toggles; my $next_bit = 1; foreach ( @chunks ) { if ( s/_DEL$// ) { push @toggles, $next_bit; $next_bit <<= 1; die "too many toggles!" if $next_bit > $MAX_BIT; } else { push @toggles, $ALL_BITS; } } # anything to do? return $template if $next_bit == 1; my @rv; foreach my $bits ( 1 .. $next_bit ) { my $s = ''; for my $i ( 0 .. $#chunks ) { $s .= $chunks[$i] if $bits & $toggles[$i]; } push @rv, $s; } return @rv; } sub int_3 ( $ ) { my $template = shift; # split template on the phonemes to switch in and out my @chunks = split /(\s*\S+_DEL)/, $template; # mark which are toggles my $next_bit = 1; foreach ( @chunks ) { if ( s/_DEL$// ) { $_ = [ $next_bit, $_ ]; $next_bit <<= 1; die "too many toggles!" if $next_bit > $MAX_BIT; } else { $_ = [ $ALL_BITS, $_ ]; } } # anything to do? return $template if $next_bit == 1; my @rv; foreach my $bits ( 1 .. $next_bit ) { push @rv, join '', map { $_->[0] & $bits ? $_->[1] : '' } @chu +nks; } return @rv; } sub re_1 ( $ ) { my @templates = @_; # yet another variant on the repeat-and-double-until-done while # loop. while (1) { my @new_templates; foreach ( @templates ) { if ( m/(.*?)(\s*\S)_DEL(.*)/ ) { push @new_templates, $1.$2.$3, $1.$3; } } return @templates unless @new_templates; @templates = @new_templates; } } sub re_2 ( $ ) { my $template = shift; # make sure it ends with exactly one newline $template =~ s/\s*$/\n/; # make all the breaks unique: { my $i = 0; $template =~ s/(_DEL\b)/$1 . ++$i/ge; } # for each break, replace template with two copies of itself, one # with the replacement, one without: while ( $template =~ /((\s*\S+)_DEL(\d+)\b)/ ) { my ( $break, $rep ) = ( $1, $2 ); my ( $with, $without ) = ( $template ) x 2; $with =~ s/\Q$break/$rep/g; $without =~ s/\Q$break//g; $template = $with . $without; } # break back into individual chunks. (trailing \n should be # handled gracefully.) return split /\n/, $template; } sub re_3 ( $ ) { my $template = shift; # basic idea is to find every line with a blah_DEL in it, then # replace that line with two lines: one with "blah" inserted, one # without. Since there can be multiple _DEL directives per line, # we need to iterate until they're all gone. do {} while $template =~ s{ ^ (.*?) (\s* \S+)_DEL\b (.*) $ } {$1$3\n$1$2$3}xmg; # break back into individual chunks. (trailing \n should be # handled gracefully.) return split /\n/, $template; } while ( my $template = <DATA> ) { $template =~ s/\s+$//; print "=== template:\n$template\n"; print "--- int_1:\n", map "$_\n", int_1 $template; print "--- int_2:\n", map "$_\n", int_2 $template; print "--- int_3:\n", map "$_\n", int_3 $template; print "--- re_1:\n", map "$_\n", re_1 $template; print "--- re_2:\n", map "$_\n", re_2 $template; print "--- re_3:\n", map "$_\n", re_3 $template; } exit 0; __DATA__ ch i_DEL ts a k u_DEL s i_DEL #
Re: Combinatorics in Perl
by wouldbewarrior (Acolyte) on Apr 30, 2004 at 08:20 UTC
    wow, that too is an extremely elegant little solution. I like it. Thanks!
Re: Combinatorics in Perl
by flyingmoose (Priest) on Apr 30, 2004 at 14:07 UTC
    This problem seems rather arbitrary ('i_DEL', etc) -- any chance this is a homework problem? Either way, the damage is done --- but if it's homework, it is always a good practice to announce that you are asking for help with homework when posting the question. If I'm wrong, accept my apologies.
      While it does appear rather arbitrary, it wasn't a homework problem. I work at the Panasonic Speech Technology Research Lab in Santa Barbara (now PDNL for a number of asinine and annoying reasons) and we were in the process of developing a japanese recognizer (hence the Japanese exaMPLE), however, there was a problem with the input lexicon we were using to train the recognizer. The transcription lexicon did not reflect several important phonological idiosyncracies of japanese, which happen not to be predictable. The best solution for us was to tag the phonemes that were susceptible to the idiosyncracies and then produce all possible variations, as this ensured that the correct model would get into the recognizer. Unfortunately nobody had the perl expertise to do this, but once we realized this we thought we'd like to find out what clever solution might be waiting out there...

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://349341]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (8)
As of 2024-04-25 08:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found