Sorting Strings By Vowel Sequence

NewToPerl777 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Sorting Strings By Vowel Sequence by choroba (Cardinal) on Nov 17, 2014 at 21:52 UTC
I used the Schwartzian Transform: `#!/usr/bin/perl use warnings; use strict; use feature qw{ say }; my @words = qw( fanfare apparate panacea parmesan albatross albacore false vermeil candelabra beans ); say for map $_->[0], sort { $a->[1] cmp $b->[1] } map [ $_, join q(), /[aeiou]/g ], @words;` [download] لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ	[reply] [d/l]
Re^2: Sorting Strings By Vowel Sequence by Zucan (Beadle) on Nov 18, 2014 at 02:17 UTC
UPDATED: Second edge case added. I am curious how this deals with words with vowel sequences being the same? For example, running the above script with the following words: `my @words = qw( xaxexix babebib xaxexi babebibb );` [download] Produces the following output: `xaxexix babebib xaxexi babebibb` [download] Since the vowel positions for xaxexix and babebib are the same, I would expect a secondary alpha-sort to occur here. Another potential issue is how the algorithm deals with words where the vowel sequences are in the same order but not in the same positions. For example: `my @words = qw( babebib baebbib );` [download] Produces the following output: `babebib baebbib` [download] Even though babebib is alphabetically before baebbib, the e occurs one position sooner in baebbib, which should put it before babebib. Maybe I am reading too much into the initial question. Having done Computer Programming Contests a number of times before, these are the kinds of things that the judges are looking for to see if the teams caught all the possibilities. Zucan	[reply] [d/l] [select]
Re^3: Sorting Strings By Vowel Sequence by Loops (Curate) on Nov 18, 2014 at 02:39 UTC
Well it wasn't part of the original problem statement, so the solution allows sort to pick any order it chooses for elements with the same vowel string. You can tack on the original word to the end of the vowels string to sort by consonants when vowel strings match. Here is that idea, but using GRT Sort instead of ST: `use 5.014; say join ', ', map {s/.*\0//r} sort map {tr/aeiou//cdr."\0$_"} qw( xexix babebib xaxexi babebibb );` [download] output: `babebib, babebibb, xaxexi, xexix` [download]	[reply] [d/l] [select]
Re^4: Sorting Strings By Vowel Sequence by Zucan (Beadle) on Nov 18, 2014 at 02:52 UTC
Re^3: Sorting Strings By Vowel Sequence by Anonymous Monk on Nov 18, 2014 at 04:09 UTC
You can combine many sorts. Here's something quick and dirty: `#!/usr/bin/perl use warnings; use strict; use feature qw{ say }; my @words = qw( xaxexix babebib xaxexi babebibb baebbib ); sub prepare_sort { my ( $vowels, $positions ) = ( '', '' ); while (/([aeiou])/g) { $vowels .= $1; $positions .= sprintf '%03d', pos; } return [ $_, $positions, $vowels ]; } say for map $_->[0], sort { $a->[2] cmp $b->[2] } # vowels sort sort { $a->[1] cmp $b->[1] } # positional sort sort { $a->[0] cmp $b->[0] } # alphabetical sort map prepare_sort(), @words;` [download] Output: `baebibb babebib babebibb xaxexi xaxexix` [download]	[reply] [d/l] [select]
Re^4: Sorting Strings By Vowel Sequence by Anonymous Monk on Nov 18, 2014 at 04:31 UTC
Re^5: Sorting Strings By Vowel Sequence by Anonymous Monk on Nov 18, 2014 at 04:49 UTC
Re^3: Sorting Strings By Vowel Sequence by Laurent_R (Canon) on Nov 19, 2014 at 18:52 UTC
Hi Zucan If you want to sort alphabetically the words with the vowels at the same position, you can make just a very simple change to just one line of the original ST code suggested by choroba: `#!/usr/bin/perl use warnings; use strict; use feature qw{ say }; my @words = qw( fanfare apparate panacea parmesan albatross albacore false vermeil candelabra beans ); say for map $_->[0], sort { $a->[1] cmp $b->[1] \|\| $a->[0] cmp $b->[0]} map [ $_, join q(), /[aeiou]/g ], @words;` [download] For each time comparison step in the sorting algorithm, if the vowels compare equal, then `$a->[1] cmp $b->[1]` [download] return 0. And it returns -1 or +1 in all the other cases. If you get a 0, then the logical or (`\|\|`) will execute the second part of the sort statement: `$a->[0] cmp $b->[0]` [download] to try to see if the whole expression is true. But if you get -1 or +1 with the first part, then the whole expression if already known to be true, so that the second part of the statement will not be executed (short-circuited). I do not think that the other edge case you mention is compatible with the initial question.	[reply] [d/l] [select]
Re^3: Sorting Strings By Vowel Sequence by NewToPerl777 (Novice) on Nov 18, 2014 at 21:00 UTC
That is a very good point, I never stopped to think if the words were the same. Realizing now that could be a possibility I will fix that. Thank you for pointing that out. I am new to Perl so I am just taking programming contest-esque problems and am trying to work them out to learn.	[reply]
Re^2: Sorting Strings By Vowel Sequence by NewToPerl777 (Novice) on Nov 17, 2014 at 21:56 UTC
Wow, I did not know this was a known method. Thank you for the link, insight, and example I am going to research this more. Thank you very much for your help!	[reply]
Re^3: Sorting Strings By Vowel Sequence by Laurent_R (Canon) on Nov 17, 2014 at 22:47 UTC
The Schwartzian Transform is a fairly common idiom in Perl and is, to a large extent, using some principles of functional programming (languages such as Lisp or Scheme) in Perl. If Choroba had not done it before, I would have suggested almost the same solution (with probably just some very minor syntactical variations). To understand such a construct, you have to read it from bottom to top (and sometimes, depending on the exact layout, from right to left). To understand it better, you can split it mentally into three separate instructions: `my @temp_array = map [ $_, join q(), /[aeiou]/g ], @words; @temp_array = sort { $a->[1] cmp $b->[1] } @temp_array; say for map $_->[0], @temp_array;` [download] The first instruction creates a temporary array of array references whose content will be something like this: `(["fanfare", "aae"], [apparate, "aaae"], etc.)` [download] The second instruction will sort the array references in accordance with the second field of each array ref (i.e. the vowels). At the end of this, the temporary array contains the array refs sorted according to the vowels. The final step is to print out the original words (stripped of the data added in the first step), which will be now sorted according to the vowels. This process is sometimes referred to as : "decorate, sort, undecorate", because it is adding new data to the original one to enable the sort to work efficiently, and then removes the added data. Now, the beauty of the Schwartzian Transform is that you don't really need the temporary array: you can just pipeline the data from one step to the next. So if you look back at the code suggested by choroba, the map at the bottom creates a list of array references which are fueled into the sort line above, which sorts in accordance with what the map at the bottom has added to the data, and the result of the sort is fed into the map of the first line, which removes what the first map had added and returns the original list duly sorted in accordance to the vowels of each word. I hope this helps understanding this slightly advanced construct. Once understood, it becomes very natural to use it.	[reply] [d/l] [select]
Re^4: Sorting Strings By Vowel Sequence by NewToPerl777 (Novice) on Nov 18, 2014 at 20:57 UTC
Re: Sorting Strings By Vowel Sequence by AnomalousMonk (Archbishop) on Nov 17, 2014 at 23:44 UTC
If you are dealing with lotsa data, the so-called Guttman-Rosler Transform (GRT, the second technique below; see A Fresh Look at Efficient Perl Sorting, which discusses both ST and GRT techniques) may be faster: c:\@Work\Perl>perl -wMstrict -le "use Test::More 'no_plan'; use Test::NoWarnings; ;; VECTOR: for my $ar_vector ( [ [ qw(fanfare apparate panacea parmesan albatross albacore false v +ermeil candelabra beans) ], [ qw(apparate fanfare panacea albatross albacore false parmesan c +andelabra beans vermeil) ], ], ) { my ($ar_unsorted, $ar_expected) = @$ar_vector; my @unsorted = @$ar_unsorted; note qq{input: @$ar_unsorted}; ;; my @sorted = map $_->[0], sort { $a->[1] cmp $b->[1] } map [ $_, join '', m{ [aeiou]+ }xmsg ], @unsorted ; is_deeply \@sorted, $ar_expected, qq{ST: @sorted}; ;; @sorted = map { (split m{ \x00 }xms)[1] } sort map join(qq{\x00}, join('', m{ [aeiou]+ }xmsg), $_), @unsorted ; is_deeply \@sorted, $ar_expected, qq{GRT: @sorted}; ;; is_deeply \@unsorted, $ar_unsorted, qq{input data unchanged}; } " # input: fanfare apparate panacea parmesan albatross albacore false ve +rmeil candelabra beans ok 1 - ST: apparate fanfare panacea albatross albacore false parmesan + candelabra beans vermeil ok 2 - GRT: apparate fanfare panacea albatross albacore false parmesan + candelabra beans vermeil ok 3 - input data unchanged ok 4 - no warnings 1..4 [download] Update: The `map join(qq{\x00}, join('', m{ [aeiou]+ }xmsg), $_),` expression in the GRT sort above is rather messy. A neater (tested) version is `map join('', m{ [aeiou]+ }xmsg, qq{\x00}, $_),`	[reply] [d/l] [select]
Re: Sorting Strings By Vowel Sequence by salva (Canon) on Nov 18, 2014 at 10:06 UTC
You can use Sort::Key. If you have a recent perl (>= 5.14): `use 5.14; use Sort::Key qw(keysort); my @sorted = keysort { s/[^aeiou]+//gr } @data;` [download] Otherwise: `use Sort::Key qw(keysort); my @sorted = keysort { join '', /[aeiou]+/g } @data;` [download] Or in pure perl, just build a hash with the sorting keys: `my %keys; $key{$_} = s/[^aeiou]+//gr for @data; my @sorted = sort { $key{$a} cmp $key{$b} } @data;` [download] The ST is an overrated technique, but quite popular because learning it is like some kind of initiation ritual into the realms of intermediate Perl. In practice, in only works for datasets that are not too big because it uses too much memory.	[reply] [d/l] [select]
Re: Sorting Strings By Vowel Sequence by Zucan (Beadle) on Nov 18, 2014 at 03:31 UTC
My solution isn't as elegant as the other algorithms already suggested, however, I think mine is more straight-forward, especially if you are still new to Perl: `#!/usr/bin/perl -w use strict; my @words = qw(fanfare apparate panacea parmesan albatross albacore false vermeil candelabra beans); my %hash; foreach my $word (@words) { my $vowels = lc $word; $vowels =~ s/[^aeiou]//g; # EXAMPLE B my @array = ($word); push @array, @{$hash{$vowels}} if defined $hash{$vowels}; $hash{$vowels} = \@array; } my @sorted; foreach my $vowels (sort keys %hash) { push @sorted, @{$hash{$vowels}}; # EXAMPLE A } print " INPUT: @words\n"; print "SORTED: @sorted\n";` [download] The above code provides identical output to what was suggested in the original question. For the edge case where words have the same vowels in the same positions, there may be a desire to sort the words in alphabetical order. The above code would take the words "xaxexix babebib" (in that order) and print "xaxexix babebib". To make the code print "babebib xaxexix" instead, change the line marked "EXAMPLE A" to the following: `push @sorted, sort @{$hash{$vowels}};` [download] For the edge case where words have the same vowels in the same order but with differing positions, there may be a desire to have the a vowel that shows up sooner in a word to sort higher than a word that has the vowel show up later in a word. For example, the above code would take the words "babebib baebbib" (in that order) and print "babebib baebbib". To make the code print "baebbib babebib" instead, change the line marked "EXAMPLE B" to the following: `$vowels =~ s/[^aeiou]/z/g;` [download] We essentially replace all the non-vowels with "z", preserving the position of the vowels. This was a great question. It is great to see the different ways a problem can be solved in Perl!	[reply] [d/l] [select]


P is for Practical
	PerlMonks