Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

Golf: Arbitrary Alphabetical Sorting

by Masem (Monsignor)
on May 09, 2001 at 19:52 UTC ( #79130=perlmeditation: print w/replies, xml ) Need Help??

Based on the comments from this node on sorting, here's a golf challenge for you.

Given an array of "words" that are to be sorted, and an array of characters which make up those words (aka the 'alphabet'. This second array is already sorted in 'alphabetical' order. Assume that no character is repeated in this second array. Also assume that no other characters are present in the words. Also assume that everything is already in the 'lower case' form. Note that the number of characters in the alphabet is arbitary.

Find the perl golf (fewest # of characters in the code) that will sort the words based on this alphabetic order. Some test cases that you should test against would be:

@alpha = ('a'..'z'); @l = o( \@alpha, \@words ); # should give same results as @l = sort { $a cmp $b } @words; @alpha2 = reverse ('a'..'z'); @l2 = o( \@alpha2, \@words ); # should give same results as @l2 = reverse sort { $a cmp $b } @words; + or @l2 = sort { $b cmp $a };

Update to clear up the confusion as merlyn's replies below indicate, the 'alphabet' is not necessarily the typcial 26-character english alphabet, and may consider of any character including and not limited to numbers, punctuation, accented characters, and so forth. For example, one could call the 'alphabet' for hexadecimal numbers as (0..9A..F).

Dr. Michael K. Neylon - || "You've left the lens cap of your mind on again, Pinky" - The Brain

Replies are listed 'Best First'.
Re (tilly) 1: Golf: Arbitrary Alphabetical Sorting
by tilly (Archbishop) on May 09, 2001 at 21:30 UTC
    The body is 93 characters. Can handle anything Perl's RE engine can call a character (including null and return), but like chipmunk horribly non-strict and assumes you will not see characters that are not listed in the alphabet.
    sub o{ ($c,$s)=@_;@m{@$c}=map{10*@$c+$_}0..@$c;sub t{$_=pop;s/./$m{$&}/gs;$_} +sort{t($a)cmp t($b)}@$s }
    Two subtle details. The /s is needed if returns are in the alphabet. And you cannot save a character by replacing string with numerical comparison without breaking on fairly small words.

    MeowChow is right. :-( But the following 91 character version will handle alphabets of up to 90,000 characters. Given that this is bigger than Unicode's space, that is sufficient for any alphabet that any version of Perl will accept:

    sub o{ ($c,$s)=@_;@m{@$c}=map{$_+9999}1..@$c;sub t{$_=pop;s/./$m{$&}/gs;$_}so +rt{t($a)cmp t($b)}@$s }
    (And so we create more legacy code with a stupid hard-coded limit that is bound to come back to bite us...)

    UPDATE 2
    MeowChow told me to lose parens gaining 2, and to make the range several hundred million by rewriting the number. Also I commified a map. This falls to 88. Or change to "99" to make it 86 if you are willing to accept one-byte alphabets.

    sub o{ ($c,$s)=@_;@m{@$c}=map$_+9**9,1..@$c;sub t{$_=pop;s/./$m{$&}/gs;$_}sor +t{t($a)cmp t$b}@$s }

    UPDATE 3
    ChemBoy dropped to 87 by rewriting the number again:

    sub o{ ($c,$s)=@_;@m{@$c}=map$_+1e9,1..@$c;sub t{$_=pop;s/./$m{$&}/gs;$_}sort +{t($a)cmp t$b}@$s }

    UPDATE 4
    tye made it 86 with:

    sub o{ ($c,$s)=@_;@m{@$c}=map$_+~0,1..@$c;sub t{$_=pop;s/./$m{$&}/gs;$_}sort{ +t($a)cmp t$b}@$s }

    UPDATE 5
    And I finish the round of golf by making it 83 with:

    sub o{ ($c,$s)=@_;@m{@$c}=1e8..@$c+1e8;sub t{$_=pop;s/./$m{$&}/gs;$_}sort{t($ +a)cmp t$b}@$s }
    (The ~0 trick doesn't work with ranges.)
      Wow. I don't think this will work on a 99 character alphabet, but I'm too lazy to verify this :)

      UPDATE 6: It just never ends:

      sub o{ $c=join'',shift;sub t{$_=pop;s/./~0+index$c,$&/ges;$_}sort{t($a)cmp t$ +b}@{+pop} }
      79 chars...
                     s aamecha.s a..a\u$&owag.print
        First of all that doesn't work because shift returns a reference. You need it to dereference the first element into an array. Secondly you know I love a 76 characters:
        sub o{ ($c,$s,$")=@_;sub t{$_=pop;s/./~0+index"@$c",$&/ges;$_}sort{t($a)cmp t +$b}@$s }
        (I have to waste 3 on removing spaces, dang.)

        In 5.6.x it appears you need to spend another character:

        sub o{ ($c,$s,$")=@_;sub t{$_=pop;s/./1e9+index"@$c",$&/ges;$_}sort{t($a)cmp +t$b}@$s }

        UPDATE 2
        Saving another character..back to 76.

        sub o{ ($c,$s,$")=@_;sub t{s/./1e9+index"@$c",$&/ges;$_}sort{t($_=$a)cmp t$_= +$b}@$s }

        UPDATE 3
        Throwing in tadman's improvement, 75:

        sub o{ ($c,*w,$")=@_;sub t{s/./1e9+index"@$c",$&/ges;$_}sort{t($_=$a)cmp t$_= +$b}@w }
        (Note that chr returning 2 chars wouldn't make Unicode come out right. Hence I am keeping the 1e9 trick.)
Re: Golf: Arbitrary Alphabetical Sorting
by merlyn (Sage) on May 09, 2001 at 20:01 UTC
    Well, here's a cut at it, even though it's not golfed yet:
    sub o { my ($a, $w) = @_; my %m; @m{@$a} = (a..z); my $k = join "", keys %m; my $v = join "", values %m; map { eval "tr/$v/$k/"; $_ } sort map { eval "tr/$k/$v/"; $_ } @$w; }

    -- Randal L. Schwartz, Perl hacker

    update: and of course, it's doggy slow. That eval needs to be saved, like in a coderef or something.
    update2: Bleh. RIght. 'a'..'z' was being presumptive that "letter" meant "letter" like I knew it. Here's the patch to make it work for larger sets, for reasonable values of work:
    sub o { my ($a, $w) = @_; my %m; @m{@$a} = 1..@$a; my $k = join "", map {sprintf "\\%03o", ord $_} keys %m; my $v = join "", map {sprintf "\\%03o", $_ } values %m; map { eval "tr/$v/$k/"; $_ } sort map { eval "tr/$k/$v/"; $_ } @$w; }
      While your $v/$k order is right, the one gotcha that I put in there was that the alphabet size was arbitrary, and not necessarily 26 characters.. it could be as many as 100, 1000, or more (Well, anything in non-Unicode above 255 makes no sense). So this trick doesn't work here.
      Dr. Michael K. Neylon - || "You've left the lens cap of your mind on again, Pinky" - The Brain
        Well, if it's a single "character", tr likes it. What's the problem? About the only thing that's messy is the use of a slash and dash, and I could probably hork that somehow by always mapping to an octal-ish escape.

        I've reread the question three times now, and I don't see how you are defining "character" in any way other than something that tr can wrangle. If so, what's the structure of a "word" then? It's no longer a string, which would be a sequence of "characters" that tr can handle!

        -- Randal L. Schwartz, Perl hacker

Re: Golf: Arbitrary Alphabetical Sorting
by chipmunk (Parson) on May 09, 2001 at 20:49 UTC
    This solution uses the same approach as merlyn's, translating the words so they can be compared directly. It assumes that the words will not contain the null character.

    96 characters.

    sub o { @k{@$A}=map chr,1..@{$A=$_[0]}; map{s/.*\0//s;$_}sort map{($x=$_)=~s/./$k{$&}/gs;"$x\0$_"}@{+pop} }
    BTW, although %k is not lexicalized to the sub, I don't think it matters, since any leftover contents which are not overwritten won't be used. (Also assume that no other characters are present in the words.)

    Update: I added /s to both substitutions, to avoid problems with \n. Thanks tilly!

      Note that in addition to not allowing \0 as a character you don't allow \n.
Re: Golf: Arbitrary Alphabetical Sorting
by tadman (Prior) on May 10, 2001 at 03:09 UTC
    Taking a few tricks from tilly, and influences from merlyn, this one appears to be 74 characters:
    sub o { ($c,*w,$")=@_;sub t{eval"y/\Q@$c\E/\0-\377/";$_}sort{t($_=$a)cmp t$_=$ +b}@w }
    This will only work on single character 'alphabets', or those that y// can handle as UNICODE.

    I forgot to include the "\Q" and "\E" escapes to prevent it from going south if the "alphabet" contained characters such as '/' or '\n'. 4 character penalty.

    Update 2:
    If you want to cheat, you can always insert the literal character ASCII 255 into the string instead of the comparatively verbose '\377', which saves 3 characters, yielding a length of 71, though some editors show that single "character" as 2 on screen.
      Excellent. As I responded in the buried thread, you can insert the literal characters for both escape sequences, yielding 70. This may really fubar some editors, but perl can handle it.

      BTW FWIW I don't consider this cheating.

Re: Golf: Arbitrary Alphabetical Sorting
by ChemBoy (Priest) on May 09, 2001 at 20:42 UTC

    I got a head start on this one, since I gave a recursive solution in the original thread. But I've never golfed before, so this is likely to go down in flames quickly... I believe it scores 110 as is (somewhat golfed, but probably not optimally).

    sub o { $q=join'',@{$_[0]}; sort {$i=0; {index($q,substr($a,$i,1))<=>index($q,substr($b,$i,1))or($i++,redo)}}@ +{$_[1]} }

    Update: Oof... difficulties is a nice way of putting it. A conditional statement fixes that, probably more cleanly than checking the string length repeatedly. Up to 125, I think--but then, I've already been clobbered.

    sub o { $q=join'',@{$_[0]}; sort {$a cmp$b?do{$i=0;{index($q,substr($a,$i,1))<=>index($q,substr($b +,$i,1))or($i++,redo)}}:0 }@{$_[1]} }

    If God had meant us to fly, he would *never* have give us the railroads.
        --Michael Flanders

      This is nice, but it has ahhh... difficulties with repeated words in the list to be sorted.
                     s aamecha.s a..a\u$&owag.print

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://79130]
Approved by root
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others chanting in the Monastery: (6)
As of 2023-12-03 06:40 GMT
Find Nodes?
    Voting Booth?
    What's your preferred 'use VERSION' for new CPAN modules in 2023?

    Results (20 votes). Check out past polls.