Beefy Boxes and Bandwidth Generously Provided by pair Networks
Come for the quick hacks, stay for the epiphanies.

The golf course looks great, my swing feels good, I like my chances (Part I)

by eyepopslikeamosquito (Archbishop)
on Apr 25, 2009 at 07:08 UTC ( [id://759963] : perlmeditation . print w/replies, xml ) Need Help??

Competing in a modern codegolf tournament can be a lonely experience. You play alone. You analyze alone. After an occasional breakthrough, you celebrate alone, your hard-won solution a closely guarded secret. There is no nineteenth hole, no place to share your solutions with your buddies over a beer.

Accordingly, and given the codegolf Roman-to-Decimal course has been open for more than two years now, I've decided to share my solutions to that game here. You'll get to look over my shoulder as I describe the circumstances and thought processes behind these solutions. I hope you'll find my journey through this competition both interesting and instructive, especially so for the serious, aspiring golfer.

Rules of the Game

In this game, you must convert a Roman numeral to its integer value. The input numeral, provided on stdin followed by a single newline, is guaranteed to be in the range I to MMMCMXCIX (1 to 3999). For example, given IV on stdin, your program should print 4 to stdout.

Test Driven Golf

I always like to start a golf game by writing a comprehensive test program. That saves time in the long run, enabling you to easily verify the many and varied solutions you must concoct to be competitive at golf. So, just like any journeyman test-first developer, I began this game by writing such a test program. A little searching on the CPAN quickly uncovered the Roman module, which I used to write the following test program:

use strict; use warnings; use Roman; my $testprog = shift or die "usage: $0 program-file\n"; my $datafile = 'tt.txt'; my $cmd = "perl $testprog <$datafile"; sub build_file { my $contents = shift; open(my $fh, '>', $datafile) or die "open '$datafile': $!"; print $fh "$contents\n"; close($fh); } print "Testing $testprog, size=", -s $testprog, " bytes.\n"; for my $i (1..3999) { my $r = uc(roman($i)); print "i=$i r=$r\n"; build_file($r); my $got = `$cmd`; chomp($got); $got eq $i or die "expected '$i' got '$got'\n"; } print "successful\n";

Since this tests all valid input values (1..3999), any solution that passes this test program should be correct. Unfortunately, there are usually holes (no pun intended) in the test programs used by the codegolf web site because, in their fully automated flavour of golf, the test program must complete in less than four seconds, so it's usually not feasible to exhaustively test all possible inputs. Their test program for this game, however, is a pretty good one because it includes test cases for all six of the nastier edge cases (IV, IX, XL, XC, CD, CM), plus eight random ones, making it improbable for an incorrect solution to be accepted.

Getting Started

In any golf game, I try to get a complete working solution as quickly as possible. This method of golfing boosts morale in that you always have a working solution -- and enjoy frequent positive feedback each time you shorten it.

I'd just finished playing the Fonality Golf Challenge, so I began this game simply by inverting Ton's winning HART magic formula from the Fonality game:

s!.!y$IVCXL426(-:$XLMCDIVX$dfor$$_.=5x$&*8%29628;${"$$_ "}=-$_!egfor-4e3..0;print${<>}
86 strokes. Hmmm. The leading score posted at this early stage was a remarkable 60 strokes by perl internals hacker robin. This hot start made it immediately clear that Ton's astonishing and magical decimal-to-roman formula was going nowhere in this new game, where you must convert the other way. That was a relief to me because I didn't relish a repeat of the ill feeling surrounding the Perl Golf Ethics thread.

Since I'd already downloaded the CPAN Roman module, I next took a look at the algorithm it employed, namely:

sub arabic { my($arg) = shift; isroman $arg or return undef; my($last_digit) = 1000; my($arabic); foreach (split(//, uc $arg)) { my($digit) = $roman2arabic{$_}; $arabic -= 2 * $last_digit if $last_digit < $digit; $arabic += ($last_digit = $digit); } $arabic; }
Though obviously not golfed, I was impressed by this function, especially this line:
$arabic -= 2 * $last_digit if $last_digit < $digit;
because this clever trick enables the algorithm to make one pass through the input string, one character at a time.

Nowadays we have a principle in Perl, and we stole the phrase Huffman coding for it, from the bit encoding system where you have different sizes for characters. Common characters are encoded in a fewer number of bits, and rarer characters are encoded in more bits. We stole that idea as a general principle for Perl, for things that are commonly used, or when you have to type them very often -- the common things need to be shorter or more succinct.

-- Larry Wall

My prior golfing experience told me that any algorithm that essentially just traverses a string one character at a time is usually very competitive. And this very common operation of traversing a string one character at a time should be golf-able in just about any language -- assuming the language designers Huffman-coded common operations, as Larry Wall did.

The primary tactical problems to be solved with this algorithm are how to shorten the code surrounding $last_digit and how to perform the roman2arabic lookup.

Regexes always win (Mtv's law of golf)

Golfing this promising algorithm proved straightforward once I remembered Mtv's law and used a regex with side-effects ($') for the lookup table:

Notice that this solution uses the well-known golfing trick, invented by Dutch golfing maestro Eugene van der Pijll, of using $\ as the accumulator, allowing you to print it with a bald print, rather than print$a say, thus saving two strokes.

Getting to 68 with very little effort made this first approach look very promising. Indeed, this simple algorithm formed the basis of the winning solutions in all four languages (Perl, Python, Ruby, PHP), albeit with many twisty tactical turns along the way.

An Interesting Diversion

How many different algorithms did you try? Me, I tried it four or five different ways until I stumbled upon one that seemed shorter

-- Expert golfer Jasper offers sound advice to novice golfer petdance in Re^9: Perl Golf Ethics

In golf, it's desirable to have multiple irons in the fire; that is, to have multiple working solutions using different approaches. Why? Because it's often possible to produce a solution shorter than all of them by combining the best ideas from the various approaches. A related lesson I took from this game is the value of writing solutions in all four languages (Perl, Python, Ruby, PHP); the act of doing that often uncovered algorithmic and tactical improvements that could be transferred from one language to another.

So, to broaden possible solutions, I played around with a tr (aka y) transform reminiscent of the Fonality game:

#!perl -lp s#\d#!($\+=$.*$&*(2cmp$'))#eg,$..=0while y/MDCLXVI/CLXVI51/
70 strokes. But I had to work very hard. I spent far too long fiddling with this approach because I found it fascinating. My golfing instincts were telling me, however, that this approach was far less promising than the previous simpler one, so I reluctantly gave it up at this early stage and never came back to it.

Since symbolic references were so effective in the Fonality game, I naturally tried a similar approach early in this game. Using xor, a common golfing technique, the required $I, $V, $X, $L, $C, $D, $M variables can be built fairly easily, as illustrated by the following test program,

#!perl -n ++$I;$$_=$.*=$^F^=7for V,X,L,C,D,M; print"I=$I V=$V X=$X L=$L C=$C D=$D M=$M\n";
Running echo hello|perl displays:
I=1 V=5 X=10 L=50 C=100 D=500 M=1000
So far, so good. However, I could find no remotely competitive solution using this approach. For example, though this one does work:
#!perl -p ++$I;$$_=$.*=$^F^=7for VXLCDM=~/(.)/g; s!!($$1<${_&$'}?'-':'+').$$1!eg;$_=eval
it's 87 strokes in length. Ouch. So I also abandoned this interesting approach early in this game. This shows how fickle golf can be: a winning approach in the Fonality game proved to be a complete dud in this very similar one.

Update: Much later I found an improved 74 stroke symref-based solution:

$b=++$I;$$_=$b*=$^F^=7for V,X,L,C,D,M; $\+=$n-$\%$n*2while$n=${+getc};print

Functional approaches never win

Much as I enjoy functional programming, this style rarely wins golf tournaments. For fun, I tried a couple of functional solutions, such as:

use List::Util 'reduce'; print reduce{$a+$b*($b+1<=>$a)}map{/./;$|--&&(M999D499C99L49X9V4I=~$&+ +$')*y///c}reverse<>=~/((.)\2*)/g
Though there are surely shorter functional-style solutions (update: see, for example, this one found much later), I quickly became discouraged and gave up on this approach quite early.

Into the Top Ten

Of course. The true&tested method of "Can't possibly work, let's try it anyway."

-- Eugene van der Pijll during Infix to RPN game

This classic Eugene van der Pijll quote is the single best piece of advice I could give to any aspiring golfer. Remembering Eugene's aphorism helped me time and time again during this game, especially later when I started golfing in languages I hardly knew (Python, Ruby and PHP).

Reverting to my original 68 stroke solution:

that $} "previous value" variable was starting to get really annoying. It would be nice to get rid of it somehow. But how? I wonder if it's possible to exploit perl's evaluation order somehow. Well, follow Eugene's advice and just try it!
Yes! No more $}. Down to 64 and on the leaderboard now! Woo hoo! At this point, I allowed myself a few punches in the air with my fist and even jumped up from my desk and ran madly around the room celebrating like a soccer player who just kicked a goal. As for why this solution works, I'll leave that as an exercise for the keen student of perl internals.

The top ten, after four weeks of play:

1st 60 robin Perl 2nd 61 arpad Perl 3rd 63 bearstearns Perl 4th 64 kounoike Perl 5th 64 eyepopslikeamosquito Perl 6th 73 flagitious Ruby 7th 73 yvl Ruby 8th 73 bitsweat Ruby 9th 73 jojo Perl 10th 75 shinh Perl

After the euphoria had died down, I realized I was now completely stuck.

What's the rush?

Why is everyone in such a rush?

-- Peter Norvig in Teach Yourself Programming in Ten Years

"Learning Perl in 24 Minutes Unleashed, in a Nutshell for Dummies" is the one I have

-- f00li5h cited at Humour page

My head is hurting, my right eye feels like it's going to pop like a mosquito drinking from an expresso addict with high blood pressure, I want to crawl somewhere damp and dark and quiet and I consider never to touch a keyboard again.

-- `/anick, after hacking all night on the last day of the TPR 02 Perl Golf tournament

Then it dawned on me: what's the rush? With this new form of golf, there's no need to risk golf-induced eye damage, as `/anick unfortunately suffered, in a desperate attempt to shave one more stroke before the game closes. You can simply, Zen-like, just keep chipping away at your solution at your own pace. So I just kept whittling away whenever I had some spare time.

Some months later, I stumbled upon an unusual <=> spaceship operator solution:

Notice that, unlike earlier algorithms, this one requires one more iteration than the string length. For this purpose, I utilized the trailing newline, simply changing <>=~/./g to <>=~/./sg initially, at the cost of a single stroke. Later, I got that stroke back by noticing that $/=\1; combined with <> is one stroke shorter.

62 strokes! Only three strokes from the lead now, with "bearstearns" the new leader on 59.

After further months drifted by, I whittled two more strokes:

60 strokes! I felt a bit embarrassed at not finding this simple trick sooner. Not to worry, no rush. Only one from the lead and getting excited now.

For some time now I was aware that changing the table lookup string from:

saved three strokes -- if you could combine it with an expression you had to perform anyway to add one to $' (as was achieved in Re^2: Golf: Magic Formula for Roman Numerals). The trouble is, I could not find an economical way to add one to the shortened lookup string in this game. The best I found is the following 60 stroker:

As indicated at Golf: Magic Formula for Roman Numerals, I was also playing around with replacing the lookup table with a magic formula. At first, I struggled to fit the magic formulae I had found into a short solution to this course, but eventually wound up with a flood of 60 strokers, such as:

$\+=$z-2*$z%($z=1 .E.(3^77%ord)%7>>y/VLD//)for<>=~/./g;print $\+=$z-2*$z%($z=.5**y/VLD//.E.(3^77%ord)%7)for<>=~/./g;print $\+=$z-2*$z%($z=5**y/VLD//.E.(42^88*ord)%5)for<>=~/./g;print $\+=$z-2*$z%($z=5**y/VLD//.E.71248%ord()%5)for<>=~/./g;print $\+=$z-2*$z%($z=5**y/VLD//.E.(3&57532/ord))for<>=~/./g;print $\+=$z-2*$z%($z=.1.E.(3^85%ord)%7>>y/VLD//)for<>=~/./g;print
I now had many different 60 stroke solutions, but the lead was at 59!

Never give up

But it's not over then ... I pretty soon found both the basic 33-solutions by permuting ways to do that a bit ... In fact, there's an almost 32 solution ... which fails on a stderr message and does not work on arguments that start with a number and contain a letter. But it has to be tried at least. How many of the 33 people did?

-- Golfing legend Ton Hospel showing why he's No. 1; he's always searching for improvements

Alas, I couldn't improve any of these solutions and I've still got no idea what the leading golfers on 59 were doing.

In utter frustration, I took off for a long two hour walk to clear my head. After about an hour of walking, in desperation, I tried a completely different approach. I remember walking along calculating its golf score in my head, again and again, and kept coming up with 58! Surely, it couldn't be. Usually, when I have these "epiphanies", there's something I've overlooked and the solution fails dismally when I actually test it. With trembling fingers, I typed in the 58 stroke solution:

1, 2, 3, ... 99, 100, ... 3999. Success, all tests pass! Time for another celebration.

For the first time in my life, I'm leading a Perl golf tournament! This solution, requiring no magic formulae at all, is not that hard to find and I remember feeling quite surprised at the time that noone else had found it after more than a year of play.

Magic formulae always win (update to Mtv's law)

When I started writing Python, Ruby and PHP solutions to this problem, I fell back to the magic formula approach, mainly due to my ignorance of general golfing tricks in these languages. Curiously, I was then able to apply the tricks learnt there to improve the Perl magic formula based solutions and so beat the 58 stroke regex-based solution described above.

In the next installment, I'll describe how I golfed this problem in Python and Ruby.


References Added Later

Updated 26-apr 2009: Minor improvements to wording. March 2016: Added Larry Wall quote re Huffman coding.

Replies are listed 'Best First'.
Re: The golf course looks great, my swing feels good, I like my chances (Part I)
by wazoox (Prior) on Apr 25, 2009 at 12:47 UTC
    Man these are astounding feats of craft. Kudos. I'm eager to read the second part...
      Ditto ... and then some !!

      A user level that continues to overstate my experience :-))
Re: The golf course looks great, my swing feels good, I like my chances (Part I)
by Burak (Chaplain) on Apr 26, 2009 at 16:27 UTC
    So, can we expect to see lots of people with 58-char answers in the Roman to Decimal list after this? :)

    But congrats on this and thanks for the detailed article you have here. I've tried some challenges on codegolf and my best was to become 80th in one of them. This golfing stuff really hurts my eyes, but it's fun to play :)

      So, can we expect to see lots of people with 58-char answers in the Roman to Decimal list after this? :)
      53 strokes after the next article. :) It'll be interesting to see what effect this has on the game. I might add there's a good chance I'll lose the lead if one of the leading golfers combines my solutions with his/hers to produce a shorter one. Based on the "99 bottles of beer" game though, I don't expect many folks to post my solutions under their name. After all, in that game, you can easily google for a 174-char solution (not posted by me) for 10th place on the leaderboard yet there are around 900 golfers with scores worse than that. Having said that, I just noticed there are twelve golfers on 174, which does seem rather a lot. :)

      FWIW, here are my reasons for publishing my solutions now:

      • In the code golf forums, I gave warning over a month ago that I'd like to publish my solutions and asked for advice (see the last post in this long "Close Challenges" thread) but received no reply.
      • Most folks in this "Close Challenges" thread were in favour of closing challenges after a period of time. The site's founder (carldr) agreed in principle with the idea of "seasons" and closing the games and publishing all solutions at the end of season. I'm guessing that nothing happened because the site's owners didn't have the time to spare to implement the "seasons" idea (which I can sympathize with BTW; I don't want to spend too much time on golf anymore).
      • I think waiting over two years has shown considerable restraint. :) I've wanted to publish for ages, but waited out of respect for the other golfers. Notice I have not published my (currently leading) solutions to the Saving Time game because it is "only" 250 days old. I do plan on publishing these though (update: they are now published here).
      • The codegolf site is still on perl 5.8.8 yet states "All languages will be kept relatively up to date as new versions are released". They probably should upgrade to perl 5.10.x, yet if they do that say versus print and other issues will make the current games unfair.
      • I prefer open source to closed source. :)
      • There has been precious little written about the experience of competing in a code golf tournament. Hopefully, this node will provide entertainment for experienced code golfers (and help them improve their game) while providing insight to others about the fun of competing at golf and also serve as a warning re the danger of spending more time than you can afford on the golf course.
      • I'm ready to move on from golf to other projects. Before I do that, I'd like to write down my golfing experiences while they're still fresh in my mind.
      • The main reason though is that I put a lot of effort into this and I'd like to share with others while I still can. I'm getting old and my health is not the best, you see.

        The codegolf site is still on perl 5.8.8 yet states "All languages will be kept relatively up to date as new versions are released". They probably should upgrade to perl 5.10.x, yet if they do that say versus print and other issues will make the current games unfair.

        Even if they change to 5.10.x, you need an explicit -M5.01 or -E: option to be able to use say, wouldn't you? The -e option is implicit so you can't just replace it to -E.

        Also it might be better to keep the older versions of languages as alternatives there too, unless that takes too much effort, so as not to break solutions like that in The golf course looks great, my swing feels good, I like my chances (Part II) by replacing ruby 1.8 with 1.9.

        I do agree that it would help to upgrade the languages though, in some other cases it can definitely help cut characters from solutions.

Re: The golf course looks great, my swing feels good, I like my chances (Part I)
by eyepopslikeamosquito (Archbishop) on Jun 04, 2009 at 08:50 UTC

    After pondering my improved Python algorithm for a while, I was surprised to shave two further strokes from my previous best legitimate Perl solution with this 56 stroker:

    Notice that this solution is a strange hybrid of regex and magic formula -- and so perhaps proves both Mtv's law of golf ("regexes always win") and eyepopslikeamosquito's law ("magic formulae always win"). :) Also unusual is that while trumps for on this occasion. The complete list of new Perl solutions I've found since my original post are:
    $\+=($n="1E@-"%9995)-$\%$n*2while$/=\1,IXCMVLD=~<>;print $\+=$n-2*$n%($n="1E@-"%9995)while$/=\1,IXCMVLD=~<>;print $\+=$n-2*$\%($n="1E@-"%9995)while$/=\1,IXCMVLD=~<>;print $\+=($n=IXCMVLD=~$_*"1E@-"%9995)-$\%$n*2for<>=~/./g;print $\+=$'-$\%$'*2while$/=\1,I1V5X10L50C100D500M1e3=~<>;print $\+=I1V5X10L50C100D500M1e3=~$_*$'-$\%$'*2for<>=~/./g;print $\+=($n=(IXCMVLD=~$_."E@-")%9995)-$\%$n*2for<>=~/./g;print $\+=($n=VLD=~$_*5+IXCM=~$_."E@-")-$\%$n*2for<>=~/./g;print $\+=M999D499C99L49X9V4I=~$_+$'-2*$\%($'+1)for<>=~/./g;print $\+=($n=10**index(IXCMVLD,$_)%9995)-$\%$n*2for<>=~/./g;print $\+=s//IXCMVLD=~$&."E@-%9995"/ee*$_-$\%$_*2for<>=~/./g;print D6L5V4M3C2X1=~$_,$\+=($n="1E$'"%9995)-$\%$n*2for<>=~/./g;print $\+=$n-2*$n%($n=uppp&7**(9671487%ord()/15)x1)for<>=~/./g;print # update: some more after finding getc $\+=($n=1+substr'004999',"@-",3)-$\%$n*2while VLDMCXI=~getc;print $\+=$'-$\%$'*2while I1V5X10L50C100D500M1e3=~getc;print $\+=($n="1E@-"%9995)-$\%$n*2while IXCMVLD=~getc;print

    Update: An improved 74 stroke symref-based solution, after "finding" getc (see below):

    $b=++$I;$$_=$b*=$^F^=7for V,X,L,C,D,M; $\+=$n-$\%$n*2while$n=${+getc};print
    And here's a quirky 83-stroke PHP version:
    <?for($M=2*$D=5*$C=2*$L=5*$X=2*$V=5*$I=1;$n=${fgetc(STDIN)};$t+=$n-2*$ +t%$n)?><?=$t;

      I was relaxing the other day, browsing through some old golfs at shinh's golf site, when I noticed ySas' solution to the "add_some_brainf..._code" game called the getc function. I thought, "that's odd-looking Ruby code with all those $ characters in it", when it dawned on me that it was not Ruby code at all, but Perl! Perl has a getc function? Really? I'm deeply embarrassed to admit, despite having used Perl heavily for 10 years, I did not know that. After picking myself up off the floor, I realised I'd made an appalling oversight in that getc could shorten my previous best Roman Perl solution like so:

      $\+=($n="1E@-"%9995)-$\%$n*2while IXCMVLD=~getc;print
      53 strokes! What I find interesting here is that while is two strokes longer than for, and getc two strokes longer than <>, yet combining these two longer constructs produces a solution four strokes shorter than:

      During the opening months of this competition, Perl led Ruby by 60 strokes to 73. At that stage, I thought it "impossible" for Ruby to ever catch Perl. To my great surprise, I whittled away at my Ruby solution until (to my dismay) it overtook my shortest Perl solution. After more than two years of play, the situation was reversed, with Ruby leading Perl by 53 to 58. Now I was sure that Perl had no chance of ever catching Ruby. As you might expect then, I'm delighted that the shortest known Perl and Ruby solutions are now tied at 53 strokes:

      $\+=($n="1E@-"%9995)-$\%$n*2while IXCMVLD=~getc;print n=1;$.+=n/2-n%n=10**(494254%C/9)%4999while C=getc;p$.
      And they both use the getc function. :-)

      Update: a couple of Ruby equivalents of the Perl solution:

      t=0;t+=(n=10**I%9995)-t%n*2while I="IXCMVLD"=~/#{getc.chr}/;p t t=0;t+=(n=10**I%9995)-t%n*2while I="IXCMVLD".index(getc);p t

Re: The golf course looks great, my swing feels good, I like my chances (Part I)
by robin (Chaplain) on Dec 20, 2009 at 22:48 UTC

    In case you're curious as to what my early-lead 60-stroke solution looked like, here it is:

    y/DCLXVIM /4 DCLX9/,$\=s/( )?\d/9x($&+!$1)/egfor(<>)x4;print

    This is quite a different algorithm from yours, I think.

    I enjoyed the few days I was thinking about this challenge, but I don't have the tenacity to keep whittling away month after month like you serious golfers.

      I just noticed a very similar Roman numeral golf recently concluded at Adjusting ySas' winning Perl solution to that game for this one produces this 58 stroker:

      which has a similar form to robin's early-lead 60 stroker above -- though it uses a different algorithm. Both these solutions are very beautiful, very Perlish, and quite astonishing (at least to me).

      I especially enjoyed ySas' ingenious use of @- above. I've never seen @- used like that in golf before and am tempted to give it a name, "ySas' device". By way of explanation, note that adding parentheses around (M) above adds one more element to the @- array when the (parenthesized) M matches. That is, the number of items in @- is used to differentiate between a matching M (two items in @-) and a matching D (one item in @-); this fits like a glove here because one and two happen to be the required multipliers.

      Update: By applying ideas from ySas' solution, we can reduce Robin's original 60-stroker:

      y/DCLXVIM /4 DCLX9/,$\=s/( )?\d/9x($&+!$1)/egfor(<>)x4;print
      to 58:

      Dazzling. Delightful. Intricate. That is, by far, the most beautiful of the many Roman to Decimal solutions presented in this node IMHO. Paraphrasing Abigail:

      Your golf solution is like exploring a large medieval castle, surrounded by a dark, mysterious forest, with something new and unexpected around each corner. There are dragons to be conquered, maidens to be rescued, and holy grails to be quested for. Lots of fun. By comparison, my golf solutions look like a Louis-XVI castle and garden to me. Straight, symmetric, and bright. There are wigs to be powdered, minuets to be danced, all quite boring.

Re: The golf course looks great, my swing feels good, I like my chances (Part I)
by Jasper (Chaplain) on May 10, 2009 at 08:29 UTC

    The comprehensive test suite is foolish :)

    One of the holes I'm doing well in is purely because the codegolf testsuite doesn't cover an edge case (which I saved one or two to avoid).

    If you see an entry with a huge number of fails, I think you can assume they've been repeatedly submitting the same, failing solution until it passes on some random test allocation.

    This was terrifically interesting, Andrew, thanks!

    edit: of course you'd already noticed all this! Why am I not surprised