Rosetta PGA-TRAM

After my recent "discovery" of the PGA-TRAM algorithm, I couldn't resist coding it up in a variety of languages. I'm doing this as a rosetta code node because I find it fun; hopefully, it'll provide some interesting insights into a variety of programming languages. Please note that this meditation is not about golf, but about implementing an arbitrary, specific algorithm in the most "natural" way in a variety of languages.

Here's the spec:

Write a function, roman_to_dec, to convert a "modern" Roman Numeral to its decimal equivalent. The function takes a single string argument and returns an integer. The string argument may be assumed to always contain a valid Roman Numeral in the range 1-3999. Error handling is not required. For example, roman_to_dec("XLII") should return 42.

Perl

Let's get started with a Perl version:

use strict;
use warnings;
use List::Util qw(reduce);

{
  my %rtoa = ( M=>1000, D=>500, C=>100, L=>50, X=>10, V=>5, I=>1 );
  sub roman_to_dec {
    reduce { $a+$b-$a%$b*2 } map { $rtoa{$_} } split//, uc(shift)
  }
}

my @testdata = ( "XLII", "LXIX", "mi" );
for my $r (@testdata) {
  print "$r: ", roman_to_dec($r), "\n";
}
[download]

Running this program produces:

XLII: 42
LXIX: 69
mi: 1001
[download]

To me, this is the most natural way to express the PGA-TRAM algorithm in Perl. It uses a variety of common Perl idioms:

A hash (rtoa) for the lookup table.
Lexical scope to "data hide" the rtoa hash (update: Persistent Private Variables using state are a better way to achieve this for Perl 5.10+, as demonstrated in the replies from moritz and BrowserUk).
A Perlish functional "pipeline": uc -> split -> map -> reduce.

Please feel free to disagree with my assessment and respond with your own preferred Perl solution.

For those folks unfamiliar or uncomfortable with reduce, note that it can be eliminated like this:

  sub roman_to_dec {
    my $t = 0;
    for my $n ( map { $rtoa{$_} } split//, uc(shift) ) {
      $t += $n-$t%$n*2;
    }
    return $t;
  }
[download]

Though I don't mind that at all, I do prefer the reduce version.

Python

As you might expect, my "most natural" Python 2 solution looks very similar to the Perl one above:

def roman_to_dec(r, __rtoa = dict(M=1000, D=500, C=100, L=50, X=10, V=
+5, I=1) ):
  return reduce( lambda t,n: t+n-t%n*2, (__rtoa[c] for c in r.upper())
+ )

testdata = [ "XLII", "LXIX", "mi" ]
for r in testdata:
  print r, ":", roman_to_dec(r)
[download]

That this solution is essentially identical to the Perl one shows how similar these two languages are. Indeed, I'm fond of claiming that Perl, Python and Ruby are "essentially equivalent". :) Since Python lacks Perl's simple lexical scoping mechanism, I chose to "data hide" the rtoa hash by making it a default function argument. Note that Python default function arguments are initialized once only. Though I could have employed the Python map function (with a lambda), I chose instead to use a lazily evaluated Python generator list comprehension, (__rtoa[c] for c in r.upper()), because that seemed more "naturally Pythonic" to me.

Update: a Python 3 version (just had to import reduce and use parens with print):

from functools import reduce
def roman_to_dec(r, __rtoa = dict(M=1000, D=500, C=100, L=50, X=10, V=
+5, I=1) ):
  return reduce( lambda t,n: t+n-t%n*2, (__rtoa[c] for c in r.upper())
+ )

testdata = [ "XLII", "LXIX", "mi" ]
for r in testdata:
  print( r, ":", roman_to_dec(r) )
[download]

Plus some much later (2020) discussion of Perl/Python solutions: Re^5: A short whishlist of Perl5 improvements leaping to Perl7

Haskell

Here's my Haskell solution, using GHC:

{-# OPTIONS_GHC -fglasgow-exts -Wall #-}

import Data.Char (toUpper)
import Data.List (concat, intersperse)

rtoa :: Char -> Int
rtoa 'M' = 1000
rtoa 'D' =  500
rtoa 'C' =  100
rtoa 'L' =   50
rtoa 'X' =   10
rtoa 'V' =    5
rtoa 'I' =    1
rtoa r   = error $ "Invalid rtoa char:" ++ show r

urtoa :: Char -> Int
urtoa = rtoa . toUpper

roman_to_dec :: String -> Int
roman_to_dec = foldl1 (\t n -> t+n-t`mod`n*2) . map urtoa

myshow :: (String -> Int) -> String -> String
myshow fn val = val ++ ": " ++ (show (fn val))

testdata :: [String]
testdata = [ "XLII", "LXIX", "mi" ]

main :: IO ()
main = do
  putStrLn $ (concat $ intersperse "\n" (map (\c -> (myshow roman_to_d
+ec c)) testdata))
[download]

Instead of using a rtoa hash, natural in Perl and Python, a rtoa function using simple pattern matching seemed the most Haskelly way of expressing this algorithm.

C++

Finally, my ANSI C++ solution:

#include <cctype>
#include <iostream>
#include <string>
#include <vector>
#include <numeric>

using namespace std;

const int romtab[] = {
  0,0,0,0,0,0, 0,   0,  0,0,  // 00-09
  0,0,0,0,0,0, 0,   0,  0,0,  // 10-19
  0,0,0,0,0,0, 0,   0,  0,0,  // 20-29
  0,0,0,0,0,0, 0,   0,  0,0,  // 30-39
  0,0,0,0,0,0, 0,   0,  0,0,  // 40-49
  0,0,0,0,0,0, 0,   0,  0,0,  // 50-59
  0,0,0,0,0,0, 0, 100,500,0,  // 60-69 (C:67,D:68)
  0,0,0,1,0,0,50,1000,  0,0,  // 70-79 (I:73,L:76,M:77)
  0,0,0,0,0,0, 5,   0, 10,0   // 80-89 (V:86,X:88)
};

inline int rtoa(int c)
{
  return romtab[c];
}

inline int accfn(int t, char c)
{
  return t+rtoa(toupper(c))-t%rtoa(toupper(c))*2;
}

int roman_to_dec(const string& s)
{
  return accumulate(s.begin(), s.end(), 0, accfn);
}

int main(int argc, char* argv[])
{
  vector<string> testdata;
  testdata.push_back("XLII");
  testdata.push_back("LXIX");
  testdata.push_back("mi");
  for (vector<string>::const_iterator iter  = testdata.begin();
                                      iter != testdata.end();
                                      ++iter) {
     cout << *iter << ": " << roman_to_dec(*iter) << '\n';
  }
  return 0;
}
[download]

The Perl reduce function is called "accumulate" in ANSI C++. Implementing the lookup with the admittedly long-winded romtab[] table felt like natural C++ to me because it seeks high performance. Indeed, if you peek inside <ctype.h> you will likely see toupper() implemented in a similar vein. This makes the rtoa(toupper(c)) above ridiculously fast, costing only two (constant) array lookups, not requiring even a single C function call! Also of note in this solution is that I didn't bother with map, instead simply applying the ultra fast rtoa(toupper(c)) combo twice in the accumulator function.

Update: As indicated below, we can eliminate the call toupper(), while eliminating any memory faults on invalid input, by adjusting romtab as shown below:

// Note: there are less than 256 initializers in this table;
// the uninitialized ones are guaranteed by ANSI C to be
// initialized to zero.
const int romtab[256] = {
    0,0,0,0,0,0,   0,   0,   0,   0,  //  00- 09
    0,0,0,0,0,0,   0,   0,   0,   0,  //  10- 19
    0,0,0,0,0,0,   0,   0,   0,   0,  //  20- 29
    0,0,0,0,0,0,   0,   0,   0,   0,  //  30- 39
    0,0,0,0,0,0,   0,   0,   0,   0,  //  40- 49
    0,0,0,0,0,0,   0,   0,   0,   0,  //  50- 59
    0,0,0,0,0,0,   0, 100, 500,   0,  //  60- 69
    0,0,0,1,0,0,  50,1000,   0,   0,  //  70- 79
    0,0,0,0,0,0,   5,   0,  10,   0,  //  80- 89
    0,0,0,0,0,0,   0,   0,   0, 100,  //  90- 99
  500,0,0,0,0,1,   0,   0,  50,1000,  // 100-109
    0,0,0,0,0,0,   0,   0,   5,   0,  // 110-119
   10,0,0,0,0,0,   0,   0,   0,   0   // 120-129
};

// Return the arabic number for a roman letter c.
// Return zero if the roman letter c is invalid.
inline int urtoa(int c)
{
  return romtab[c];
}

inline int accfn(int t, char c)
{
  return t+urtoa(c)-t%urtoa(c)*2;
}
[download]

I'd love to see some sample implementations in other languages, especially Perl 6. So please feel free to respond away. :)

Other Rosetta Code Nodes

See also

Roman CPAN module by CHORNY (its Roman.pm source code)

Back to Meditations