When should I use a dispatch table?

All,
A typical dispatch table is a technique used to translate:

if    ($item eq 'thing1') { ... }
elsif ($item eq 'thing2') { ... }
elsif ($item eq 'thing3') { ... }
elsif ($item eq 'thing4') { ... }
else                      { ... }

Into: exists $dispatch{$item} ? $dispatch{$item}->() : $dispatch{default}->();
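
To make the translation concrete, here is a minimal runnable sketch. The handler keys, their return values, and the `handle_item` helper are hypothetical, invented purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical handlers: each key maps to a code ref.
my %dispatch = (
    thing1  => sub { return 'handled thing1' },
    thing2  => sub { return 'handled thing2' },
    default => sub { return 'no handler' },
);

# Look up the handler, falling back to the catch-all entry
# (the equivalent of the final else in the chain).
sub handle_item {
    my ($item) = @_;
    return exists $dispatch{$item}
        ? $dispatch{$item}->()
        : $dispatch{default}->();
}

print handle_item($_), "\n" for qw(thing1 thing2 thing9);
```

Adding a new case is then one more key in the hash rather than another branch in the chain.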

In addition to opportunities for code reduction and clean, maintainable code, dispatch tables can be a runtime optimization. The if/elsif chain is a linear scan, O(N), while a hash lookup is purported to be O(1). The hash does have the overhead of looking up the key, dereferencing the value, and executing the code ref. I assumed that it would only take a few equality checks to pay for this overhead.
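
The cost model behind that assumption can be sketched by counting the string comparisons a linear chain performs. The `chain_comparisons` and `average_comparisons` helpers are hypothetical. The model counts only `eq` tests; it says nothing about how expensive one `eq` is relative to a hash lookup plus a sub call, which is what a benchmark has to measure.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Number of eq tests an if/elsif chain over @$chain performs for $item.
# A match at position k (1-based) costs k tests; a miss costs all N.
sub chain_comparisons {
    my ($chain, $item) = @_;
    my $tests = 0;
    for my $candidate (@$chain) {
        $tests++;
        last if $item eq $candidate;
    }
    return $tests;
}

# Average number of tests when every chain item is equally likely.
sub average_comparisons {
    my ($chain) = @_;
    my $total = sum(map { chain_comparisons($chain, $_) } @$chain);
    return $total / scalar @$chain;
}

my @chain = 1 .. 6;
printf "last item: %d tests, average: %.1f tests\n",
    chain_comparisons(\@chain, 6), average_comparisons(\@chain);
```

The dispatch table pays one hash lookup, one dereference, and one sub call regardless of position, so the open question is how many `eq` tests that fixed cost is worth.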

In this node, I claimed that my version of the code was much more efficient. That node inspired madbombX to convert an if/elsif chain into a dispatch table and benchmark it. Unfortunately, the bench was flawed and the results skewed. This led me to write code to generate a fair benchmark based on user input.

#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
use HTML::Template;
use List::Util;

my %opt;
get_args(\%opt);

my $template = HTML::Template->new(filename => 'bench.tmpl');
$template->param(CHAIN_LENGTH    => $opt{n});
$template->param(UNMATCHED_ITEMS => $opt{u});

my @chain = map {ITEM => $_}, 2 .. $opt{n};
$template->param(CHAIN => \@chain);

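# Number of inputs that match nothing in the chain (at least 1 if -u is non-zero)
my $unmatched_items = int(($opt{u} / 100) * $opt{t});
$unmatched_items  ||= 1 if $opt{u};

my @input  = (-1) x $unmatched_items;     # -1 never matches a chain item
my $total  = $opt{t} - $unmatched_items;  # inputs left for the matched items
# Per-item weights: user supplied (largest first) or an even split
my @weight = $opt{w}
    ? sort {$b <=> $a} grep $_, split /[%,]|\s+/, $opt{w}
    : (int(100/$opt{n})) x $opt{n};

# Item 1 gets the first (largest) weight; every item appears at least once
for (1 .. $opt{n}) {
    my $amount = shift @weight;
    push @input, ($_) x (int(($amount/100) * $total) || 1);
}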

$_ = {FIND => $_} for @input;
@input = List::Util::shuffle(@input) if ! $opt{o};
$template->param(BUILD_FIND => \@input);
print $template->output();

sub get_args {
    my $opt = shift @_;
    my $Usage = qq{Usage: $0 [options]
        -h : This help message

        -n : The (n)umber of items in the if/elsif chain
             Default: 5

        -t : The (t)otal number of items to process
             This number need not be the same as your actual data.
             It should be sufficient to exercise the options
             Default: -n * 10

        -u : The percentage of (u)nmatched items to include
             Typical if/elsif chains end with a catch-all else
             This option allows you to determine how often this is used
             Default: 0

        -o : Specifies if the if/elsif chain should be (o)rdered
             If the input data is well known, it is possible to order
             the if/elsif chains to ensure the most common items are first
             Default: 0 (off)

        -w : The (w)eight in percentages of the matched input
             Ignored unless -o option also specified
             There must be as many #% as there are -n to work correctly
             The #% must add up to 100 to work correctly
             The -b option, if present, takes precedence before this option
             For instance, if you have n = 5, t = 100, -o, and u = 10
             -w"40%, 30%, 15%, 10%, 5%" would equate to 100% of the remaining 90%
             Default: N/A
    } . "\n";
    getopts('hn:t:u:w:o', $opt) or die $Usage;
    die $Usage                 if $opt->{h};
    delete $opt->{w}           if ! exists  $opt->{o};
    $opt->{n} = 5              if ! defined $opt->{n};
    $opt->{t} = $opt->{n} * 10 if ! defined $opt->{t};
    $opt->{u} = 0              if ! defined $opt->{u};
}
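
To see how the options turn into input data, the sketch below repeats the generator's arithmetic for the example in the usage text (n = 5, t = 100, u = 10, weights 40/30/15/10/5). The `input_counts` helper is hypothetical; it mirrors the `int(...) || 1` expressions above rather than calling the script.

```perl
use strict;
use warnings;

# Returns (unmatched count, per-item counts) using the same
# truncating arithmetic as the generator script.
sub input_counts {
    my ($t, $u, @weights) = @_;
    my $unmatched = int(($u / 100) * $t);
    $unmatched ||= 1 if $u;                # at least one miss when -u is set
    my $total  = $t - $unmatched;          # inputs left for matched items
    my @counts = map { int(($_ / 100) * $total) || 1 } @weights;
    return ($unmatched, @counts);
}

my ($unmatched, @counts) = input_counts(100, 10, 40, 30, 15, 10, 5);
print "unmatched: $unmatched, matched: @counts\n";
```

Because each share is truncated with `int`, the 15% and 5% shares of the remaining 90 items lose their fractional half, so the generated input holds 99 items rather than 100. That is why -t only needs to be large enough to exercise the options.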

bench.tmpl

#!/usr/bin/perl
use strict;
use warnings;
use Benchmark 'cmpthese';

my %dispatch = map {$_ => sub {my $res = "I found it"}} 1 .. <TMPL_VAR NAME="CHAIN_LENGTH">;
<TMPL_IF NAME="UNMATCHED_ITEMS">
$dispatch{default} = sub {my $res = "I did not find it"};
</TMPL_IF>

my @find;
<TMPL_LOOP NAME=BUILD_FIND>
push @find, <TMPL_VAR NAME=FIND>;
</TMPL_LOOP>

cmpthese(-10, {
    dispatch => sub {
        for (@find) {
            <TMPL_IF NAME="UNMATCHED_ITEMS">
            if (exists $dispatch{$_}) {
                $dispatch{$_}->();
            }
            else {
                $dispatch{default}->();
            }
            <TMPL_ELSE>
            $dispatch{$_}->();
            </TMPL_IF>
        }
    },
    if_else => sub {
        for (@find) {
            if ($_ eq '1') {
                my $res = "I found it";
            }
            <TMPL_LOOP NAME=CHAIN>
            elsif($_ eq '<TMPL_VAR NAME=ITEM>') {
                my $res = "I found it";
            }
            </TMPL_LOOP>
            <TMPL_IF NAME="UNMATCHED_ITEMS">
            else {
                my $res = "I did not find it";
            }
            </TMPL_IF>
        }
    },
});

The results were quite amazing. No matter how I tweaked the settings, with only 6 items in the if/elsif chain I could not make the dispatch table win. It is still possible that the other optimizations I made to the code in that node made it more efficient, but without a representative sample of the real input I can't be sure.

The breakpoint without unmatched items is about 12 items in the chain on my system. There are plenty of user-definable variables, so perhaps you might want to ~~question code you have that uses dispatch tables~~ play around. While the code reduction and maintainability are still advantages, the performance boost may not be all it is cracked up to be.

Update: Minor change to wording in last paragraph after realizing it may imply I was recommending not using dispatch tables because they were not as efficient as I assumed them to be originally.

See Also:

Implementing Dispatch Tables

Big-O Notation - What is it good for?

Cheers - L~R
