Module Submission for CPAN

tedv has asked for the wisdom of the Perl Monks concerning the following question:

I was looking for an Anagram module on CPAN: Something that could read in a dictionary and use that to create simple anagrams of words with an anagram($word) function. I couldn't find anything, so I wrote my own module. I wanted to submit it to CPAN, but I've never done that before. My questions are two-fold:

#1: What should I do before submitting the module? I'd like to set up something that works with MakeMaker (someone suggested h2xs). What about general documentation? Clearly there is no end to the amount of support you could give a module, but what's a reasonable lower bound for supporting this hack?

#2: I've attached the module code and a short script to run it below. What is stylistically expected from a CPAN module? How could this code get cleaned up? Are there any algorithmic changes that would make the code run faster (or support more than 1 destination word?) (If you're curious, the code parses a dictionary into a standard tree format and then when anagraming the words, walks through the tree on a prefix-by-prefix basis. A few of my variable names suck. Let me know if you have better names than $sub_table and $sub_prefix.)

anagram:

#!/usr/bin/perl -w

use strict;

use Anagram;

while (<>) {
    chomp;
    my $word = $_;
    s/\W//g;
    my $base = lc;
    my @words = grep { $base ne $_ } Anagram::anagram($base);
    next unless @words;

    print "\nAnagrams of $_:\n";
    print "  $_\n" foreach @words;
}
[download]

Anagram.pm:

use strict;

use Carp;

package Anagram;

@Anagram::EXPORT = qw(anagram is_word starts_word);

BEGIN {
    $Anagram::version = "1.0";
    $Anagram::dict = {};

    open DICT, "</usr/dict/words" or croak("Could not open dictionary:
+ $!");
    foreach (<DICT>) {
        s/\W//g;
        my @letters = split //, lc;

        # Insert word into sorted dictionary tree
        my $sub_table = $Anagram::dict;
        foreach (@letters) {
            # Create sub-table if it's missing and then recurse
            $sub_table->{$_} = {} if ref($sub_table->{$_}) ne 'HASH';
            $sub_table = $sub_table->{$_};
        }

        # Flag this entry as a valid word
        $sub_table->{word} = 1;
    }
    close DICT;
}

sub version {
    my $version = shift;

    warn "Version $version is later than $Anagram::version."
      if defined($version) && ($version > $Anagram::version);

    return $Anagram::version;
}

sub anagram {
    my $word = shift;

    # Create table of letters.  This is better than a pure table
    # because we avoid extra work for duplicated letters-- we can
    # anagram "aaaaa" in one cycle instead of 5! = 120 cycles.
    my $letters = {};
    foreach (split //, $word) {
        $letters->{$_}++;
    }

    return permute_recurse($letters, "");
}

sub permute_recurse {
    my ($letters, $prefix) = @_;

    my @words = ();
    foreach (keys %$letters) {
        # Test if any words start with this new prefix
        my $sub_prefix = $prefix.$_;
        my $sub_table = starts_word($sub_prefix);
        next unless $sub_table;

        # Use this letter
        if ($letters->{$_} <= 1) {
            delete $letters->{$_};
        } else {
            $letters->{$_}--;
        }

        # Test for recurse case
        if (scalar keys %$letters) {
            push @words, permute_recurse($letters, $sub_prefix);
        }
        # Test for base case
        elsif ($sub_table->{word}) {
            push @words, $sub_prefix;
        }

        # Restore letter
        $letters->{$_}++;
    }

    return @words;
}

sub is_word {
    my $sub_table = walk_dict(@_) or return undef;
    return $sub_table->{word};
}

# Returns a sub-table entry on true and undef on failutre
sub starts_word {
    my ($word) = @_;

    my $sub_table = $Anagram::dict;
    foreach (split //, $word) {
        $sub_table = $sub_table->{$_};
        return undef unless ref($sub_table) eq 'HASH';
    }

    return $sub_table;
}

1;
[download]

-Ted

Back to Seekers of Perl Wisdom