http://qs321.pair.com?node_id=271724

Reason for Creation

In a current project we needed a way to access data by state, grouped by the regions and subregions (referred to as divisions) defined by the US Census Bureau. A quick search on CPAN didn't turn up a region-specific module. I did find several state-based modules, however, and this module leverages one of them (Geography::States).

Initial Solution Requirements

  1. Region breakdown of states as outlined by the US Census
  2. Divisions within the Regions as outlined by the US Census
  3. Ability to return each of the above in a list, order not important
  4. Ability to find all states within each of the above
  5. Have the state list returned in alphabetical order
  6. Have the state list returned in either "full name" or abbreviated form ( South Dakota -or- SD )

Module Code:

package Region;

=pod

Module purpose is to provide access to US regions, divisions, and states
grouped either by the above or individually.

See inline comments for more information.

=cut

use strict;
use Geography::States;

my $debug = 0;
my %regions;
my $region;
my $division;
my $gs = Geography::States->new('USA');

# Parse the DATA section once, at load time. Layout rules:
#   REGION in capitals at column 0, Division at column 0, states indented.
while (<DATA>) {
    next if $_ !~ m/[a-zA-Z]/ || $_ =~ /^#/;
    if (m/^[A-Z]/ && m/[A-Z]$/) {
        $_ =~ s/\s+$//;
        print $_ , "\n" if $debug;
        $region = ucfirst(lc($_));
    }
    elsif (m/^[A-Z]/ && m/[a-z]$/) {
        $_ =~ s/\s+$//;
        print "\t$_\n" if $debug;
        $division = $_;
    }
    elsif (m/^\s+\w/) {
        $_ =~ s/^\s+|\s+$//g;
        my $code = $gs->state($_);
        push @{ $regions{$region}{$division} } , { full => $_ , code => $code } ;
    }
}

sub new {
    my $class = shift;
    my $self = {};
    $self = \%regions;
    return bless $self, $class;
}

=pod

The regions method will return a list of regions in alphabetical order

=cut

sub regions {
    my $self = shift;
    return sort keys %{ $self };
}

=pod

The divisions method will return a list of all of the divisions within
a region.

It accepts a list of arguments, those items must be equal to the region
names. When a list is passed only the divisions in the regions passed
will be returned.

=cut

sub divisions {
    my ($self,@reg) = @_;
    my @list;
    if (!$reg[0]) {
        @reg = sort keys %{ $self };
    }
    @reg = map { ucfirst(lc($_)) } @reg;
    foreach my $region (@reg) {
        foreach my $division (sort keys %{ $self->{$region} }) {
            push @list, $division;
        }
    }
    return @list;
}

=pod

The state method will return an array of states, the contents of which
are determined by arguments passed to the method.

If no options (hash) is sent in then it will return a list of all the
state codes in alphabetical order.

The state name can be returned if key "name" has value of 'full'

States for a region can be returned if an option of 'region' has been
set to one of the available regions.

States for a division can be returned if an option of 'division' has
been set to one of the available divisions.

The only mixing that can be done is State name type (full or code)
along with division OR region.

Sending both a region and division will only work if the division
selected is under the region selected.

=cut

sub state {
    my ($self,%args) = @_;
    my $verbiage  = $args{name} || 'code';
    my $region_   = lc($args{region}) || 'ALL';
    my $division_ = lc($args{division}) || 'ALL';
    my @list;
    foreach my $region (keys %{ $self }) {
        next if $region_ ne 'ALL' && lc($region) ne $region_;
        foreach my $division (keys %{ $self->{$region} }) {
            next if $division_ ne 'ALL' && lc($division) ne $division_;
            foreach my $state ( @{ $self->{$region}{$division} } ) {
                push @list , $state->{$verbiage};
            }
        }
    }
    return sort(@list);
}

1;    # a module must return a true value

__DATA__
NORTHEAST
Middle Atlantic
    New Jersey
    New York
    Pennsylvania
New England
    Connecticut
    Maine
    Massachusetts
    New Hampshire
    Rhode Island
    Vermont
MIDWEST
East North Central
    Illinois
    Indiana
    Michigan
    Ohio
    Wisconsin
West North Central
    Iowa
    Kansas
    Minnesota
    Missouri
    Nebraska
    North Dakota
    South Dakota
SOUTH
East South Central
    Alabama
    Kentucky
    Mississippi
    Tennessee
South Atlantic
    Delaware
    District of Columbia
    Florida
    Georgia
    Maryland
    North Carolina
    South Carolina
    Virginia
    West Virginia
West South Central
    Arkansas
    Louisiana
    Oklahoma
    Texas
WEST
Mountain
    Arizona
    Colorado
    Idaho
    Montana
    Nevada
    New Mexico
    Utah
    Wyoming
Pacific
    Alaska
    California
    Hawaii
    Oregon
    Washington
#POSSESSIONS
#
#    Puerto Rico
#    Virgin Islands
#    Pacific Islands
#
#    Pacific Islands Includes: Canton, Guam, Mariana, Marshall, Samoa, Wake

Informal Test Code

#!/usr/bin/perl

use Region;
use strict;

my $regions = Region->new();

print join("\n",$regions->regions);
print "\n\n";

print join("\n",$regions->divisions);
print "\n\n";

print join("\n",$regions->divisions('west'));
print "\n\n";

print join("\n",$regions->state( name => 'full' , region => 'west' ) );
print "\n\n";

print join("\n",$regions->state( name => 'full' , division => 'East North Central' ) );
print "\n\n";

print join("\n",$regions->state( name => 'code' , region => 'South' , division => 'South Atlantic' ) );
print "\n\n";

Possible Module Names

Geography::US::Census::Regions
Geography::US::Regions::Census
Locale::US::Census::Regions
???

Interest/Comments

Is there any interest in this module for addition to CPAN or is there an existing module that I overlooked that already fills this space?


General comments on design and method interfaces would be appreciated even if you don't need or want the module.

UPDATED: Moved the while loop outside of the new to avoid issues if the user attempted to create multiple objects within a single script.

Removed the 'use Data::Dumper' that was left over from initial testing.

Re: RFC US Region Module
by Juerd (Abbot) on Jul 06, 2003 at 08:22 UTC

    If you're putting this on CPAN, please fix the POD. When the POD is converted to any other format, the code in between is lost, and you are left with one big block of paragraphs that currently has no headings to make the structure clear.

    Please read POD in 5 minutes and find out that POD is more than multi-line comments.
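
    To illustrate the point about headings, here is a rough sketch of how the module's documentation might be laid out with standard POD sections. The package name Local::Census::Regions is a placeholder (no name had been settled on), and the wording of each section is an assumption based on the inline comments in the posted code, not the author's final documentation.

```perl
package Local::Census::Regions;    # hypothetical name, for illustration only

use strict;
use warnings;

=head1 NAME

Local::Census::Regions - US Census regions, divisions, and their states

=head1 SYNOPSIS

    my $r       = Local::Census::Regions->new;
    my @regions = $r->regions;
    my @states  = $r->state( name => 'full', region => 'west' );

=head1 METHODS

=head2 regions

Returns the region names in alphabetical order.

=head2 divisions( @regions )

Returns the divisions within the named regions, or within every region
when called without arguments.

=head2 state( %options )

Returns a sorted list of states. Recognised options:

=over 4

=item name

Either C<code> (the default) or C<full>.

=item region

Restrict the list to one region.

=item division

Restrict the list to one division.

=back

=cut

1;
```

    With headings in place, tools such as perldoc or pod2html produce a structured page even though the code between the POD blocks is dropped.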

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

Re: RFC US Region Module
by Aristotle (Chancellor) on Jul 06, 2003 at 13:06 UTC

    I don't see why you made this module object oriented - you're not making any use of $self. Also, you're reading DATA in your new() method - why? The second and subsequent instantiations of an "object" in your class will not have anything to initialize. Pull that while loop out of the method and get rid of new (as well as $self in the other functions). Of course the functions are somewhat unfortunately named for those who'd want to import them; maybe prefix them with us_ or some such.
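
    A minimal sketch of what such a function-based interface could look like, using the core Exporter module. The package name, the us_ prefixed function names, and the inlined data (trimmed to one region) are all assumptions made for illustration; this is not the posted module.

```perl
use strict;
use warnings;

package Local::Census::Regions;    # hypothetical name, for illustration only
use Exporter 'import';
our @EXPORT_OK = qw(us_regions us_divisions us_states);

# Data trimmed to a single region to keep the sketch short.
my %regions = (
    Northeast => {
        'Middle Atlantic' => [
            { full => 'New Jersey',   code => 'NJ' },
            { full => 'New York',     code => 'NY' },
            { full => 'Pennsylvania', code => 'PA' },
        ],
        'New England' => [
            { full => 'Connecticut',   code => 'CT' },
            { full => 'Maine',         code => 'ME' },
            { full => 'Massachusetts', code => 'MA' },
            { full => 'New Hampshire', code => 'NH' },
            { full => 'Rhode Island',  code => 'RI' },
            { full => 'Vermont',       code => 'VT' },
        ],
    },
);

# Region names in alphabetical order.
sub us_regions { return sort keys %regions }

# Divisions of the named regions (case-insensitive), or of all regions.
sub us_divisions {
    my @wanted = @_ ? map { ucfirst( lc($_) ) } @_ : sort keys %regions;
    return map { sort keys %{ $regions{$_} || {} } } @wanted;
}

# Sorted state codes (or full names), optionally filtered by region/division.
sub us_states {
    my %args  = @_;
    my $field = $args{name} || 'code';
    my @list;
    for my $region ( keys %regions ) {
        next if defined $args{region} && lc($region) ne lc( $args{region} );
        for my $division ( keys %{ $regions{$region} } ) {
            next if defined $args{division}
                && lc($division) ne lc( $args{division} );
            push @list, map { $_->{$field} } @{ $regions{$region}{$division} };
        }
    }
    return sort @list;
}

package main;

# The caller imports only the names it wants.
Local::Census::Regions->import(qw(us_divisions us_states));

print join( ', ', us_divisions('northeast') ), "\n";
print join( ', ', us_states( division => 'New England' ) ), "\n";
```

    In a real distribution the package would live in its own .pm file and the caller would write use Local::Census::Regions qw(us_states); the explicit import call above only keeps the sketch in one file.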

    As far as the name is concerned I'd definitely go with Geography::US::Census::Regions. Locale is the wrong namespace for this module. A module's "innermost" name portion should express what that module deals with; here, that is regions. It deals with those according to the census, hence Census::Regions. Regions::Census would mean it deals with the census according to the regions, which makes no sense.

    Makeshifts last the longest.

      If I leave it OO don't I avoid the issue of the function name collision you mention? While I agree there is no current or compelling reason for this module to be OO it does seem to work easily enough to provide the functionality I need. Would I see benefits from it being non-OO?

      The DATA read has been moved outside of the new since that was the wrong spot for it and subsequent object creations would have resulted in an empty hash ref being assigned to $self.

      Of the names I felt the Geography namespace would be best, but I am still struggling with the Census::Regions / Regions::Census issue for these reasons:
      1. Geography::US::Regions would leave the name space open for additional Region separation specific modules each named for their source, as in this case Census
      2. Geography::US::Census seems limiting to me because the number of Geography based concepts that would deal with Census seem limited.

      Thanks for the feedback.
        If I leave it OO don't I avoid the issue of the function name collision you mention?

        Perl uses packages to implement OO but packages don't necessarily imply OO. Packages provide namespaces that help avoid function and variable name collisions. Those functions don't necessarily have to be written in an OO fashion, though.

        90% of every Perl application is already written.
        dragonchild
        The collisions are already avoided by writing something like Geography::US::Census::Regions::state, which is admittedly clumsy. You can use alternative exporters such as Exporter::Tidy to provide your users the option to choose a prefix or alias of their preference for the exported functions. OO is the wrong tool here.
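
        As a small illustration of the namespace point using only core Perl (Exporter::Tidy is not used here), the sketch below calls a plain function by its fully qualified name and then installs a caller-chosen alias with a glob assignment. The package name and the three-state data set are placeholders.

```perl
use strict;
use warnings;

package Local::Census::Regions;    # hypothetical name, for illustration only

# A plain function, no object involved. Data trimmed to one division.
sub state {
    my @codes = qw(NY NJ PA);
    return sort @codes;
}

package main;

# 1. Fully qualified call: nothing is imported, so nothing can collide.
my @codes = Local::Census::Regions::state();

# 2. Caller-chosen alias: point main::us_state at the same code.
*us_state = \&Local::Census::Regions::state;
my @same = us_state();

print "@codes\n";
print "@same\n";
```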

        Makeshifts last the longest.

Re: RFC US Region Module
by PodMaster (Abbot) on Jul 06, 2003 at 13:27 UTC
    In addition to what's already been said, I don't see you using any of Data::Dumper's functionality. Also, I think you should just inline the datastructures you're generating in sub "new". I see no point in reading from __DATA__ more than once if the data isn't changing.

    MJD says "you can't just make shit up and expect the computer to know what you mean, retardo!"
    I run a Win32 PPM repository for perl 5.6.x and 5.8.x -- I take requests (README).
    ** The third rule of perl club is a statement of fact: pod is sexy.

      The Data::Dumper was left over from initial debugging, I have removed it.

      Agreed on the __DATA__ issue. In fact you can't reread from __DATA__ without certain precautions. I have moved the while loop outside of the new method to avoid this issue. I do however prefer to keep the data in plain text rather than a data structure, for readability. Thanks for the feedback.
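
      For comparison, one alternative that keeps the plain-text layout while sidestepping the DATA handle is a heredoc parsed once at load time. This is a sketch of that idea and not what the posted module does. It uses the same layout rules as the __DATA__ section (region in capitals, division at column 0, states indented), is trimmed to one region, and stores only full names so it needs no non-core modules.

```perl
use strict;
use warnings;

# Same plain-text shape as the __DATA__ section, trimmed to one region.
my $text = <<'END';
NORTHEAST
Middle Atlantic
    New Jersey
    New York
    Pennsylvania
New England
    Connecticut
    Maine
    Massachusetts
    New Hampshire
    Rhode Island
    Vermont
END

# Parse once; a string can be split again at any time, unlike a consumed handle.
my ( %regions, $region, $division );
for my $line ( split /\n/, $text ) {
    next if $line !~ /[a-zA-Z]/ || $line =~ /^#/;
    if ( $line =~ /^[A-Z][A-Z ]*$/ ) {          # all capitals: a region
        $region = ucfirst( lc($line) );
    }
    elsif ( $line =~ /^[A-Z]/ ) {               # column 0, mixed case: a division
        $division = $line;
    }
    elsif ( $line =~ /^\s+(\S.*?)\s*$/ ) {      # indented: a state
        push @{ $regions{$region}{$division} }, $1;
    }
}

for my $r ( sort keys %regions ) {
    for my $d ( sort keys %{ $regions{$r} } ) {
        print "$r / $d: ", join( ', ', @{ $regions{$r}{$d} } ), "\n";
    }
}
```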
Re: RFC US Region Module
by Abigail-II (Bishop) on Jul 06, 2003 at 22:39 UTC
    Interesting module, but, IMO, a sucky interface. Objects aren't really useful; it's not as if there are multiple subdivisions of the USA. I strongly suggest a tied hash (then your implementation can still benefit from an OO approach), as it makes things easier for the user, especially if (s)he wants to interpolate the results in a string.
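
    A rough sketch of how a tied-hash interface might look, under several assumptions: the class name is a placeholder, the data is trimmed to one region and stores codes only, keys are matched case-insensitively against both region and division names, and only lookups are supported (no key iteration, no writes).

```perl
use strict;
use warnings;

package Local::Census::TiedRegions;    # hypothetical name, for illustration only
use Carp;

# Data trimmed to a single region to keep the sketch short.
my %data = (
    Northeast => {
        'Middle Atlantic' => [qw(NJ NY PA)],
        'New England'     => [qw(CT MA ME NH RI VT)],
    },
);

sub TIEHASH { my $class = shift; return bless {}, $class }

# Look the key up as a region or a division; return sorted state codes.
sub FETCH {
    my ( $self, $key ) = @_;
    my @codes;
    for my $region ( keys %data ) {
        for my $division ( keys %{ $data{$region} } ) {
            push @codes, @{ $data{$region}{$division} }
                if lc($key) eq lc($region) || lc($key) eq lc($division);
        }
    }
    return [ sort @codes ];
}

sub EXISTS { my ( $self, $key ) = @_; return scalar @{ $self->FETCH($key) } }

# The hash is read-only from the caller's side.
sub STORE { croak "This hash is read-only" }

package main;

tie my %states, 'Local::Census::TiedRegions';

# Results interpolate directly into strings.
print "New England: @{ $states{'New England'} }\n";
print "Northeast:   @{ $states{northeast} }\n";
```

    The object-oriented code still exists, but it is hidden behind the tie, and the user only ever sees an ordinary-looking hash.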

    Abigail

Re: RFC US Region Module
by chunlou (Curate) on Jul 06, 2003 at 19:45 UTC
    A potential submodule could be Foo::Bar::GIS for exchanging data or even interfacing with other geo info systems, such as ArcInfo or GRASS. It would be nice to get spatial statistics from other specialized software rather than, say, implementing your own variogram in Perl. (Well, one (useless) use of this could be to see if the XP of monks have to do with where they live, since we already have XP and location stats for many monks--but of course you could as well more qualitatively spot the pattern by staring at a map... or you could use it in epidemiology.)

    It could be a bummer if you could group data by nice geographical name but were unable to easily compute certain statistics (which could happen to a general-purpose language); or you could compute statistics grouped by zip code but were unable to more easily convert all the zip codes to more human-friendly regional names (which could happen to a statistical software).
Re: RFC US Region Module
by IlyaM (Parson) on Jul 07, 2003 at 18:05 UTC