http://qs321.pair.com?node_id=2329
User since: Jan 22, 2000 at 13:47 UTC (24 years ago)
Last here: Oct 19, 2022 at 18:52 UTC (1 year ago)
Experience: 4411
Level:Priest (14)
Writeups: 286
CPAN ID:STEPHEN
Location:San Jose, CA
User's localtime: Mar 28, 2024 at 12:47 -08
Scratchpad: View
For this user:Search nodes
Watch for posts by this user

These days I work for NetApp as an Automation Architect. Previously, I worked at Yahoo! as a web tools developer.

In December 2004, I graduated from Carnegie-Mellon's West Coast campus at Moffett Field with a Master's in Software Engineering. CMU-West is a cutting-edge program (really!) that closely mimics an actual work environment. /msg me if you'd like to learn more about it.

On June 1st, 2002, a lovely woman named Christine Doyle and I got married.

I sing with the local a capella singing group Redshift.

I was an English major who found myself in programming through sheer accident. I've been programming Perl since 1995.


Refactorings

I'm gradually collecting refactorings-- recipes for improving code in small steps, without breaking existing code. See Refactoring for a general discussion of refactoring. This section is sort of a scratchpad; not all of these are as formally tested as I'd like. If you have refactorings you'd like to contribute, please feel free to send them here.

M.ost of the refactorings I collect here are Perl-specific refactorings-- problems that are most common in Perl, or for which Perl offers interesting solutions. For a list of general refactorings, see the Refactoring web site.

Unquote Variable

Condition: You have a scalar variable which is referenced in a double-quoted string without anything for it to interpolate with.

Resolution: Remove the double quotes.

Example

my $foo = 37; my $bar = "$foo";
is transformed to:
my $foo = 37; my $bar = $foo;

Steps

  1. Remove quotes from variable
  2. Test

Discussion
For various reasons, people new to Perl frequently think they need to quote variables to get their value. However, this is not necessary, and can be a real problem. Sure, $foo and "$foo" are the same thing... if $foo is a string scalar. But if $foo is a reference, then if you try to copy the reference with $bar = $foo, then you'll lose the reference-ness of the reference, and wind up with a string like "SCALAR(0x80fe52c)".

Replace Global With Subroutine

Condition: You have a global variable which is used in several different packages or files.

Resolution: Replace the global variable with a constant or subroutine that returns the same value.

Example

# In Foo.pm package Foo; $Global = 'bar'; use base 'Exporter'; @EXPORT = qw($Global); # In Bar.pm package Bar; use Foo; print "$Global\n";
is transformed to:
# In Foo.pm package Foo; use constant GLOBAL => 'bar'; use Exporter; @ISA = qw(Exporter); @EXPORT = qw(GLOBAL); # In Bar.pm package Bar; use Foo; print GLOBAL, "\n";

Steps

  1. Create subroutine that returns value of global
  2. Tie global to subroutine (with Devel::WarnGlobal, for example)
  3. Test
  4. Replace instances of reads of global with calls to subroutine
  5. Test
  6. Untie global from subroutine

Discussion
Global variables increase code complexity exponentially. Why? Because they encourage side effects. If I use $Global in Foo.pm, Bar.pm, and Baz.pm, and I accidentally change the value someplace in Bar.pm, I may not see the effects until that value change wreaks havoc someplace in Baz.pm. If something shouldn't vary, then it shouldn't be a variable.

Perl has a construct that's well-designed for constants, it's called the 'use constant' pragma. Accessing a constant is on average just as fast as, or faster than, accessing a variable.

Things to watch out for

Make Argument Passing Explicit

Condition: You have code that uses Perl's '&'-based default argument syntax.

Resolution: Make the arguments explicit.

Example

sub A { my ($apple, $banana) = @_; # ... &B; # ... } sub B { my ($apple, $banana) = @_; print "Apple: $apple Banana: $banana"; }
is transformed to:
sub A { my ($apple, $banana) = @_; # ... B($apple, $banana); # ... } sub B { my ($apple, $banana) = @_; print "Apple: $apple Banana: $banana"; }

Steps

  1. Give explicit arguments to subroutine.
  2. Test.

Discussion
Perl has a "feature" that can be incredibly confusing. When one is in subroutine A and calls subroutine B using '&' and without arguments (&B), it passes the arguments of A to B. This syntax is fine so long as no one changes the code, but once someone either changes the arguments to A or tries to move the call to B to a different subroutine, the program breaks.

Putting explicit arguments into B makes it clearer what B is doing, plus makes the code more resilient.

Frequently, having many default arguments is a sign of an object waiting to happen.

Unreinvent Wheel

Condition: Code duplicates functionality which is handled more robustly in a CPAN module.

Resolution: Replace the duplicated code with a call to CPAN code.

Example

print "Content-type: text/html\n\n"; @pairs = split(/&/, $ENV{QUERY_STRING}); for ($x = 0; $x < @pairs; $x++) { ($key, $value) = split(/=/, $pairs[$x]); $in{$key} = $value; } print <<END_PAGE; Your name is: $in{'name'}; END_PAGE
changes to:
use strict; use CGI ':cgi-lib'; ReadParse(); print <<END_PAGE; Your name is: $in{'name'}; END_PAGE

Steps

  1. Extract reinvented code to a single function or code block as necessary
  2. Test
  3. Replace reinvented code with CPAN code
  4. Test

Discussion
Imagine if the auto industry were engineered the same way as the software industry. (Standard joke: cars would be incredibly efficient, comfortable, and fold up easily to fit into a shirt pocket. Once a year they would explode without warning. We would somehow just accept this.) The design of each new car model would begin by trying out different ways of propelling the car. ("Well, what if we just put it on skis?" "No, I've got it-- limestone blocks! And we'll make the road curve up and down!") We'd be extremely lucky to see new car models once every ten years. Mechanics would have to be insane polymaths, able to repair both the steam-powered car running along on little legs and the kerosene-powered rocket sled.

Wheels are good. Wheels have been around for a while. We've debugged the wheels. Many people know how wheels work, and how to fix them when they go flat.

Same thing applies to CPAN modules. CPAN modules make your code smaller and hence easier to hold in your head. They're generally well documented. They're generally supported by the developer or by a mailing list. Even so, pretty much every big project out there has code which duplicates the functionality of an existing, well-tested, well-supported CPAN module. It's best to replace that code.

CPAN modules are generally more expansion-friendly than reinvented code, as well, since they're designed for more general cases. The classic example is CGI. See use CGI or die; for details.

CPAN modules are not necessarily perfect, though. There's plenty of alpha code in CPAN, and the code may not be well-tested for what you're trying to do with it. Look at the automated unit tests that come along with the module. If they don't cover the way you're trying to use the module, write your own tests. (It'd be an excellent idea to send them to the author for inclusion in future releases. That way, future releases of the module are less likely to break the functionality you need.) CPAN code should be as test-infected as the rest of your code. However, since CPAN code is a shared problem, it's likely to have reasonable tests defined already, making your job much easier.

In the particular, common case for the CGI code above, the end code is still not optimal. It's still using the old cgi-lib interface. One should apply "Update Interface" to move to the modern CGI interface. Although it's tempting to skip the compatibility interface and just charge into replacing everything with CGI::param calls, taking smaller steps reduces the risk that you'll have to abandon the refactoring.

Things to watch out for:

Update Interface

Condition: You have code that uses a deprecated interface to a module.

Resolution: Replace the deprecated interface with the more modern one.

Example:

use strict; use CGI ':cgi-lib'; ReadParse(); print "Your name is: $in{'name'}\n";
changes to:
use strict; use CGI qw/:standard/; print "Your name is: ", param('name'), "\n";

Steps:

  1. Update method/function calls to new interface.
  2. Test.
  3. If possible, remove old interface from import statement.
  4. Test.

Discussion
Sometimes, a module that has gone through some changes, or has been designed as a drop-in replacement for another module, will have a compatibility interface that mimics another module or another version. These alternate interfaces make it easier to move up to the new module without changing existing code.

However, these additional interfaces can get confusing. They can also preserve older-style constructions and habits, and make it harder to move ahead to well-factored, orthogonal code. Finally, such constructions are often an additional burden on the module maintainer, and may be dropped in future module releases. It's a good idea to update your code to use a module's primary interface.

With longer programs, it's important not to try to make all interface changes at once. Change the interface for a small section, test it, check it into CVS, and repeat. Once you've removed all of the old interface from a file or module, you can change the 'use' statement to avoid importing the functions that you no longer need. Make sure that you run tests afterwards.

CGI is a good example. The CGI module has a compatibility interface with the old cgi-lib.pl routines which make it easier to start using CGI in old code. Once you've updated to CGI, you can go through the code and update the interface in small steps. Once you think you've eliminated all of the code using the old ReadParse/%in interface, you can remove the ':cgi-lib' from the 'use CGI' declaration. Testing the code after that will tell you whether you've gotten rid of all of the old calls.

Change If/Then to Hash Lookup

Condition: You have a series of if/then/elsif statements mapping one set of values to another

Resolution: Build a hash table mapping the two sets of values, and turn the set of if/thens to a subroutine lookup.

Example

if ( $fruit eq 'lemon' ) { $color = 'yellow'; } elsif ( $fruit eq 'orange' ) { $color = 'orange'; } elsif ( $fruit eq 'tomato' ) { $color = 'red'; }
changes to:
our %Fruit_Colors = ( lemon => 'yellow', orange => 'orange', tomato => 'red', ); $color = $Fruit_Colors{$color};

Steps

  1. Build hash mapping conditionals to values.
  2. Replace conditional statements with hash lookup.
  3. Test.

Discussion
Frequently, programmers moving to Perl from other languages will have to map one value to another. Unused to the power of Perl's hashes, they will build huge chains of if/then statements-- if $foo is 'doogie', set $bar to 'howzer', or if $foo is 'michael', set $bar to 'knight', etc. These huge clusters are impenetrable to the eye. Plus, if at any time you need to alter these values, you need to go spelunking into lightless code-caverns, searching for just the right conditional.

The idiomatic Perl solution is to use hashes any time one is mapping one set of values to another set of values. With a hash, it's easy to see what maps to what. Even better, you can put the hash at the top of your file, where it's easy to find and alter without messing around deep inside the code.

Build a table. Start at the top of your conditional block and add one line similar to michael => "knight" for each conditional. Then you can replace the conditional block with a simple hash lookup.


Other Refactorings

Refactorings by other users.

Other lists of refactorings:


Contributors

Thanks to: