Notes from the Refactoring Ward

...then I notice some obvious duplications in my code! I get to work right away. After a couple of hours of refactoring I succeed in distilling the foul repetition to the irreducible essentials:

my @todo = (
  [ 'foo',     42, 'http://www.google.com'     ],
  [ 'bar',      6, 'http://www.boingboing.net' ],
      .                     .
      .                     .
      .                     .
  [ 'mumble', 101, 'http://search.cpan.org'    ],
);

jewel( @$_ ) for @todo;

# gloriously refactored code follows...
[download]

Yep. That's right. Once all the duplications are factored out, I determine that whatever it was my code was repeating requires, in essence, a bit of string, a number, and a URL. Everything else follows beautifully from this!!!

Unfortunately, now I'm left with all this gnarly, irreducible junk that has no intrinsic meaning whatsoever other than being the input to my jewel. (I suppose it makes sense that after factoring out all the regularities the crazy stuff is left behind.)

I must tame this last bit of orneriness, but how? Encapsulation! I create a class, InputToJewel, with accessors, string, number, and url...

At this point I wake up in a cold sweat.

the lowliest monk

Comment on Notes from the Refactoring Ward Download Code

Replies are listed 'Best First'.
Re: Notes from the Refactoring Ward by revdiablo (Prior) on Sep 08, 2005 at 22:44 UTC
There is a happy medium I often reach for instead: `my @todo = ( { label => 'foo', offset => 42, url => 'http://www.google.com' }, { label => 'bar', offset => 6, url => 'http://www.boingboing.net' }, . . . . . . { label => 'mumble', offset => 101, url => 'http://search.cpan.org' }, );` [download] Sure, you may say that's introducing more duplication. But the benefit is a chunk of input that is a lot more self-documenting, without being as heavy as a whole new InputToFoo class (semantically speaking, of course -- we all know most objects in Perl are implemented as hashrefs anyway). Basically, I'm using named arguments instead of positional. Try it out sometime; you just might like it. PS: you also might notice I used more descriptive keys than 'url', 'string' and 'number'. Obviously, I just invented them out of whole cloth, but if it's your application, you probably know what these chunks of data actually represent.	[reply] [d/l]
Re: Notes from the Refactoring Ward by Tanktalus (Canon) on Sep 08, 2005 at 23:08 UTC
Along the lines of revdiablo's post, you could put this into a database table. ;-) Actually, in the same project where I use Tasks and TaskManagers, I use an XML-like data format to store these commonalities. So, in that spirit, I'd suggest an XML file: `<todo> <item> <string>foo</string> <number>42</number> <url>http://google.com</url> </item> <!-- ... --> </todo>` [download] Because, not only is this the part of the code most likely to undergo changes, but you can get others to make those changes instead. All that perl code I've written for handling our data files, and now some manager-type can come along and add, remove, or modify the data from which I do my work, and things work exactly the way they think it should. Imagine your manager, or someone completely outside of your group, wanting to add 25 new URLs to scrape. You can tell them to add it themselves, while you go work on more interesting things. Ok, so you have to spend some time training them on this simple data. But when they come along and add another 25, and want to modify 30 others, well, you don't need to care. At least, that's the theory. In practice, it has just made things so much easier on me that I've been able to devote time to improving other things. Which is still worth it. (It also allows me to use the VCS system to determine the changes to the data vs the changes to the code, which is also worth the effort.)	[reply] [d/l]
Re: Notes from the Refactoring Ward by kvale (Monsignor) on Sep 08, 2005 at 22:42 UTC
I think more explicit variable names would help here. @todo is a generic name; all data is there to be processed at some point. How about @websites or @sites_to_be_scraped? Similarly, that for loop could be self documenting if variables were introduced: `for my $website (@websites) { my ($site_name, $db_key, $site_url) = @{ $website }; scrape_site( $site_name, $db_key, $site_url); # .... further processing }` [download] -Mark	[reply] [d/l]
Re^2: Notes from the Refactoring Ward by Anonymous Monk on Sep 09, 2005 at 00:01 UTC
I think more explicit variable names would help here. If you're too explicit about what your code does right now, you're going to have to abstract it later. If you're too abstract, you're not saying anything specific about your code. You're going to have to clarify it for the maintainer. You can't win.	[reply]
Re: Notes from the Refactoring Ward by pg (Canon) on Sep 09, 2005 at 00:26 UTC
Two suggestions: Your jewel should accept an array ref instead of array, especially when you already have an array ref, to flatten it before passing slows things down and is less beautiful ;-) Update: to add benchmark. `use Benchmark qw/timethese/; use strict; use warnings; my $array = [1..10000]; timethese(10000, {"straight" => sub {straight($array)}, "flatten" => s +ub {flatten(@$array)}}); sub straight { ; } sub flatten { ; }` [download] `Benchmark: timing 10000 iterations of flatten, straight... flatten: 6 wallclock secs ( 5.08 usr + 0.00 sys = 5.08 CPU) @ 19 +69.28/s (n =10000) straight: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count)` [download] Never include data in your program. Code is code, program is program. With a language like Perl, it might be less obvious why you should not mix them. But with other languages that need to be compiled, you don't want to recompile your program everytime when the data is changed. Perl might go down that path though.	[reply] [d/l] [select]
Re^2: Notes from the Refactoring Ward by Anonymous Monk on Sep 09, 2005 at 00:58 UTC
Never include data in your program. Code is code, program is program. It's an artificial distinction, though. Code is data, though. The days of hard-wired consoles where people had to unplug wires to reprogram computers are long gone. Computer programming is "just" an act of configuration (simple to state, hard to do!) So is editing a configuration file. The more options the configuration file has, the closer it is to being a "language" in it's own right. If it's easier for the programmer to edit the source code and rebuild the application, then that is sometimes what should be done. If it's easier (or cheaper, or company policy, or whatever) for the end user to do some of the configuration maintenance, then that is what should be done. The same level of testing needs to be done in each case; the application must be proven to work with the new configuration, be it compiled in, or edited. Solid blackbox testing always assumes a computer program will blow up when handed untested inputs...	[reply]


Pathologically Eclectic Rubbish Lister
	PerlMonks