Beefy Boxes and Bandwidth Generously Provided by pair Networks
Pathologically Eclectic Rubbish Lister
 
PerlMonks  

Notes from the Refactoring Ward

by tlm (Prior)
on Sep 08, 2005 at 21:45 UTC ( [id://490340]=perlmeditation: print w/replies, xml ) Need Help??

...then I notice some obvious duplications in my code! I get to work right away. After a couple of hours of refactoring I succeed in distilling the foul repetition to the irreducible essentials:

my @todo = ( [ 'foo', 42, 'http://www.google.com' ], [ 'bar', 6, 'http://www.boingboing.net' ], . . . . . . [ 'mumble', 101, 'http://search.cpan.org' ], ); jewel( @$_ ) for @todo; # gloriously refactored code follows...

Yep. That's right. Once all the duplications are factored out, I determine that whatever it was my code was repeating requires, in essence, a bit of string, a number, and a URL. Everything else follows beautifully from this!!!

Unfortunately, now I'm left with all this gnarly, irreducible junk that has no intrinsic meaning whatsoever other than being the input to my jewel. (I suppose it makes sense that after factoring out all the regularities the crazy stuff is left behind.)

I must tame this last bit of orneriness, but how? Encapsulation! I create a class, InputToJewel, with accessors, string, number, and url...


At this point I wake up in a cold sweat.

the lowliest monk

Replies are listed 'Best First'.
Re: Notes from the Refactoring Ward
by revdiablo (Prior) on Sep 08, 2005 at 22:44 UTC

    There is a happy medium I often reach for instead:

    my @todo = ( { label => 'foo', offset => 42, url => 'http://www.google.com' }, { label => 'bar', offset => 6, url => 'http://www.boingboing.net' }, . . . . . . { label => 'mumble', offset => 101, url => 'http://search.cpan.org' }, );

    Sure, you may say that's introducing more duplication. But the benefit is a chunk of input that is a lot more self-documenting, without being as heavy as a whole new InputToFoo class (semantically speaking, of course -- we all know most objects in Perl are implemented as hashrefs anyway). Basically, I'm using named arguments instead of positional. Try it out sometime; you just might like it.

    PS: you also might notice I used more descriptive keys than 'url', 'string' and 'number'. Obviously, I just invented them out of whole cloth, but if it's your application, you probably know what these chunks of data actually represent.

Re: Notes from the Refactoring Ward
by Tanktalus (Canon) on Sep 08, 2005 at 23:08 UTC

    Along the lines of revdiablo's post, you could put this into a database table. ;-)

    Actually, in the same project where I use Tasks and TaskManagers, I use an XML-like data format to store these commonalities. So, in that spirit, I'd suggest an XML file:

    <todo> <item> <string>foo</string> <number>42</number> <url>http://google.com</url> </item> <!-- ... --> </todo>
    Because, not only is this the part of the code most likely to undergo changes, but you can get others to make those changes instead. All that perl code I've written for handling our data files, and now some manager-type can come along and add, remove, or modify the data from which I do my work, and things work exactly the way they think it should.

    Imagine your manager, or someone completely outside of your group, wanting to add 25 new URLs to scrape. You can tell them to add it themselves, while you go work on more interesting things. Ok, so you have to spend some time training them on this simple data. But when they come along and add another 25, and want to modify 30 others, well, you don't need to care.

    At least, that's the theory. In practice, it has just made things so much easier on me that I've been able to devote time to improving other things. Which is still worth it.

    (It also allows me to use the VCS system to determine the changes to the data vs the changes to the code, which is also worth the effort.)

Re: Notes from the Refactoring Ward
by kvale (Monsignor) on Sep 08, 2005 at 22:42 UTC
    I think more explicit variable names would help here. @todo is a generic name; all data is there to be processed at some point. How about @websites or @sites_to_be_scraped? Similarly, that for loop could be self documenting if variables were introduced:
    for my $website (@websites) { my ($site_name, $db_key, $site_url) = @{ $website }; scrape_site( $site_name, $db_key, $site_url); # .... further processing }

    -Mark

      I think more explicit variable names would help here.

      If you're too explicit about what your code does right now, you're going to have to abstract it later.

      If you're too abstract, you're not saying anything specific about your code. You're going to have to clarify it for the maintainer.

      You can't win.

Re: Notes from the Refactoring Ward
by pg (Canon) on Sep 09, 2005 at 00:26 UTC

    Two suggestions:

    • Your jewel should accept an array ref instead of array, especially when you already have an array ref, to flatten it before passing slows things down and is less beautiful ;-) Update: to add benchmark.
      use Benchmark qw/timethese/; use strict; use warnings; my $array = [1..10000]; timethese(10000, {"straight" => sub {straight($array)}, "flatten" => s +ub {flatten(@$array)}}); sub straight { ; } sub flatten { ; }
      Benchmark: timing 10000 iterations of flatten, straight... flatten: 6 wallclock secs ( 5.08 usr + 0.00 sys = 5.08 CPU) @ 19 +69.28/s (n =10000) straight: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU) (warning: too few iterations for a reliable count)
    • Never include data in your program. Code is code, program is program. With a language like Perl, it might be less obvious why you should not mix them. But with other languages that need to be compiled, you don't want to recompile your program everytime when the data is changed. Perl might go down that path though.
      Never include data in your program. Code is code, program is program.

      It's an artificial distinction, though.

      Code *is* data, though. The days of hard-wired consoles where people had to unplug wires to reprogram computers are long gone. Computer programming is "just" an act of configuration (simple to state, hard to do!)

      So is editing a configuration file. The more options the configuration file has, the closer it is to being a "language" in it's own right.

      If it's easier for the programmer to edit the source code and rebuild the application, then that is sometimes what should be done.

      If it's easier (or cheaper, or company policy, or whatever) for the end user to do some of the configuration maintenance, then that is what should be done.

      The same level of testing needs to be done in each case; the application must be proven to work with the new configuration, be it compiled in, or edited. Solid blackbox testing always assumes a computer program will blow up when handed untested inputs...

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://490340]
Approved by itub
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (3)
As of 2024-04-25 19:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found