Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

To sub or not to sub, that is the question?

by tachyon (Chancellor)
on Jun 25, 2001 at 05:10 UTC ( [id://91160]=perlmeditation: print w/replies, xml ) Need Help??

The other day I was going trough some code I wrote a couple of yoear ago when I had only a bare grasp of the basics of Perl. Ughh - it was horrible.

Anyway one of the main things I noticed was a lack of consistency in deciding when to use a sub and when to cut and paste code. I am interested to know what 'rules/guidelines' other monks use. While you want to use subs for reusability of code sometimes it can be hard to read code that calls half a dozen short subs. There must have been much thought on this subject but, in a cursory search, I have not been able to find any good articles on it. Any thoughts or links would be appreciated.

cheers

tachyon

  • Comment on To sub or not to sub, that is the question?

Replies are listed 'Best First'.
Re: To sub or not to sub, that is the question?
by lemming (Priest) on Jun 25, 2001 at 05:20 UTC
Re: To sub or not to sub, that is the question?
by grinder (Bishop) on Jun 25, 2001 at 14:08 UTC

    This is a really difficult question to answer well, but I'll leap in anyway.

    The best way to learn your own guidelines is to read other people's code. You will never stumble onto a number of useful rules of thumb staring at your own code. Look at how other people do things, and copy them.

    Some rules of thumb I have come up with. (Later: the more I reread this, the more I can see all sorts of exceptions and qualifiers). Rules are meant to be broken, but of course you have to know the rules to be aware that you're breaking them.

    The division between parent code and child subroutine should be natural. It should seem... elegant. If it feels klugey... it probably is! Again, read other people's code. Doesn't have to be good. Read some bad code, and criticise it. Think how you would do it differently.

    Anyway, here are some things to think about:

    Multi-modal subroutines

    are a bad idea. If you're passing a mode parameter which is immediately inspected by the subroutine in order for it to decide how to proceed then you're probably travelling down the wrong path.

    A routine should do one thing, and one thing only. Once it has been read, a person should not have to go back to it time and again when reading other parts of the code.

    If you can't escape modality, include some good debug harnessing, so that the program flow can be traced by setting various levels of debug constants.

    Factorisation

    If you are performing functionality in two different parts of a script, then factor it into a subroutine. This often crops up when you are grovelling through data. Any time you are reading line by line (or paragraph, XML element or whatever) and remembering something about the previous line to see whether the current line indicates a change, you will have a block of code inside an if statement that performs the summary about the previous batch of lines. When you finally exit out of the read loop, you shall probably need to call that code block one again outside the loop to deal with the last batch of lines.

    Locality of reference

    A subroutine should have its entire context defined by the parameters passed to it. It should not rely on global variables. I break this rule all the time, although with the following context: it's okay for a subroutine to read global variables, but it should never modify them. If you're passing more than five parameters to a subroutine there's probably something bad happening to your code. Maybe some of that context would be better off wrapped up in a little object capable of maintaining its own sense of integrity.

    Input and Output

    Should be managed at the highest level of the code. Eschew little subroutines that call print themselves. Retain internal representations as long as possible, by passing back scalars, hashes or whatever, and have the parent perform the output instead.


    --
    g r i n d e r
Re: To sub or not to sub, that is the question?
by jplindstrom (Monsignor) on Jun 25, 2001 at 06:02 UTC
    Do things once, no more.

    Like the third normal form in database design, it's a noble vision; you may decide to violate the rule for practical reasons. But it better be a good reason :)

    In this case it might be readability, but I would try to counter that problem with a good naming convention and properly defined (yes, documented) interfaces first.

    <Drifting off-topic> Actually, I wrote an essay about Object Oriented Maintenance once which discussed an interesting problem in this area. Object oriented code tend to contain a lot of small (few lines, like 3-6) methods. Each method is easy to understand, but the overall system understanding is more difficult due to the "fragmented" nature of the code (and other OO things, like late binding).

    The studies I used in my essay contained mostly Smalltalk code, other languages had other but similar charachteristics. As for Perl... well, TMTOWTDI.

    <Further off-topic> The solution I suggested for this lack of system overview was better dev tools. With class browsers and automated documentation systems things improve. And POD is a truly great thing with Perl.

      /J

    Update:

    Both "Code Complete" and "The Pragmatic Programmer" contain advice regarding subroutines.

Re: To sub or not to sub, that is the question?
by voyager (Friar) on Jun 25, 2001 at 07:40 UTC
    From "The Pragmatic Programmer" (tip #22):
    DRY - Don't Repeat Yourself - Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
    In short, when in doubt, make a sub. Name the sub properly, and you have replace several lines of code in the original location with a sub call that is, in effect, a comment.

    As to sub size, someone suggested no more than one page. I follow the rule that a sub (including comments and white-space) must fit on my screen without scrolling. Only excpeption is if there is a long list of constants or other literal text.

    I find it much easier to have subs like:

    sub foo { readstuff(); formatstuff(); writestuff() if check_stuff(); }
    I have seen some code that reads like English until you get to the subs that actually do anything. And the subs that do anything are just a few lines long, easy to understand, and documented by the sub name (check_and_exit_if_bad()).
      I have read the Pragmatic Programmer and it depressed me since I realized that:

      IF the book was correct
      THEN I am the worst Programmer in the world
      ELSIF the book was mostly correct
      THEN I am part of the really bad programmers

      My uncle who is a great programmer but no friend of Perl suggested that Perl wasn't terribly compatible with that book b/c it is not an Orthogonal language (and a lot of other things I won't wite here.)

      What do you think of that?

      --
      lmoran@wtsgSPAM.com
      print "\x{263a}"
      As the parent of 11 & 13 year-old boys, the role the book plays with me (or the me from a few years ago), is the same as I play with my sons.

      I was always annoyed at my parents for telling to wear a jacket (rain coat, hat, etc) just in case. When it got cold, and I shivered, I never credited my parents with foresight. Now, I find myself contantly reminding my kids to bring a jacket "just in case".

      A lot of the things in the book are ways that are differrent than you might choose (or stumble into), but don't always take longer. If you get in the habit of doing things "the right way", you won't be any slower, and if the unexpected happens, you'll be far ahead.

      In the example of "when to sub", the second use of the code it's faster to cut-n-paste then make a seperate sub and call from two places. But as soon as you need to make a change, or you have a third use for the code, you're already in the hole.

      I had a colleague once who defended his lack of subs because when you make a new sub you have to name it, and he suffered from "writer's block". Oh well.

Re: To sub or not to sub, that is the question?
by mrmick (Curate) on Jun 25, 2001 at 17:21 UTC
    I always try to use the following:
    • If there is a single distinct operation being done, make it a sub.
    • If something is being repeated, make it a sub.
    • If making a decision within the program, make it a sub.
    In other words, if in doubt, make it a sub. It will make your code easier to maintain.

    Mick
(Zigster) Map the language into the problem space
by zigster (Hermit) on Jun 25, 2001 at 19:42 UTC
    Ok, lots of good advice here however there is an important factor thus far not really covered.

    I use subs to simplify complex code. I dont mean in the obvious manor where subs can offer logical partitioning but more on the conceptual level. I am a man with a small brain (damn but if pooh does not have a comment for all eventuallities) When dealing with a particularly thorny programming challenge I avoid mapping the problem into the language space as is often done, instead I map the language into the problem space. Consider the nature of the problem you are solving deduce the language of the problem. Code subs to allow you to talk that language; addVector, splitStringByChar. Many of these subs could often contain a single line of code. Then solve your problem using the language of those subs. This frees you from the petty detail of the underlying -programming- language. Once the problem is solved and working the depreciate redundant subs by moving the code from the sub into the main body of the code. This is an idea most often seen in OO programming and it is very powerful (most good programmers do it without meaning too), it is appropriate for application and algorithmic centric problems.

    jplindstrom touched on this idea in his node in this thread, I agree COMPLETELY with what he said it is dangerous to fractionalise code into many small and useless subs. It is important to rationalise the need for each sub at the end and deprecate where required.
    --

    Zigster

      I feel that if the problem fits into the idiom of the language forget the problem space. Languages are made with a particular application in mind, so it seems foolish to disregard this. Doing so often means being counterintuitive in solving a problem, approaching the problem from the point of the language. Which can often gives positive results and gives you a better insight into the system that aping the existing system won't yield.

      I am all for abstraction where it is practical, I baulk at "This frees you from the petty detail of the underlying -programming- language". I don't want to be freed from this very fundamental thing, I choose to code in a language that lends itself to the problem, be it Perl, Bash, C, Java, Javascript, VB or XML. Right Tools for the right Job etc. These days there is a blur as most languages are striving for complete coverage of applications.

      splitStringByChar isn't that @foo=split//,$bar or a kingdom of alternatives to be used discerningly in situ?

      --

      Brother Frankus.

      ¤

Re: To sub or not to sub, that is the question?
by Albannach (Monsignor) on Jun 25, 2001 at 19:51 UTC
    ...and when to cut and paste code...

    The idea of cut and paste code makes me shudder as it tends to create bugs very easily (speaking from experience), the checking and correction of which may well consume any time saved by the cut and paste operation. Fortunately I think it can often be avoided:

    • if you are pasting code and not changing it (i.e. it is a duplicate) then that certainly needs to be a subroutine (or sometimes just a loop depending on the complexity). Remember that if the source and destination of the cut and paste are identical now, they should probably always be identical, and the only way to ensure that is to use a single subroutine.
    • if you are pasting code and changing the copy slightly, it may still be better as a subroutine if the code is fundamentally the same.
    • as others have said, if you have a hard time understanding code with a lot of subroutine calls, then you probably need better names. After all, the built-in Perl functions are also aggregating code and you're far better off calling split than pasting in code to do the splitting each time! In fact, many people will often start a project with some vague pseudocode like
      for each data file { check for bad data and skip file if found repair damaged data calculate statistics add to database add to report }
      and something like that could easily be made into the program's main loop almost word-for-word, making the code self-documenting. </code>

    Finally, here's another good Monastery thread on this topic: When do you function? - I'm certain there are more.

    --
    I'd like to be able to assign to an luser

Re: To sub or not to sub, that is the question?
by dragonchild (Archbishop) on Jun 26, 2001 at 00:23 UTC
    Another way to look at this issue is that a subroutine is a logical entity representing one conceptual block. This means that if a statement (or group of statements) is conceptually one action, it should be a sub. If it is conceptually two actions, it should be two subs. And, so on.

    Thus, for example, you often see print_header() and print_footer() in CGI scripts. What this does is group together the print statements that make up the header. The average reader doesn't care what goes into the header when s/he is reading the program flow. If s/he cares, then s/he can go into that function and concentrate on the action of printing the header, without getting distracted by anything else.

    Now, one thing I would be careful of is over-compartmentalizing. One thing I hate about reading other people's OO code is that I have to trace 10 function calls just to find out who's doing what. Usually, this is in 10 different files, to boot! I would very much reccomend thinking about generic functions. For example:

    sub get { my $self = shift; my ($attr_name) = @_; if(exists $self->{$attr_name}) { return $self->{$attr_name}; } else { return undef; } }

    is much better and easier to read than:

    sub getAlpha { my $self = shift; return $self->{Alpha}; } sub getBeta { my $self = shift; return $self->{Beta}; } sub getGamma { my $self = shift; return $self->{Gamma}; }

    You have one function that does everything in a very understandable manner. (For those who complain that generic functions are slower, I wonder why you're using OO Perl...) If you want further generic stuff (in this case, boundary checking and the like), you can use a dispatch table to hand off to handlers in the cases where a generic function doesn't encapsulate enough. Or, you could just do your checking outside, either in main code or in another function. The basic idea is that there is one function, at the heart, that does one thing.

    Now, I'm contradicting myself, and I know this. But, as the other posters have made clear, there is no hard-and-fast rule as to when a sub is good or not. It very quickly becomes a religious debate. You have to come to your own conclusions. I know my level, and I suspect many of the monks know their own levels, too. Where's yours? That's a matter of experience, which comes only from trial and error.

      One suggestion. I would prefer to have your get() routine barf if you tried to get a non-existent attribute instead of proceeding. This allows you to catch typos in names at run-time.

      However beyond that using subroutines gets you into the following paradox. Suppose that working with heavily factored and organized code is 5 times harder per page of code. Suppose that it does the work of 20 pages of unfactored and unorganized code. If you assume that the issue of tracking down the same bugs in unfactored code is similar to the issue of changes made for one reason affecting things all over the place, then that is a gain in productivity of a factor of 4. However whenever picking up that code I can guarantee you that you will feel that factor of 5 issue.

      As for the assumption, that is where experience comes in. It is easy to break things into a million subroutines. What you learn over time is how to break things into a million subroutines that organize into clumps which present loosely coupled interfaces to the rest of the world. After you do that you just do not modify existing public interfaces lightly. Add a parameter, sure. But don't lighly assume that nothing depends on old (even old and stupid) behaviour.

      Also another good idea - the test suite - makes this easier by giving you a good idea what behaviour someone else is depending on somewhere which you didn't remember.

        One suggestion. I would prefer to have your get() routine barf if you tried to get a non-existent attribute instead of proceeding. This allows you to catch typos in names at run-time.

        Actually, in the original code, it printed out an error, then continued. And, returning undef does barf, if you're bothering to check your return values for error before continuing. If you're not, then you deserve all problems you get. (Hint, hint!)

        To continue on with tilly's million subroutines, it's all about interfaces. A subroutine, in a very basic way, is just an API between you, the programmer, and some (hopefully!) well-defined and bug-free behavior. What the subroutine does under the hood should be something you don't have to worry about. That's one of the reasons to create a subroutine. "Code once and forget." (Also, there's "Code once, use twice" for the other reason to create subroutines.)

      Now, one thing I would be careful of is over-compartmentalizing. One thing I hate about reading other people's OO code is that I have to trace 10 function calls just to find out who's doing what.

      I think this is in part related to the way OO works with inheritance. It's basically a good thing, because code is located at the "correct" level if you do your OO design right. But like you say, it's a burden on the programmer to see what's actually happening where.

      And like I mentioned in my previous post, one solution is Class Browsers that provide a coherent view of what methods and properties are available (i.e. the interface) in a class, regardless of where they are physically/inheritance-wise implemented.

      For example, in the Eiffel programming environment (uh, correction, when I did Eiffel in school five years ago :), there is the option to provide a "flattened" class, with all the inherited code in one single source file.

      Actually, I thought that was such a good ide that I created a module Pod::Class::Flattened to flatten POD comments (I use POD a lot myself) for a class with parent classes. It's not yet publicly available, but maybe if this sounds like someting that would be interesting to more people...

      /J

Re: To sub or not to sub, that is the question?
by dimmesdale (Friar) on Jun 25, 2001 at 23:36 UTC
    Well, a lot of good replies have already been offered, but when has that stopped me?

    A simple thing to remember is that a section of code should be no longer than a person is willing to read at a time. A person earlier suggested about one page. The exact length may vary, and depends on specific needs. BUT, remember that if someone else has to come in and read it (or you, a while after you wrote it--which brought us this question :) SO, you should make the logic of the code flow. If your function is getting too long and you're finding yourself scrolling up and down to find out something about it, you're probably better off to make some sub-functions(ha; thinking of perl, that's kind of redundant).

    If there's a function that's calling too many small functions, there are several possibilities.
    (1)Not worry about it
    (2)Combine some of them
    (3)Make a function that's only job is to call them

    To explain point three: if, for example, you have some complex data types that you need to have the user initialize, and your code looks like this:

    $var1 = init_var1(); $var2 = init_var2(); $var3 = init_var3(); ...var4, etc...
    Then what you could do is this:
    ($var1,$var2,$var3,..etc.) = init_all_variables();
    init_all_variables() could just be a function that calls the init_varX functions. This method is very useful when it is needed.
Re: To sub or not to sub, that is the question?
by frankus (Priest) on Jul 24, 2001 at 18:34 UTC

    Most of the Perl I have worked on has not been written to any coding standard1 and has possibly taken longer to understand, but has been rewarding and enhanced my understanding.

    I think consistency is over-played. If you apply a policy of where it is good to have a sub, you'll either break that rule when it is apt to do so or worse: you won't break that rule at all.

    1. Perl and coding standards sounds like an oxymoron to me, yet I maintain that Perl is not a hackers language.

    --

    Brother Frankus.

      Importance of coding standards depends also on the size of the project. For 5 lines - even use strict; is overkill. But the bigger the project, the more programmers involved - the stricter standards you should have.

      I worked recently on a project with 40MB of source code (not in perl) - as a result of 10 years of development. Without rather strong coding standards (using custom toolkit) and agreed-on procedure names performing standard - trying to understand system will be much harder. 40MB! After 2 years, I was expert in one small part, conversant in couple others, and know who is expert for what part.

      In projects like that, game is not about ... possibly taken longer to understand, but has been rewarding and enhanced my understanding... . It's about plain survival... :)

      So if you insist on using perl without coding standards in such a project, I'll propose to use other language - or I'll prefer to work on different project.

      pmas
      To make errors is human. But to make million errors per second, you need a computer.

        Firstly I only apply this philosophy to Perl, my bad for not stating that explicitly.

        IMNTHO anything that has taken 10 years to develop is ripe for rewriting or part of NASA's space project. I think it's daft, having a document dictating how to code, it implies the authors know something (quite often wrong); I prefer a more natural method of buddying developers, that way the learning process is two-way

        I've only seen Coding standards work when they were constantly reviewed and altered by the people using them. IME this rarely happens as it involves doing something, it's easier to think write "the coding standard document" and leave it to stifle your developers.

        --

        Brother Frankus.

        ¤

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://91160]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others making s'mores by the fire in the courtyard of the Monastery: (5)
As of 2024-04-19 06:36 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found