http://qs321.pair.com?node_id=298755

This meditation is based on a conversation that I had some time ago and have been meaning to comment on publicly ever since.

One of my favorite programming books was, and is, Code Complete. Its author, Steve McConnell, has written several other books that also make my top-10 list.

But a friend of mine, whom I respected, took exception to some of its advice. Section 5.5 cites research on the optimal length of subroutines and concludes that the evidence for benefits from very short routines (say, under 20 lines) is scant, but that routines over about 200 lines start getting much worse. Section 15.2 gives similar advice on loops: they should not exceed one page in length, though in practice good programmers rarely want more than 15-20 lines.

However this friend, who had been doing OO programming for a long while, found that conclusion absurd. Competent OO programmers with a lot of experience tend to go for shorter methods, often far shorter; 10 lines is pretty common. And I had to admit that my subjective experience says that this is good: short routines really do make a difference.

So why is the research that Steve McConnell found so at odds with our direct experience? I think the answer lies in the second study he cites (Shen et al., 1985): length is not correlated with errors, but complexity is. This point is revisited in section 17.5 with his discussion of Tom McCabe's complexity measurement, which is based on how many decision points a routine has. The complexity of a function is 1 (for the function), plus 1 for every if, while, repeat, for, and, and or, plus 1 for every case of a case statement. The suggestion (presumably based on research, not all quoted there) is that a routine of complexity up to 5 is probably fine, 6-10 might be getting out of hand, and higher than that tends to indicate problems.

Now that measure was first proposed in 1976 and was studied in the context of procedural languages like C and Pascal. How can we modify this measure for an OO program? Well, what leaps out at me is that every method call has an implicit if in it! Therefore long stretches of boring procedural code may have very few decision points, but any significant stretch of OO code is going to have a lot. Make a dozen method calls and... oops.
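To make the heuristic concrete, here is a deliberately crude sketch of such a counter. It is regex-based, not a real parser, and the one-point-per-method-call charge is just the modification proposed above, so treat the numbers as illustrative only:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy complexity score: 1 for the routine, plus 1 per branch keyword or
# boolean operator (McCabe's rule), plus 1 per method call (the OO tweak).
# A regex is no substitute for real parsing; this is illustration only.
sub crude_complexity {
    my ($code) = @_;
    my $score = 1;
    $score++ while $code =~ /\b(?:if|unless|elsif|while|until|for(?:each)?|and|or)\b|&&|\|\|/g;
    $score++ while $code =~ /->\s*\w+\s*\(/g;    # each method call: an implicit if
    return $score;
}

my $procedural = 'if ($x > 0 and $x < 10) { $y = $x } else { $y = 0 }';
my $oo         = '$obj->validate($x); $obj->store($x); $obj->log($x);';

print crude_complexity($procedural), "\n";    # 1 + "if" + "and"   = 3
print crude_complexity($oo),         "\n";    # 1 + 3 method calls = 4
```

By this crude count, three innocuous-looking method calls already score higher than an explicit two-way branch; that difference is the whole point.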

Others may disagree with this heuristic, but I think that there is some degree of validity in some such modification. And it suggests that there may be good reason behind several things which I happen to also believe:

  1. When writing OO code, short methods matter.
  2. Coding habits (eg long routines) which are fine in procedural code, can get you into trouble in OO.
  3. That reading good OO code is line for line harder than good procedural code. (OTOH the OO code can get the job done in less code - if you use it right.)
  4. Layering abstractions on top of each other has a significant cost. Don't do it unless there is a corresponding benefit that you can point to for justification.
So what are other people's thoughts/experiences on long functions?

Replies are listed 'Best First'.
Re: Short routines matter more in OO?
by pg (Canon) on Oct 13, 2003 at 03:47 UTC

    No: shorter functions/procedures/methods matter, but a short method in OO does not matter more than its counterpart in traditional non-OO coding.

    Let's imagine that you are converting some OO code back to a traditional non-OO style: is there any need, or would you want, to merge two or more methods into one function? No, that is absolutely not required.

    Let's look at the opposite: if you are converting non-OO code into OO methods, will you split some functions? Yes, but that is not because we are moving to OO; it is because they should have been split in the first place, even in non-OO style. If today you asked people to redo them in a non-OO way, they would do it more properly, as we are making progress every day, not just because of OO.

    OO is beautiful enough, and there is really no need to decorate it with extra praise (that does not belong to her ;-)

    This is what I think/do:
    • The size of the procedure/function/method matters, but it is less important than the length of the code inside your outermost looping structure. In real life, I dislike any code whose outermost loop goes across a page boundary; it sharply decreases people's ability to understand the function/procedure/method. To me, this is unrelated to whether the code is OO or not. (Those get/set/is methods should be excluded from the discussion, as those 2-or-3-liners appear far too often in OO and work against the fairness of our statistics.)
    • Layering abstraction does cost something, but that cost is usually on the coding side. From a designer's point of view, adding a layer of abstraction usually means extending reusability and making the structure much clearer, so the cost actually goes down. (I suspect this might also have something to do with the language, not really the OO concept. OO is not exactly the same thing in different languages: in Java and C++ it is slightly different, but if you compare Perl OO with Java or C++, they are quite different.)
      (This paragraph was added later.) It occurs to me that although layering is not wrong in Perl, it is, most of the time, less practical in Perl than in other languages, because Perl is usually used to put things together QUICKLY, and that is one of the major benefits most people seek from Perl.
    Oh, by the way, welcome back! The first time I started to deal with this site, you were away, and stories about you went around. This time I have come back, and you are a real person (at least in the virtual world).
      Let's imagine that you are converting some OO code back to a traditional non-OO style: is there any need, or would you want, to merge two or more methods into one function? No, that is absolutely not required.

      You wouldn't merge functions, but remember that many objects are just containers for a bunch of variables. They are nothing more than structs, and many objects are full of small accessor methods of the form:

      sub accessor {
          my $self = shift;
          $self -> {key} = shift if @_;
          $self -> {key};
      }

      That's code that will evaporate when converting OO to non-OO, as you would replace the call:

      $obj -> accessor ("value");
      with
      $obj -> {key} = "value";

      And by eliminating many short methods, the average size of the methods will increase.

      Abigail

      The loop point was made in Code Complete as well.

      As for layering abstraction, my opinion is that there are real costs, and they tend to be much higher than many designers think. See Design Patterns Considered Harmful for a similar note.

      And thanks for welcoming me back. Contrary to some allegations, I am a real person, not a myth. :-)

        I agree that layering has real costs, as anything else does, but I still think layering is extremely important to a clear design, as long as you do it with a good PLAN. Lots of trouble is caused by bad designs made under the cover of good principles, not by those good principles themselves.

        From time to time, I see people copy and paste code from one module to another and then slightly modify it. Layering is usually the best way to save people's lives in this kind of situation, and to make the code more beautiful. Copy and paste makes a programmer's life miserable: you might now have to change a whole bunch of code for a slight modification that you would have been able to make at a single point if you had layered properly.

        With layering in mind, you start to develop a sense for digging out the similarity between modules that usually look totally unrelated, and an entire new world opens up for you and the people who work with you.

        The copy-and-paste stuff is just one extreme (and most simple) example, but it is the most miserable one, as it turns programmers into production lines instead of human beings who want to spend more time thinking and dreaming.

Re: Short routines matter more in OO?
by Anonymous Monk on Oct 13, 2003 at 05:17 UTC
    I don't think it is so much a matter of short routines mattering more in OO as that short routines make more sense, and are more natural, in OO. If I am coding procedurally and following a rule of thumb that each routine should do one thing (and do it well), and if a particular one thing is even moderately complex, then I can easily wind up with 20-line, 40-line, or even 100-line routines. If that code is tightly integrated and really involved in doing just one thing, then even the 100-line routine flows naturally.

    Some people might find several seemingly logical chunks of that 100-line routine and pull them into their own routines, giving, say, a 30-line routine with half a dozen 10-20 line helper routines. This is usually false complexity reduction. The code is highly integrated, and all we've done is push the complexity around a smattering of highly coupled routines rather than leaving it in a single logical flow (we've probably raised the overall complexity in our attempt to reduce local complexity). It is *often* (not always) a mistake to take a nice integrated 100-line routine and break it into 10 coupled routines (the double meaning of 'break' is intentional).

    In the OO world we live with highly coupled code all the time. But in OO it is an organized coupling that guides the design from the beginning, leading to a multitude of short methods in a class that rely on each other as well as on state. OO is not simply a way of collecting and grouping coupled code (modules, packages, libraries, etc. can do that); it is a way of structuring and organizing the coupling in a manner that relieves the programmer of having to manage it. Thus, not only are short methods more natural; very long methods are *often* (not always) indicative of something amiss in the design.

    But all this is but one viewpoint, and simplified for expedience.

      If this is but one viewpoint, then it is a viewpoint that I like hearing from. :-)

      I will have to think about that. I've found that in code where I am setting up a lot of handlers (often with closures), my function length goes back up, because one function creates several closely related small ones. Thinking about that in terms of managing coupling could explain why I'm naturally inclined to group things the way I do.

      I also had a vaguely related thought at Pass by reference vs globals.

        If this is but one viewpoint, then it is a viewpoint that I like hearing from. :-)
        I still have much thinking to do with regards to this notion. But I do think one of OO's non-explicitly recognized advantages is that of managing coupling. The object is an interface to a set of coupled code, relieving both the object writer and the object user from having to manage that aspect of code complexity themselves.

        And one of OO's disadvantages is not recognizing this explicitly, because then we could have much more meaningful discussions about such controversies as: Are getter/setter methods evil when used from outside the object? From inside the object? Should objects be divorced from their UI via a controller, encapsulate their UI, or provide proxy UI objects? How do we reconcile full encapsulation with inheritance?

Re: Short routines matter more in OO?
by dws (Chancellor) on Oct 13, 2003 at 06:26 UTC
    So why is the research that Steve McConnell found so at odds with our direct experience?

    You've hit on a form of sample-related bias that research to determine "optimal" often falls prey to. The bias enters by way of an unstated assumption that the population that the sample is drawn from won't shift or change its nature.

    The studies McConnell cites were done while object-oriented approaches were still taking hold. Their "sample" was procedural code written in the languages of the day. At best, that was C, and I'll bet it included a lot of Fortran and COBOL. But in the following two decades, the "population" shifted. Were those studies done again today, I'll bet they would come up with very different answers about what "optimal" is. And if you did the studies again in 20 years, the answers might be different again.

      I agree.

      However I still would like to understand why the answers changed. And more importantly, what remained invariant between the art of programming then and now.

      After all, at some point the question of what is effective programming practice becomes a psychological question about the modes and means of human comprehension. And humans don't change that fast. Criticisms of goto from the late '60s are still on target today. Justifications of it from then are less so, because with experience we have learned how to express with exception mechanisms what we then only knew how to say with goto.

      So an understanding now of (human) invariants is knowledge that is likely to age fairly well. Which is hard to find in computing. :-)

        However I still would like to understand why the answers changed. And more importantly, what remained invariant between the art of programming then and now.
        I'd hazard a guess that what changed was access to the data.

        In non-OO code, since you've got direct access to the data, it's easier to roll multiple actions into a single sub. With OO code, though, the driving sub can't do that, so it calls methods on the object to do it, and generally when people write the methods they make them single-purpose so they're reusable.

        What you get is essentially another level of indirection. Rather than a sub that looks like:

            code to do A;
            code to do B;
            code to do C;

        you end up with:

            object.method_to_do_A;
            object.method_to_do_B;
            object.method_to_do_C;

        since people seem to be more inclined to make methods do less stuff than they are to make functions do less stuff.
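        In Perl, the two shapes might look something like this (Report and its methods are made-up names, purely to show the structural difference):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Procedural shape: one sub with direct access to the data does A, B, and C.
sub process_report {
    my ($data) = @_;
    my @lines = map { uc } @$data;          # "do A": normalize
    my @body  = grep { length } @lines;     # "do B": filter
    return join "\n", @body;                # "do C": assemble
}

# OO shape: the same three steps become three single-purpose methods,
# each reusable on its own. (Report is a hypothetical class.)
package Report;
sub new       { my ($class, @data) = @_; bless { data => [@data] }, $class }
sub normalize { my $self = shift; $_ = uc for @{ $self->{data} }; $self }
sub filter    { my $self = shift; @{ $self->{data} } = grep { length } @{ $self->{data} }; $self }
sub assemble  { my $self = shift; join "\n", @{ $self->{data} } }

package main;
my @raw = ('alpha', '', 'beta');
print process_report(\@raw), "\n";                            # ALPHA, BETA on two lines
print Report->new(@raw)->normalize->filter->assemble, "\n";   # same result, via indirection
```

Same work either way; the OO version just spreads it over three short, reusable, individually-named steps.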
      So perhaps it is not that short procedures matter in OO, but rather that in all languages more expressive than COBOL, procedures should be shorter.
Re: Short routines matter more in OO?
by Juerd (Abbot) on Oct 13, 2003 at 06:30 UTC

    When writing OO code, short methods matter.

    Short code matters always. If you can divide your 250 line snippet into five logical chunks, do so. If not, try to redesign.

    Coding habits (eg long routines) which are fine in procedural code, can get you into trouble in OO.

    They're not fine and OO is no different from procedural here. Long routines get you into trouble regardless of how the routines are used.

    That reading good OO code is line for line harder than good procedural code. (OTOH the OO code can get the job done in less code - if you use it right.)

    This is a hard one. I don't really know. I do know that code that uses documented OO is easier to read than code that uses other documented functions. When you have to read line by line (i.e., with no documentation to help you out), you're probably right.

    OO code is much easier to document, though.

    Layering abstractions on top of each other has a significant cost. Don't do it unless there is a corresponding benefit that you can point to for justification.

    If the abstraction lets you code faster, do it. If it doesn't, it's not worth a single line of code.

    Should the abstraction have too much overhead, you can always choose to make the code uglier to optimize it. This is better than starting with something ugly that could supposedly be made beautiful later, because that never happens once the code works.

    So what are other people's thoughts/experiences on long functions?

    Use short functions, if only to be able to profile (Devel::DProf and friends) and optimize.

    Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

      Shorter is always better how?

      Less error-prone? Five studies cited by McConnell found that routine size was either inversely correlated with error frequency, or not correlated at all. [1,2,3,4,5]

      Easier to understand? A study of upper-level computer science students found that their comprehension of a program super-modularized into routines about 10 lines long was no better than their comprehension of a program with no routines at all (but when moderate-length routines were used, comprehension scores rose 65%). [6]

      Fewer changes required? Another study found that code needs to be changed least when routines average 100-150 lines. [7]

      So, on the one hand we have several empirical studies which find that shorter routines do not require fewer changes, are not easier to understand, and are not less error-prone. All of that relates to software cost in the real world. On the other hand, we have your unsupported opinion.

      [1] Basili & Perricone (1984). "Software Errors and Complexity: An Empirical Investigation." Communications of the ACM 27, no. 1 (Jan): 42-52.
      [2] Shen et al. (1985). "Identifying Error-Prone Software: An Empirical Study." IEEE Transactions on Software Engineering SE-11, no. 4 (Apr): 317-324.
      [3] Card, Church, & Agresti (1986). "An Empirical Study of Software Design Practices." IEEE Transactions on Software Engineering SE-12, no. 2 (Feb): 264-271.
      [4] Card & Glass (1990). Measuring Software Design Quality. Englewood Cliffs, N.J.: Prentice Hall.
      [5] Selby & Basili (1991). "Analyzing Error-Prone System Structure." IEEE Transactions on Software Engineering SE-17, no. 2 (Feb): 141-152.
      [6] Conte, Dunsmore, & Shen (1986). Software Engineering Metrics and Models. Menlo Park, Calif.: Benjamin/Cummings.
      [7] Lind & Vairavan (1989). "An Experimental Investigation of Software Metrics and Their Relationship to Software Development Effort." IEEE Transactions on Software Engineering SE-15, no. 5 (May): 649-653.

        Less error-prone? Five studies cited by McConnell found that routine size was either inversely correlated with error frequency, or not correlated at all

        The studies could be right.

        Fewer changes required? Another study found that code needs to be changed least when routines average 100-150 lines.

        The study could be right.

        On the other hand, we have your unsupported opinion.

        Yeah. Unsupported. Maybe even wrong.

        But I have found small things to be easier to maintain. More flexible. Easier to change, even though more lines change. Easier to re-use. Small subs are usually easier to read and understand.

        Easier to document. This is worth a lot.

        Whether or not short subs work for you depends on more than Perl alone. Short subs are better than long subs, much like how short sentences are better than long sentences. The long ones may seem more brilliant and more intelligent, but the shorter ones are clearer and much easier to read.

        Guess why books for children use short sentences. Guess why most programming introductions start with printing Hello, world.

        Of course, programs/modules with short subroutines are harder to create (since they require more design) and slower.

        I do not know of any study that supports my opinion. I don't care much about studies either, since few are done using the right context and variables.

        Juerd # { site => 'juerd.nl', plp_site => 'plp.juerd.nl', do_not_use => 'spamtrap' }

        I'm very glad you spoke up in this column. I don't agree that short subroutines are necessarily more understandable and easier to manage; in fact, quite the contrary. I've found that short subroutines and methods make understanding the code flow of a program much more difficult. (As you and others such as Corion have pointed out.) Your comments here have articulated what I've thought (or felt) when I've seen this argument before. I hate dealing with code where methods do only one or two lines of work and there are many, many such methods and subs. By the time I've traced through a horrible maze of tiny subs, each almost identical to the next, my head hurts, I can't get a mental model of the program flow (and I typically want to fly to Australia to strangle the author responsible :-)

        As you said elsewhere, subroutines and methods should be as small and as simple as they can be and still get the job done. They shouldn't do multiple things and be huge monstrosities, but at the same time they shouldn't be sliced up into bits that are only going to be called from one place.

        For me a sub/method should do a single conceptual task. If that means it ends up quite long (over 50 lines or so) then so be it. I certainly don't spend a lot of time worrying how long my subs are.

        Once again, thanks. 'Tis a pity you are anonymous, or I would follow the nodes you write on a regular basis.

        One last thought, though. I think this particular subject has different levels. Experienced programmers can probably argue against tilly's advice; we have the knowledge to decide when to sacrifice a rule of thumb like "keep your subroutines short" on the altar of pragmatism. But for a beginner, I think the advice "keep your subroutines short and your methods shorter" is probably sound. Until they have developed a more sophisticated basis on which to decide, the rule is probably a sound design/craft guideline. The more you know a subject, the easier it is to decide when to break the rules.

        So thanks to tilly for yet another thought provoking thread, and thanks to yourself for a bit of evidence of the contrary view.


        ---
        demerphq

          First they ignore you, then they laugh at you, then they fight you, then you win.
          -- Gandhi


Re: Short routines matter more in OO?
by Corion (Patriarch) on Oct 13, 2003 at 06:50 UTC

    I find short routines very helpful, because they are usually easy to grasp, and the routine name gives me a way to hold onto the concept and the function of the subroutine.

    Of course this breaks down as soon as the name is badly chosen or the subroutine has been artificially compressed/golfed to fit under the limit of "short".

    The scattering of one long flow of execution into small, seemingly unrelated bits of structure is the reason why I don't like event-oriented/driven programming, as with POE, HTTP, or X / GDI. To get any kind of responsiveness/interaction in these areas, one is forced to break every nontrivial loop into at least three states/subroutines/case slots, which are visually separate. That makes the control flow hard to follow, if the control flow can be observed at all.

    For classes, this problem can simply be avoided by underengineering and reducing the number of methods. Threads are a nice alternative for avoiding this problem with event-driven programming, as long as you have only one process, and there have been some attempts at implementing stateful programming for HTTP through continuations.

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The
    $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider
    ($c = $d->accept())->get_request(); $c->send_response( new #in the
    HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: Short routines matter more in OO?
by BrowserUk (Patriarch) on Oct 13, 2003 at 10:29 UTC

    I think the length of a routine, whether procedural, functional or OO, is less important than the maximum nesting level.

    To illustrate what I mean, consider this concocted (and very possibly wrongly coded) piece of C.

    And contrast that with

    Critiques of my (very rusty) C code aside, and leaving aside whether anyone would actually code as in the first example (I know I did, many moons ago, when "structured programming" was all the rage!), hopefully most people would agree that the second example is much easier to understand than the first, although they do the same thing in essentially the same way.

    The main reason the second example is easier to read, IMO, is that there is a clear delineation between the two functions of the code. The first block is parameter checking and error reporting. Once you move beyond that, you step cleanly into, and can concentrate your thoughts upon, the primary purpose of the routine. By un-nesting the code, the length of the routine has barely changed, but the ability of the programmer to concentrate has changed markedly. It becomes much easier to concentrate on a shorter section of the routine.

    In the first example, validating the error checking becomes a nightmare task, requiring you to page up and down the code, mentally aligning nesting levels. In the second, the need to do either just disappears.
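    Since the original C examples are not reproduced above, here is a hypothetical Perl rendition of the same contrast: the nested form first, then the un-nested form with the parameter checks pulled to the top.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Nested style: every check pushes the real work one level deeper.
sub first_n_nested {
    my ($str, $n) = @_;
    if (defined $str) {
        if ($n > 0) {
            if ($n <= length $str) {
                return substr($str, 0, $n);    # the actual work, three levels in
            }
        }
    }
    return undef;
}

# Un-nested style: parameter checks first, then the work at the top level.
sub first_n_flat {
    my ($str, $n) = @_;
    return undef unless defined $str;
    return undef unless $n > 0;
    return undef unless $n <= length $str;
    return substr($str, 0, $n);                # same work, no pyramid
}

print first_n_flat('monastery', 4), "\n";      # mona
```

Both routines are the same length and do the same thing; only the second lets you finish with the checking and then forget about it.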

    I have seen it advocated that once a small number of levels of nesting has been reached, the inner levels should be consigned to another layer of subroutine. Whilst this would allow the second section to be a short and easily readable routine, it only does so if you dispense with parameter checking within that routine and rely on the level above. This is fraught with dangers, especially when, over time, it becomes necessary to add another parameter to the outer layer.

    In effect, what I am saying is that I don't mind if individual routines grow in length, provided each screen-sized section of the code is effectively a complete entity: a piece of code with a single purpose. In this way, it becomes possible to get a good overview of the routine and still see the detail. Breaking a routine into nested subroutines where there isn't a clear delineation between the subroutine and the calling code forces the programmer to constantly page back and forth between two or more places in the code, trying to get a mental grip on what is going on. Even with split-screen-capable editors, this creates considerable extra work and prevents a good overview. For a while, I used a 'folding editor' (LPEX) which allowed you to click on a subroutine name in the calling code and have it display the body of the subroutine or function in situ. This was an interesting and powerful concept, and I liked it quite a lot, but it still made it difficult to keep one's bearings when navigating the twists and turns of the code.

    This is why I am an advocate of using the largest screen and the smallest font size (commensurate with my aging eyes) that I can. I always found a large folding map infinitely preferable to a book of maps, for similar reasons. With the folded (nested) view, it is inevitable that the two details of primary interest span a page boundary, forcing one to flip back and forth between them rather than allowing one to concentrate on the encompassing span in a single view.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Hooray!

OT: The Phenomenon of Life: The Nature of Order
by zby (Vicar) on Oct 13, 2003 at 10:41 UTC
    This is only slightly connected to the topic, but has anyone read the oeuvre by Christopher Alexander, recently fashionable in some circles, "The Phenomenon of Life: The Nature of Order" (ISBN 0972652914)? It promises some insight on the subject of software architecture. In particular, it is said to identify 15 universal principles of design in natural things that should be present in human-made architectures for them to be consistent.
Re: Short routines matter more in OO?
by DentArthurDent (Monk) on Oct 13, 2003 at 15:03 UTC
    I don't think it matters whether or not you're using OO. I think each function needs to do one particular task. If a subset of that task is needed by another task, then that subset needs to be factored out into a separate function, so that the code is unified for all its consumers.

    Let's generalize the long-loop issue and say that any block (for languages that have them) that extends beyond some arbitrary length limit, particularly if it has many nested blocks, should have its guts factored out.

    ----
    If I melt dry ice can I swim without getting wet?
Re: Short routines matter more in OO?
by hsmyers (Canon) on Oct 13, 2003 at 14:31 UTC

    This is slightly OT, but I've found that a useful seat-of-the-pants metric for good, bad, and in-between is 'how hard is this code to maintain?' Given the number of long stretches of code that I've had to chop up into smaller pieces, my general inclination is to say 'short is mostly better'.

    --hsm

    "Never try to teach a pig to sing...it wastes your time and it annoys the pig."
Re: Short routines matter more in OO?
by bakunin (Scribe) on Oct 13, 2003 at 17:34 UTC
    I don't want to make any comments regarding efficiency. Rather, I want to take on the elegance factor. I have a couple of 200+ liners doing, I believe, a nice job, but whenever I look at them, I want to throw up.

    Half a dozen, maybe more, nested loops, iterators all over, and gazillions of conditionals, all looking at me with all their ugliness.

    And they are carrying out such a fragile and complicated task that even the idea of a small addition paralyses me with horror.

    Thank you tilly! I was expecting a push from someone to deal with this problem. I'm going to reconsider the design, and divide the subroutines.
Re: Short routines matter more in OO?
by jeorgen (Pilgrim) on Oct 14, 2003 at 09:14 UTC
    It's easier to test short, focused methods than longer ones, because it's easier to set up the environment for testing (I suppose this is related to coupling) when you have fewer dependencies.
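    To make that concrete: with a short, single-purpose routine, a test needs essentially no setup. (trim() here is a made-up example, exercised with the standard Test::More module.)

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 3;

# A short, focused routine is trivially testable in isolation, because it
# depends on nothing but its argument: no object state, no environment.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

is trim('  perl  '), 'perl', 'strips both ends';
is trim('monk'),     'monk', 'leaves clean input alone';
is trim("\t\n"),     '',     'whitespace-only becomes empty';
```

A 100-line routine that touches files, globals, and half a dozen data structures would need all of that scaffolding stood up before the first assertion could run.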

    On another note: if you have ever had a conversation with a person whose cognitive style and abilities are different from yours, you'll know just how differently people think about and solve problems. As an example, in a typical discussion I cannot keep more than three of my opponent's arguments in my head at the same time. This has led my friends to carefully tailor their style of arguing so that they get the three most important arguments in first (or last, depending on whether I'm in FIFO or LIFO mode :-), knowing that the rest will be discarded. That is a clear example of discrepancy in cognitive abilities.

    A bunch of subroutines poses the same problem for me, while the layering and general spatial metaphors of OO save the day.

    Some people probably have a cognitive style more suited to procedural thinking, which means we might not be able to agree on optimum length or paradigm.

    /jeorgen