http://qs321.pair.com?node_id=116172

There have been a number of posts regarding coding style and coding efficiency. I'd like to weigh in on this topic.

The most efficient applications coding practice is the practice that leads to the most maintainable code.

Now, obviously, writing one-offs doesn't come under this. However, I'm not talking about one-offs; I'm talking about applications, even applications as small as a two-form website. Anything you might want to revisit later, even just to make a single change, is covered.

Now, maintainability is a very large topic, covering a number of coding practices. But there are a few qualities that every piece of maintainable code has:

  • Factored / Orthogonal functions/scripts
  • Readable - this includes (but is not restricted to)
    • Descriptive variable names
    • Descriptive function names
    • Consistent indentation/bracketing (whatever style you choose)
    • Minimally sufficient commenting, explaining what variable names do not
    • Sufficient whitespace

I'll take each one separately.

Orthogonal functions are functions which have the following properties:

  • They do one thing
  • They do only one thing
  • They do that thing well
  • They do not depend on global variables

This leads to functions that are completely understandable in and of themselves. They depend on nothing but what they are given. This means that once you understand a function's purpose, you never need to refer to its body again. These are also known as atomic functions.
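To make those four properties concrete, here is a minimal Perl sketch contrasting a function that leans on a global with an orthogonal one. The tax-rate scenario and the names `$TAX_RATE`, `print_price_with_tax_global`, and `price_with_tax` are hypothetical, made up purely for illustration.

```perl
use strict;
use warnings;

# For contrast (not called below): this version depends on a global
# and prints instead of returning, so understanding it means knowing
# about $TAX_RATE and where its output goes.
our $TAX_RATE = 0.08;

sub print_price_with_tax_global {
    my ($price) = @_;
    printf "%.2f\n", $price * (1 + $TAX_RATE);
    return;
}

# Orthogonal version: everything it needs arrives as arguments, it
# does exactly one thing (compute the taxed price), and returns it.
sub price_with_tax {
    my ($price, $rate) = @_;
    return $price * (1 + $rate);
}

printf "%.2f\n", price_with_tax(100, 0.08);    # prints 108.00
```

Because `price_with_tax` takes everything it needs as arguments and returns a value instead of printing, it can be read and tested without looking anywhere else in the script.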

Factoring is the process by which one attains orthogonal functions. You "factor out" atomic concepts and make a function out of them.

For example, you could very easily have a chunk of code that normalizes a name. Even if it's only needed in that one place in your code, you still want to make a normalizeName() function, so that your main loop has less clutter in it.
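A minimal sketch of such a function in Perl. The normalization rules here (trim, collapse internal whitespace, capitalize each word) are assumptions chosen for illustration; real name handling usually needs more care.

```perl
use strict;
use warnings;

# Assumed rules, for illustration only: trim surrounding whitespace,
# collapse internal runs of whitespace to one space, and capitalize
# each word.
sub normalizeName {
    my ($name) = @_;
    $name =~ s/^\s+|\s+$//g;    # trim both ends
    $name =~ s/\s+/ /g;         # collapse internal whitespace
    return join ' ', map { ucfirst(lc($_)) } split(/ /, $name);
}

# The main loop stays uncluttered: one call says what happens.
for my $raw ("  jOHN   smith ", "mary JANE") {
    print normalizeName($raw), "\n";    # "John Smith", then "Mary Jane"
}
```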

Readability. This is a topic that has a lot of people up in arms. However, there are a few common-sense things you can do to make your code more readable, and thus more maintainable. For example:

    my %hash1; #Hash for stuff
    my %hash2468; #hash for even nos
    my@blahblah; # Is this even necessary?
    my %someOtherKindOfHash; # Maybe stuff here
    my %h3; #you might want this hash, too.
That is a complete mess of a declaration of global variables. Yet I see this over and over in code I'm given. Instead, why not do something like this:
    # Predeclared hashes. Used throughout the script.
    my %hashForStuff;
    my %hashForEvenNumbers;
    my %hashForEvenMoreStuff;
    my %hashThatIThinkIMightNeed;

    # Is this even necessary?
    # my @blahblah;
I personally think that this is night and day. The useless comments to the right are gone, reducing the white noise. The variables tell you what they're for. There is a single comment giving useful information that the variable names do not. Even more, one section is for hashes and another for arrays. Breaking declarations up in some logical fashion, either by type of variable or what the variables will be used for, is extremely useful. It gives your reader more information.

The same goes for function names. If your function determines the validity of a value, call it isValidValue(). This way, you end up with code that looks like this:

    if (isValidValue($num)) {
        # Do stuff here.
    }
    return undef unless isValidValue($num2);
It's almost like reading English! That is readable code.
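For illustration, a minimal Perl sketch of such a predicate. What counts as "valid" is an assumption here (a defined, positive whole number), as is the hypothetical `double_if_valid` caller.

```perl
use strict;
use warnings;

# Assumed rule, for illustration only: a value is valid if it is
# defined and is a positive whole number (1, 2, 3, ...).
sub isValidValue {
    my ($value) = @_;
    return defined($value) && $value =~ /\A[1-9][0-9]*\z/;
}

# Hypothetical caller: the guard clause reads almost like English.
sub double_if_valid {
    my ($num) = @_;
    return undef unless isValidValue($num);
    return $num * 2;
}

for my $input (42, "abc", 0) {
    my $result = double_if_valid($input);
    print "$input -> ", (defined $result ? $result : 'invalid'), "\n";
}
```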

Now for whitespace. Use whitespace intelligently. It tells your reader (who, most likely, will be you!) not only which things are separated from which, but also by how much. For example, if you use one blank line to separate while loops, that tells your reader that you're still working within the same main concept. However, if you use three blank lines, that tells the reader that you're shifting gears and doing something new.

I'll be writing another meditation on some of the Perlisms that can make your code much more readable.

Update: I agree completely with theorbtwo's comments regarding the variable naming. What I was trying to accomplish was to indicate that grouping like things and getting rid of the right-side comments would improve readability 10-fold. I should've taken it the next step and come up with good variable names. I got lazy. *winces* :-)

Also, as per hsmyers, Minimal changed to Sufficient above.

After discussions with tilly, Sufficient changed to "Minimally Sufficient". I feel that this is better than "Minimal" because minimal could imply "least possible", which may not convey all the necessary information.

------
We are the carpenters and bricklayers of the Information Age.

Don't go borrowing trouble. For programmers, this means Worry only about what you need to implement.

Re: Maintainable code is the best code
by Masem (Monsignor) on Oct 02, 2001 at 22:34 UTC
    Having just worked on a data analysis that involved orthogonality, I just want to point out that "orthogonal functions" does not necessarily mean functions do one task, etc.

    A set of orthogonal items means that between any two items, there is no overlap of their 'function'; for vectors in 3D Euclidean space, this means that no more than 3 vectors can be orthogonal to each other, and that for 3 vectors, they must be at right angles to each other.

    In programming, this corresponds to functions whose utilities do not overlap. That is, if you have one function that reads, parses, and prints data, and another that just prints data, these are not orthogonal.

    Now, good programming practice, planning, and repeated refactoring are similar to mathematical tools that can help find the optimal orthogonal set of functions/vectors/whatever. That is, with programming, several rounds of refactoring will help you not only to identify a set of orthogonal functions, but also to arrive at functions that perform only small tasks, so that they can be combined in various ways to do complex ones.
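    To illustrate the non-overlap idea, here is a minimal Perl sketch of the read/parse/print example split into three functions that share no duties and compose into the larger job. The `key=value` input format and the function names are hypothetical; `format_pairs` returns lines rather than printing them, so the caller stays in charge of output.

```perl
use strict;
use warnings;

# Reading: turn a filehandle into a list of chomped lines.
sub read_lines {
    my ($fh) = @_;
    my @lines = <$fh>;
    chomp @lines;
    return @lines;
}

# Parsing: turn "key=value" lines into a hash; other lines are skipped.
sub parse_pairs {
    my (@lines) = @_;
    my %value_of;
    for my $line (@lines) {
        my ($key, $value) = split /=/, $line, 2;
        $value_of{$key} = $value if defined $value;
    }
    return %value_of;
}

# Formatting: one output line per key, sorted for stable output.
sub format_pairs {
    my (%value_of) = @_;
    return map { sprintf("%s: %s\n", $_, $value_of{$_}) } sort keys %value_of;
}

# The small pieces combine to do the larger job.
my $input = "b=2\na=1\nnot a pair\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my @report = format_pairs(parse_pairs(read_lines($fh)));
close $fh;
print @report;    # prints "a: 1" then "b: 2"
```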

    So while the statement above is not *wrong* per se, it's not entirely accurate, and I'm just trying to clear that up :-)

    -----------------------------------------------------
    Dr. Michael K. Neylon - mneylon-pm@masemware.com || "You've left the lens cap of your mind on again, Pinky" - The Brain
    It's not what you know, but knowing how to find it if you don't know that's important

      Orthogonality... yeah, nice features. For a few months now I've been grinding through orthogonality in the sense of principal component analysis (singular value decomposition). Mind you, I'm not a mathematician; I just apply this stuff to physiology.

      When I read your post, Masem, it occurred to me that good sets of functions are just the opposite of principal components. For those of you who are not familiar with this (intriguing) subject, principal components are a set of orthogonal axes of data that are chosen (like rotating and scaling 3D Euclidean space) in such a way that as much of the variability as possible goes into the first axis, as much as possible of the remaining variability into the second component, and so on. Moreover, all these components are orthogonal. Apart from the physiological applications I'm familiar with, there are also mathematical implications, such as matrix rank, finding solutions, and so on.

      A good set of functions is the opposite of principal components.

      You don't want as much functionality as possible crammed into a single function, with all the remaining bits scattered around a bunch of insignificant code. You want each function to have a clear and confined scope, so that each function is as meaningful and concise as possible, without a lot of flags, parameters, and other confusing stuff.

      I don't know the opposite of singular value decomposition in a statistical sense, but it is probably an ill-posed problem. You can recombine orthogonal axes into an infinite number of other orthogonal axes (coordinate systems), but only one set forms the principal components. The reverse, an equal division of information over all axes, is possible in many different ways, and therefore it is hard to come up with a single solution (I figure, but I don't have hard proof for this... does anyone?).

      This compares nicely with writing code. There are a lot of ways to write orthogonal functions, but it is hard to divide the functional space equally over functions.

      Is there a name for this? Rural components maybe?

      Jeroen
      "We are not alone"(FZ)

        Actually, I was also approaching orthogonality from a principal component analysis (PCA) standpoint (though for experimental data analysis).

        Now, to go over the heads of everyone else who has no idea what PCA is :-), the programming equivalent is that you have M 'overall functions' that your software needs to perform. A good refactoring down to an orthogonal set should result in N small functions, with N >> M. As jeroenes indicates, this is ill-defined from a PCA standpoint, since with PCA you'd want to select a small number (< M) of components to approximate the job. However, unstated in the refactoring process is the fact that you should be thinking about the future and the past: in reality, you might have P projects, each with M_i (i = 1 to P) 'overall functions', such that the total of all functions over all projects, past, present, and future, is M', with M' >> N >> M.

        In plain text, you should be refactoring to find an orthogonal set of functions that are reusable for other problems, including functions that might have been created already and ones that might be part of future programs. This is the same conclusion the parent thread reaches, as do numerous other texts on programming, and for those with a mathematical bent, there's some empirical grounding to it as well.


        I think that principal component analysis is the wrong way to think about this problem.

        First of all, the analogy does not really carry. Principal component analysis depends on having some metric for how "similar" two vectors are, which corresponds to the geometric "dot product". While many real-world situations fit, and in many more you can fairly harmlessly just make one up, I don't think that code fits this description very well.

        But secondly, even if the analogy did carry, the basic problem is different. Principal component analysis is about taking a complex multi-dimensional data structure and summarizing most of the information with a small number of numbers. The remaining information is usually considered to be "noise" or otherwise irrelevant. But a program has to remain a full description.

        Instead I think a good place to start thinking about this is Larry Wall's comment about Huffman coding in Apocalypse 3. That is an extremely important comment. As I indicated in Re (tilly) 3: Looking backwards to GO forwards, there is a connection between understanding well, and having mental models which are concise. And source-code is just a perfectly detailed mental model of how the program works, laid down in text.

        As observing the results of Perl golf will show you, shortness is not the only consideration for well laid-out programs. However it is an important one.

        So if laying out a program for conciseness matters, what does that tell us? Well, basic information theory says a lot. In information theory, information is stated in terms of what could be said. The information in a signal is measured by how much it specifies the overall message, that is, how much it cuts down the problem space of what you could be saying. This definition depends more on what you could be saying than on what you are saying. Anyway, from information theory, at perfect compression every bit will carry just as much information about the overall message as any other bit. From a human point of view, some of those bits carry more important information. (The average color of a picture of a winter scene has more visual impact than the placement of an edge of a snowflake.) But the amount of information is evenly distributed.
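        The "how much it cuts down the problem space" idea can be made concrete with Shannon entropy, H = -Σ p·log2(p). A minimal Perl sketch, assuming a memoryless model in which each character is treated independently with its observed frequency (the function name `bits_per_symbol` is hypothetical). Under that model, the result is the average number of bits per character a perfect compressor would need.

```perl
use strict;
use warnings;

# Shannon entropy, H = -sum(p * log2 p), in bits per symbol, where p is
# each character's observed frequency. Assumes a memoryless model: it
# ignores any ordering or context between characters.
sub bits_per_symbol {
    my ($message) = @_;
    my %count;
    $count{$_}++ for split //, $message;
    my $total = length $message;
    my $entropy = 0;
    for my $n (values %count) {
        my $p = $n / $total;
        $entropy -= $p * log($p) / log(2);    # Perl's log is the natural log
    }
    return $entropy;
}

printf "%-4s %.3f bits/symbol\n", $_, bits_per_symbol($_) for qw(aaaa aabb abcd);
```

        Run as-is, it reports 0.000, 1.000, and 2.000 bits per symbol: a string of one repeated character tells you nothing new per character, while four equally likely characters need two bits each.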

        And so it is with programming. Well-written source-code is a textual representation of a model that is good for thinking about the problem. It will therefore be fairly efficient in its representation (although the text will be inefficient in ways that reduce the amount of information a human needs to understand the code). Being efficient, functions will convey a similar amount of surprise, and the total surprise per block is likely to be fairly large.

        In short, there will be good compression in the following sense: a fixed amount of effort spent by a good programmer trying to master the code should result in a relatively large portion of the system being understood. This is far from a compact textual representation. For instance, the human eye finds overall shapes easy to follow, so it is good to spend large amounts of text on allowing instant pattern recognition of the overall code structure. (What portion of your source code is taken up with spaces whose purpose is to keep a consistent indentation/brace style?)

        Of course, though, some of that code will be high-order design, and some will be minor details. In terms of how much information is passed, they may be similar. But the importance differs...

Re: Maintainable code is the best code
by tstock (Curate) on Oct 02, 2001 at 22:48 UTC
    On this topic, I would like to recommend 3 books I liked:

    • Refactoring, by Martin Fowler
    • Writing Solid Code, by Steve Maguire
    • Software Reuse Techniques, by Carma McClure

    They approach the issue at hand from different angles and complement each other nicely. Don't let "Microsoft Press" scare you away from the "Writing Solid Code" book <G>

Re: Maintainable code is the best code
by theorbtwo (Prior) on Oct 03, 2001 at 03:21 UTC
        my %hash1; #Hash for stuff
        my %hash2468; #hash for even nos
        my@blahblah; # Is this even necessary?
        my %someOtherKindOfHash; # Maybe stuff here
        my %h3; #you might want this hash, too.

    That is a complete mess of a declaration of global variables. Yet, I see this over and over in code I'm given. Instead, why not do something like

        # Predeclared hashes. Used throughout the script.
        my %hashForStuff;
        my %hashForEvenNumbers;
        my %hashForEvenMoreStuff;
        my %hashThatIThinkIMightNeed;

        # Is this even necessary?
        # my @blahblah;

    I like your post in general, but have one thing to say to this: Noooooo!

    Take a look at what you're replacing this with. It's even worse than what you started with. "hashFor" is just 7 characters that repeat what "%" already says. There's no gain, except making every expression they appear in that much longer. And longer code is harder to understand than shorter code that does the same thing (in general). The mind boggles when it sees long things. The fact that much of the code isn't really doing anything but telling you what you already know doesn't come into effect until later. It's the same reason that obvious comments make code unreadable.

    A good replacement might be:

        # Globals
        my(%stuff, %morestuff);
        my %evens; #For even numbers
        # Some other stuff that I don't feel like manually copying
        # and then making a comment because Mozilla's copying seems
        # to be broken (or perhaps I am)
    Everything is grouped by use, and names are descriptive, but no longer than is actually useful.

    However, both of us are missing the real problem here. To quote Linus: "To call a global function 'foo' is a shooting offense". WTF is "stuff"? Telling the reader what stuff is is an absolute necessity. You should always be able to tell what a variable should contain and, for container variables, by what it's indexed. If %evens is "for even numbers", why isn't it an array in which only even indices are used? Perhaps "for even numbers" isn't really what the author meant, but rather "of even numbers".

    I recommend everybody read Linus' coding standards. Not because I think Linus is a God, but because they're damn good coding standards. (If you don't have the kernel source on your box, BTW, look at the Linux Cross Reference.)

    Thanks,
    James Mastros,
    Just Another Perl Abbot
Re: Maintainable code is the best code
by Anarion (Hermit) on Oct 03, 2001 at 00:51 UTC
Re: Maintainable code is the best code
by runrig (Abbot) on Oct 03, 2001 at 00:36 UTC
    FYI, another way to clean up...
        my %hash1; #Hash for stuff
        my %hash2468; #hash for even nos
        my@blahblah; # Is this even necessary?
        my %someOtherKindOfHash; # Maybe stuff here
        my %h3; #you might want this hash, too.
    After running through perltidy:
        my %hash1;                  #Hash for stuff
        my %hash2468;               #hash for even nos
        my @blahblah;               # Is this even necessary?
        my %someOtherKindOfHash;    # Maybe stuff here
        my %h3;                     #you might want this hash, too.
    The result in this case is maybe not entirely ideal, but is at least a good deal better...
Re: Maintainable code is the best code
by greywolf (Priest) on Oct 02, 2001 at 22:59 UTC
    I have to agree with what you say about naming conventions. They should not need comments.

    The key to any piece of code is to do one thing, do it well, and do it in only one place. If you find yourself cutting and pasting some functionality, it should almost always be pulled out into its own function.

    mr greywolf
Re: Maintainable code is the best code
by hsmyers (Canon) on Oct 03, 2001 at 07:16 UTC
    First thing: if you're going to whip up a maxim, it should be "The best code is maintainable." Better when you start out with a truism. I can't see how anything you've said is arguable; these are all good rules of thumb to follow, with one small quibble. You said:
    “Minimal commenting explaining what variable names do not”
    I would suggest you change Minimal to something a good deal stronger, say Sufficient, for instance. Small change, world of difference. If you make the change, then it follows that the person who decides what is sufficient is the maintainer, not the programmer (skipping the obvious case).

    hsm

      I strongly suggest not changing that word. The connotation with minimal is that you are avoiding having too much in the way of comments, and I think for working code directed at working programmers, that is correct.

      Follow the thread at Re (tilly) 2 (disagree): Another commenting question, for explanation of why it is important to be minimal. (As I always point out, behind the scenes we were trading /msgs that are unfortunately not public record, and were much friendlier.) Read that and get back to me on whether you think that minimal is the wrong word...

        My comment stands. Are you sure you understand what I said? Perhaps, given the tone of the discussion you pointed towards, you did not. I am sure I need not point out what the word Sufficient means; if you perceived it to mean “every line” or some other such nonsense, let me reassure you that that would be wrong. What I meant is precisely what I said: I said change the emphasis, I didn't say change the amount. If a program needs no comments, then so be it; if not, then adjust accordingly. As I have pointed out before, one of my techniques when taking over projects is to create a copy of the source without comments, since initially I prefer to see what was written, not what someone thought was written, or something that was true once upon a time, or any of the other problems that documentation is prone to.

        hsm