Re: What is code readability?

For me, the single most important element of style affecting readability is "consistency".

Whitespace

When I first encountered the long lamented Abigail-II's code on PM, it looked strange to my eyes. And despite that there are quite a few of his guidelines that I do not follow, and disagree with his reasoning, I always find his code eminently readable. Even when that code is performing the often complex manipulations for which he is famous, the consistency of his code layout makes spending time exploring his code a joy.

And of all the stylistic elements of his code that make that so, his liberal--too liberal in a few places for my tastes--and consistent use of horizontal whitespace ranks very high on the list of things that make it so readable. In large part, this is the inspiration behind my liberal use of horizontal whitespace. I attempted to code with consistency of layout long before I encountered Perl, but I've modified my coding style since using Perl to incorporate more horizontal whitespace and this has reflected back into my coding in other languages.

Tokens

As is demonstrated by this quote

Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.

from Txet Maglning Glof, Ayobndy?, we don't parse writing, including code, letter by letter, but rather, token by token. So ensuring that the tokens in our code are clearly delineated is (IMO), the greatest single contribution to readability.

Indentation

The second area where consistency also applies is in indentation. Whilst I hate the significant whitespace aspect of languages that use it--because it means that the entire function of a piece of code can silently change through the accidental omission or deletion of an invisible character. I hate inconsistent indentation even more. Why anyone would code this

PP(pp_padhv) {
    dSP; dTARGET;
    I32 gimme;

    XPUSHs(TARG);
    if( PL_op->op_private & OPpLVAL_INTRO ) {
        SAVECLEARSV( PAD_SVl( PL_op->op_targ ) );
    }
    if( PL_op->op_flags & OPf_REF ) {
        RETURN;
    else if( LVRET ) {
        if( GIMME == G_SCALAR ) {
            Perl_croak(aTHX_ "Can't return hash to lvalue scalar conte
+xt");
        }
        RETURN;
    }
    gimme = GIMME_V;
    if( gimme == G_ARRAY ) {
        RETURNOP( do_kv() );
    }
    else if( gimme == G_SCALAR ) {
        SV* const sv = Perl_hv_scalar( aTHX_ (HV*)TARG );
        SETs( sv );
    }
    RETURN;
}
[download]

like this,

PP(pp_padhv)
{
    dSP; dTARGET;
    I32 gimme;

    XPUSHs(TARG);
    if (PL_op->op_private & OPpLVAL_INTRO)
    SAVECLEARSV(PAD_SVl(PL_op->op_targ));
    if (PL_op->op_flags & OPf_REF)
    RETURN;
    else if (LVRET) {
    if (GIMME == G_SCALAR)
        Perl_croak(aTHX_ "Can't return hash to lvalue scalar context")
+;
    RETURN;
    }
    gimme = GIMME_V;
    if (gimme == G_ARRAY) {
    RETURNOP(do_kv());
    }
    else if (gimme == G_SCALAR) {
    SV* const sv = Perl_hv_scalar(aTHX_ (HV*)TARG);
    SETs(sv);
    }
    RETURN;
}
[download]

is so far beyond my understanding that it's not even worth my trying. I tried to think of an appropriate analogy here, but everything I came up with would have offended somebody.

Historical justifications

This is also why I eshew many of the common coding practices and style guidelines. Unlike Abigail's, most of them do not come with justifications other than historic precedence. If history was such a great recommendation, we'd still write English in the style of Chaucer!

Best practice changes over time. And sticking with ancient practices, "because that's how it's always been done", doesn't make sense. When I first started programming, squared coding sheets, manually assigned, widely spaced line numbers were derigour.

When I first wrote code commercially, 64x20(or 23?) green-on-black vdus were just starting to become available. So pouring (or is that pawing) over huge stacks of green&white fanfold listings with a handful of coloured highlighters was a necessary part of my daily life.

With the advent of bigger, color screens, and syntax highlighting editors, I find that I have rarely printed a piece of code out for the last 10 or more years.

Things moved on and so did I.

Justifictions

Where they do come with justifications, these are often (IMO) wrongly argued. For example, the justifiction for preferring underscore_separated_variable_names to camelCaseVariableNames is that the former makes it easier to parse the individual words--which it probably does to some degree. But this is a wrongly argued justification. The problem is that it makes visually separate tokens of those individual words, which you don't want. As demonstrated above, the human brain/eyes recognises patterns/tokens not characters and words, so breaking singular tokens into multiple, visually separate elements is a bad thing, not a good one.

By way of simplistic demonstration, how many parameters are there in the following?

some_function (some_variable,some_other_variable,and_yet_another_variable,and_one_more_for,luck)

someFunction( someVariable, someOtherVariable, andYetAnotherVariable, andOneMoreFor, luck )

Did you catch it the first time?

Huffman encoding

Another justifiction is that for Huffman encoding of keywords. Huffman encoding does make sense, but I've variously seen this justified on the basis of being quicker to type, or quicker to read, but these miss the point.

Coders do not type at 60 wpm. And if they do, they produce bad code. I remember a metric from a very long time ago that the average programmer codes around 10-12 lines of code per day. Of course, this isn't just the time it takes to type the lines, it reflects design, debugging, maintenance etc. over the life of the code. But even if you sit down to type in a piece of code, the function of which is clear in your mind and the algorithm for which is well known and a part of your mental lexicon, you still will rarely achieve anything approaching secretarial typing speeds. You will pause for some amount of time to decide what to name each variable. You'll pause to decide whether map or grep or for or redo is appropriate for this particular piece of iteration. Should you use print and interpolated variables or printf and a template?

Likewise, once you become familiar with keywords and function names, even those in code you just picked up, it will make negligible difference to your parsing speed whether the it is for or foreach, or map or applyThisBlockOfCodeToEachElementOfThisList. Once you know what the function/variable/keyword does, you will not parse the spelling of the token. You'll simply recognise it--the token, short or long--as doing whatever it does.

So that leaves us with the question, what is the real value of Huffman encoding? (IMO), it is twofold:

Shorter is easier to remember--in context.
My justifications for this are:
- DANGER: KEEP OUT!
  Is more likely to have the desired affect than
  Within the area encircled by this metallic mesh barrier there exist localised high potential gradients that create the possibility for mis-endevour--including but not limited to burning, severe burning, maiming and termination of existence. You are accordingly advised that progressing inside the barrier could be hazardous.
- We use acronyms in preference to their expansions--once we know what the acronym means.
  TIMTOWTDI!
  Of course, when you encounter an acronym or abbreviation for the first time, it doesn't make sense. But once you are familiar with them, and you are operating in the right context, not using them makes no sense. If you're a fan of Grey's Anatomy, ER or any of the many other medical soaps, then can you imagine the crash team screaming
  He's going into ventricular fibrillation, so would someone be so kind as to get me 30 cubic centiliters of D3, 1,25-dihydroxy-20-epi-Vitamin(*). Oh, and if it's not imposing upon you to much, could you do that as a matter of great priority please.
  
  Instead of
  He's going into v-fib, get me 30cc of epi, stat.
  
  *I have no idea if that's the correct expansion of "epi" in this context, but it lent itself to the point I am making and in truth, I don't need to know.
Brevity == clarity.
The more frequently something is used, the shorter it should be, because it takes up less screen space.
That means that atomic elements of code can more often be coding in a single line. And that allows more discrete steps (preferably all of them), of an overall algorithm to be visible on a single screen. This is a huge, huge aid to understanding, both for the original author and the future maintenance programmer.

Long variable names

Another common (IMO mis-)perception is that long, descriptive variable names make for clearer code. This is only ever true for the first few minutes before you know what the variable is used for. After that, once you have internalised the purpose of a variable, they are just tokens. And, provided that their scope is suitably confined, the ability to recognise the token quickly and easily when scanning the code is inversely proportional to it's length.

Contrast

my @sorted = sort{ $a <=> $b } @names;
[download]

my @list_of_numerically_sorted_names = sort {
    $first_element_to_be_compared
    <=>
    $second_element to be compared
} @list_of_unsorted_names;
[download]

Of course, single character variable names are pretty useless if the life of the variable extends much beyond a few lines. But then most variables shouldn't have scopes that extend much beyond a few lines anyway--but that's a different discussion.

Whilst it is easy to argue that longer variable names allow the maintenance programmer (you, in a month or three's time), to quickly become familiar with the purpose of a variable when they first dive into a piece of code, (IMO) that is false economy. It (along with overly commented code), encourages a practice that I term 'hit & run' or 'guerrilla' maintenance. This is where the programmer receives a description of the problem, makes an assumption about the likely cause, dives into the middle of the code in question, reads a few comments or variable names and makes changes consistent with his earlier assumptions.

The problem is, descriptive variable names, like descriptive comments, describe what the original programmer thought they were coding. But the reason the maintenance programmer is in there, is because the code doesn't do what the original programmer thought it was doing. Of course, there are other reasons for maintenance than bugs, but I still feel that whenever you sit down to change a piece of code, you should understand what it does, and how it does it, not just what someone thought it should do, before you start making changes.

One of the ways I get to know a piece of code is to sit down and go through it changing the source-code layout to bring it to my preferred layout. I do this manually. I find that the process of inserting/adjusting whitespace, adjusting the indentation, and sometimes, even changing the variable names to fit my understanding of the code allows me to get a much clearer picture of the code at both the macro and micro levels.

Of course, this can offend some programmers and doesn't fit well with some source control and maintenance techniques, which means that once I have made my changes, I have to go back to the original sources and re-make them in the style of the original code if I intend to supply a patch for example. That could be seen as a problem, but it also serves as a secondary validation of the changes.

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

In Section Meditations