
Re: What is code readability?

by BrowserUk (Patriarch)
on Jan 03, 2007 at 08:10 UTC ( [id://592715] )


in reply to What is code readability?

For me, the single most important element of style affecting readability is "consistency".

Whitespace

When I first encountered the long lamented Abigail-II's code on PM, it looked strange to my eyes. And although there are quite a few of his guidelines that I do not follow, and some of his reasoning that I disagree with, I always find his code eminently readable. Even when that code is performing the often complex manipulations for which he is famous, the consistency of his code layout makes spending time exploring his code a joy.

And of all the stylistic elements of his code that make that so, his liberal--too liberal in a few places for my tastes--and consistent use of horizontal whitespace ranks very high. In large part, this is the inspiration behind my own liberal use of horizontal whitespace. I attempted to code with consistency of layout long before I encountered Perl, but since using Perl I've modified my coding style to incorporate more horizontal whitespace, and this has reflected back into my coding in other languages.

Tokens

As is demonstrated by this quote

Aoccdrnig to a rscheearch at an Elingsh uinervtisy, it deosn't mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is that frist and lsat ltteer is at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae we do not raed ervey lteter by it slef but the wrod as a wlohe.

from Txet Maglning Glof, Ayobndy?, we don't parse writing, including code, letter by letter, but rather token by token. So ensuring that the tokens in our code are clearly delineated is (IMO) the greatest single contribution to readability.
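To try the effect on text of your own, a minimal Perl sketch along these lines will do the mangling. It assumes plain ASCII words, and the sub name mangle is purely illustrative, not taken from the golf thread. The output differs on every run because the interior letters are shuffled at random.

```perl
use strict;
use warnings;
use List::Util qw( shuffle );

# Shuffle the interior letters of every word of four or more letters,
# leaving the first and last letters where they are.
# (mangle is a hypothetical name chosen for this sketch.)
sub mangle {
    my( $text ) = @_;
    $text =~ s{\b([A-Za-z])([A-Za-z]{2,})([A-Za-z])\b}
              { $1 . join( '', shuffle split //, $2 ) . $3 }ge;
    return $text;
}

print mangle( 'According to a researcher at an English university' ), "\n";
```

Words of three letters or fewer pass through untouched, since they have no interior worth shuffling.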

Indentation

The second area where consistency applies is indentation. Whilst I hate the significant whitespace aspect of languages that use it--because it means that the entire function of a piece of code can silently change through the accidental omission or deletion of an invisible character--I hate inconsistent indentation even more. Why anyone would code this

PP(pp_padhv) {
    dSP; dTARGET;
    I32 gimme;
    XPUSHs(TARG);
    if( PL_op->op_private & OPpLVAL_INTRO ) {
        SAVECLEARSV( PAD_SVl( PL_op->op_targ ) );
    }
    if( PL_op->op_flags & OPf_REF ) {
        RETURN;
    }
    else if( LVRET ) {
        if( GIMME == G_SCALAR ) {
            Perl_croak(aTHX_ "Can't return hash to lvalue scalar context");
        }
        RETURN;
    }
    gimme = GIMME_V;
    if( gimme == G_ARRAY ) {
        RETURNOP( do_kv() );
    }
    else if( gimme == G_SCALAR ) {
        SV* const sv = Perl_hv_scalar( aTHX_ (HV*)TARG );
        SETs( sv );
    }
    RETURN;
}

like this,

PP(pp_padhv)
{
    dSP; dTARGET;
    I32 gimme;
    XPUSHs(TARG);
    if (PL_op->op_private & OPpLVAL_INTRO)
        SAVECLEARSV(PAD_SVl(PL_op->op_targ));
    if (PL_op->op_flags & OPf_REF)
        RETURN;
    else if (LVRET) {
        if (GIMME == G_SCALAR)
            Perl_croak(aTHX_ "Can't return hash to lvalue scalar context");
        RETURN;
    }
    gimme = GIMME_V;
    if (gimme == G_ARRAY) {
        RETURNOP(do_kv());
    }
    else if (gimme == G_SCALAR) {
        SV* const sv = Perl_hv_scalar(aTHX_ (HV*)TARG);
        SETs(sv);
    }
    RETURN;
}

is so far beyond my understanding that it's not even worth my trying. I tried to think of an appropriate analogy here, but everything I came up with would have offended somebody.

Historical justifications

This is also why I eschew many of the common coding practices and style guidelines. Unlike Abigail's, most of them do not come with justifications other than historical precedent. If history were such a great recommendation, we'd still write English in the style of Chaucer!

Best practice changes over time, and sticking with ancient practices "because that's how it's always been done" doesn't make sense. When I first started programming, squared coding sheets and manually assigned, widely spaced line numbers were de rigueur.

When I first wrote code commercially, 64x20 (or 23?) green-on-black VDUs were just starting to become available. So pouring (or is that pawing) over huge stacks of green & white fanfold listings with a handful of coloured highlighters was a necessary part of my daily life.

With the advent of bigger, colour screens and syntax highlighting editors, I have rarely printed out a piece of code in the last 10 or more years.

Things moved on and so did I.

Justifications

Where they do come with justifications, these are often (IMO) wrongly argued. For example, the justification for preferring underscore_separated_variable_names to camelCaseVariableNames is that the former makes it easier to parse the individual words--which it probably does to some degree. But that is exactly the problem: it makes visually separate tokens of those individual words, which you don't want. As demonstrated above, the human eye and brain recognise patterns and tokens, not characters and words, so breaking a single token into multiple, visually separate elements is a bad thing, not a good one.

By way of simplistic demonstration, how many parameters are there in the following?

some_function (some_variable,some_other_variable,and_yet_another_variable,and_one_more_for,luck)

someFunction( someVariable, someOtherVariable, andYetAnotherVariable, andOneMoreFor, luck )

Did you catch it the first time?

Huffman encoding

Another justification is that for Huffman encoding of keywords. Huffman encoding does make sense, but I've variously seen it justified on the basis of being quicker to type or quicker to read, and both of these miss the point.

Coders do not type at 60 wpm. And if they do, they produce bad code. I remember a metric from a very long time ago that the average programmer codes around 10-12 lines of code per day. Of course, this isn't just the time it takes to type the lines, it reflects design, debugging, maintenance etc. over the life of the code. But even if you sit down to type in a piece of code, the function of which is clear in your mind and the algorithm for which is well known and a part of your mental lexicon, you still will rarely achieve anything approaching secretarial typing speeds. You will pause for some amount of time to decide what to name each variable. You'll pause to decide whether map or grep or for or redo is appropriate for this particular piece of iteration. Should you use print and interpolated variables or printf and a template?

Likewise, once you become familiar with keywords and function names, even those in code you just picked up, it will make negligible difference to your parsing speed whether it is for or foreach, or map or applyThisBlockOfCodeToEachElementOfThisList. Once you know what the function/variable/keyword does, you will not parse the spelling of the token. You'll simply recognise it--the token, short or long--as doing whatever it does.

So that leaves us with the question: what is the real value of Huffman encoding? IMO, it is twofold:

  • Shorter is easier to remember--in context.

    My justifications for this are:

    • DANGER: KEEP OUT!

      Is more likely to have the desired effect than

      Within the area encircled by this metallic mesh barrier there exist localised high potential gradients that create the possibility for mis-endeavour--including but not limited to burning, severe burning, maiming and termination of existence. You are accordingly advised that progressing inside the barrier could be hazardous.

    • We use acronyms in preference to their expansions--once we know what the acronym means.

      TIMTOWTDI!

      Of course, when you encounter an acronym or abbreviation for the first time, it doesn't make sense. But once you are familiar with them, and you are operating in the right context, not using them makes no sense. If you're a fan of Grey's Anatomy, ER or any of the many other medical soaps, then can you imagine the crash team screaming

      He's going into ventricular fibrillation, so would someone be so kind as to get me 30 cubic centimeters of D3, 1,25-dihydroxy-20-epi-Vitamin(*). Oh, and if it's not imposing upon you too much, could you do that as a matter of great priority please.

      Instead of

      He's going into v-fib, get me 30cc of epi, stat.

      *I have no idea if that's the correct expansion of "epi" in this context, but it lent itself to the point I am making and in truth, I don't need to know.

  • Brevity == clarity.

    The more frequently something is used, the shorter it should be, because it takes up less screen space.

    That means that atomic elements of code can more often be coded on a single line. And that allows more of the discrete steps of an overall algorithm (preferably all of them) to be visible on a single screen. This is a huge, huge aid to understanding, both for the original author and the future maintenance programmer.

Long variable names

Another common (IMO mis-)perception is that long, descriptive variable names make for clearer code. This is only ever true for the first few minutes, before you know what the variable is used for. After that, once you have internalised the purpose of the variables, they are just tokens. And, provided that their scope is suitably confined, the ability to recognise a token quickly and easily when scanning the code is inversely proportional to its length.

Contrast

my @sorted = sort{ $a <=> $b } @names;
my @list_of_numerically_sorted_names = sort { $first_element_to_be_compared <=> $second_element_to_be_compared } @list_of_unsorted_names;

Of course, single character variable names are pretty useless if the life of the variable extends much beyond a few lines. But then most variables shouldn't have scopes that extend much beyond a few lines anyway--but that's a different discussion.
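By way of a runnable illustration of short names in a confined scope, here is a minimal sketch. The data and the names @sizes, $n and $total are made up for the purpose; only the sort block mirrors the short form contrasted above.

```perl
use strict;
use warnings;

# Hypothetical data, purely for illustration.
my @sizes = ( 10, 9, 100, 1 );

# $a and $b exist only inside the comparison block, so their brevity
# costs nothing: the whole of their life is visible at a glance.
my @sorted = sort { $a <=> $b } @sizes;

# $n is scoped to this one loop; a longer name would add nothing.
my $total = 0;
for my $n ( @sorted ) {
    $total += $n;
}

print "@sorted\n";   # prints "1 9 10 100"
print "$total\n";    # prints "120"
```

Neither $n nor the sort block's $a and $b outlives the few lines in which it is used, which is the condition under which a one-character name stays readable.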

Whilst it is easy to argue that longer variable names allow the maintenance programmer (you, in a month or three's time), to quickly become familiar with the purpose of a variable when they first dive into a piece of code, (IMO) that is false economy. It (along with overly commented code), encourages a practice that I term 'hit & run' or 'guerrilla' maintenance. This is where the programmer receives a description of the problem, makes an assumption about the likely cause, dives into the middle of the code in question, reads a few comments or variable names and makes changes consistent with his earlier assumptions.

The problem is, descriptive variable names, like descriptive comments, describe what the original programmer thought they were coding. But the reason the maintenance programmer is in there, is because the code doesn't do what the original programmer thought it was doing. Of course, there are other reasons for maintenance than bugs, but I still feel that whenever you sit down to change a piece of code, you should understand what it does, and how it does it, not just what someone thought it should do, before you start making changes.

One of the ways I get to know a piece of code is to sit down and go through it changing the source-code layout to bring it to my preferred layout. I do this manually. I find that the process of inserting/adjusting whitespace, adjusting the indentation, and sometimes, even changing the variable names to fit my understanding of the code allows me to get a much clearer picture of the code at both the macro and micro levels.

Of course, this can offend some programmers and doesn't fit well with some source control and maintenance techniques, which means that once I have made my changes, I have to go back to the original sources and re-make them in the style of the original code if I intend to supply a patch for example. That could be seen as a problem, but it also serves as a secondary validation of the changes.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: What is code readability?
by wfsp (Abbot) on Jan 03, 2007 at 12:37 UTC
    My take.

    Vertical white space rules!

    some_function (
        if_it_wasnt_for,
        bad_luck,
        I_wouldnt_have,
        any_luck_at_all,
    ) or die "handy place for your error msg\n";
    Short names, long names, I don't care. Just make them meaningful. A few moments spent choosing a good name can save hours of misunderstanding (even your own) code later.

    Code reuse is a good thing. I like variable name reuse too. :-)

Re^2: What is code readability?
by Herkum (Parson) on Jan 03, 2007 at 13:59 UTC

    While on the subject of consistency, one should avoid having code do extra work that is not directly related to why it was written. It should be like a conversation in which every part focuses on the subject being discussed.

    Example: BrowserUK wrote this, my wife bought a new car today and I hate my co-workers, excellent article and I enjoyed reading it.

    This is tough to read, but it is a common problem among programmers.

    A more practical example would be a function that mixes SQL with business logic and HTML. Doing all three at the same time makes it hard to debug and reduces the portability of the code. You cannot:

    • Use the code in a program that does not need HTML.
    • Use the code without a database when you only need the business logic.
    • Use the code just to get the data when the business logic is invalid in that situation.

    Small and focused code should help to provide consistency.
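One way the split could look, as a minimal sketch only: the sub names fetch_orders, large_orders and orders_as_html, the order data and the threshold of 100 are all hypothetical, and the data layer is a stub standing in for whatever SQL a real program would run.

```perl
use strict;
use warnings;

# Data access only. A real version would run SQL here; this stub
# returns canned rows so that the sketch runs anywhere.
sub fetch_orders {
    return (
        { id => 1, total => 40  },
        { id => 2, total => 250 },
    );
}

# Business logic only: knows nothing of databases or HTML.
sub large_orders {
    my( $min, @orders ) = @_;
    return grep { $_->{total} >= $min } @orders;
}

# Presentation only: turns rows into HTML list items.
sub orders_as_html {
    my @orders = @_;
    return join '', map {
        sprintf "<li>Order %d: %d</li>\n", $_->{id}, $_->{total}
    } @orders;
}

print orders_as_html( large_orders( 100, fetch_orders() ) );
# prints "<li>Order 2: 250</li>"
```

Each sub can now be used, tested or replaced without the other two: large_orders works on rows from any source, and fetch_orders can feed a report that never touches HTML.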

Re^2: What is code readability?
by BrowserUk (Patriarch) on Jan 03, 2007 at 20:28 UTC

    It was pointed out to me by /msg that in my "How many arguments?" example above, I made two changes between the two examples, and it was postulated that only one of these, the additional whitespace, contributed to whatever extra clarity was evident.

    By way of further investigation, in which of the following is the number of parameters clearest to your eyes?

    • someFunction( someVariable, someOtherVariable, andYetAnotherVariable, andOneMoreFor, luck )
    • some_function (some_variable,some_other_variable,and_yet_another_variable,and_one_more_for,luck)
    • someFunction (someVariable,someOtherVariable,andYetAnotherVariable,andOneMoreFor,luck)
    • some_function( some_variable, some_other_variable, and_yet_another_variable, and_one_more_for, luck )

      The last one, definitely. :-)

      1st & 4th, my favourite one BTW, are much easier on the eyes.

      There is one more variation ...

      some_function ( some_variable , some_other_variable , and_yet_another_variable , and_one_more_for , luck )

        Ug :) I really hate that. I see no benefit in spaces both sides of the commas, or breaking the open paren away from the function name. And if you have to break the params across lines, at least balance their lengths :)

        some_function( some_variable, some_other_variable, and_yet_another_variable, and_one_more_for, luck );

        That's not so bad for simple (void) calls, but when you're retrieving data and checking it, you get something like

        if( some_return_value = some_function( some_variable, some_other_variable, and_yet_another_variable, and_one_more_for, luck ) ) {
            // do some stuff here with some_return_value
        }
        else {
            // report or otherwise handle the error
        }

        It's just a mess. The best I've come up with for this is

        if( some_return_value = some_function( some_variable, some_other_variable, and_yet_another_variable, and_one_more_for, luck ) ) {
            // do some stuff here with some_return_value
        }
        else {
            // report or otherwise handle the error
        }

        Which ain't great, but is better than most alternatives to my eyes.

        And much better still is

        if( someRv = fSome( some, oSome, AYAnother, OneMoreFor, luck )) {
            // do some stuff here with someRv
        }
        else {
            // report or otherwise handle the error
        }

        The point being that whilst the abbreviated variable names don't immediately make much sense, by the time a programmer has become familiar enough with the code to consider making changes, they will.


        There is one more variation

        Only one?

        some_function ( some_variable , some_other_variable , and_yet_another_variable , and_one_more_for , luck , ) some_function ( some_variable , some_other_variable , and_yet_another_variable , and_one_more_for , luck )
        A word spoken in Mind will reach its own level, in the objective world, by its own weight
        some_function(some_variable, some_other_variable, and_yet_another_variable, and_one_more_for, luck);
        ...getting better...
        some_function(
            filename  => $some_variable,
            address   => $some_other_variable,
            amount    => $and_yet_another_variable,
            phone_num => $and_one_more_for,
            diameter  => $luck
        );
