Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris

My coding guidelines

by Abigail-II (Bishop)
on Nov 25, 2002 at 17:15 UTC ( [id://215675] : perlmeditation . print w/replies, xml ) Need Help??

Last week, hakkr posted some coding guidelines which I found to be too restrictive, and not addressing enough aspects. Therefore, I've made some guidelines as well. These are my personal guidelines, I'm not enforcing them on anyone else.
  1. Warnings SHOULD be turned on.

    Turning on warnings helps you finding problems in your code. But it's only useful if you understand the messages generated. You should also know when to disable warnings - they are warnings after all, pointing out potential problems, but not always bugs.

  2. Larger programs SHOULD use strictness.

    The three forms of strictness can help you to prevent making certain mistakes by restricting what you can do. But you should know when it is appropriate to turn off a particular strictness, and regain your freedom.

  3. The return values of system calls SHOULD be checked.

    NFS servers will be down, permissions will change, file will disappear, disk will fill up, resources will be used up. System calls can fail for a number of reasons, and failure is not uncommon. Programs should never assume a system call will succeed - they should check for success and deal with failures. The rare case where you don't care whether the call succeeded should have a comment saying so.

    All system calls should be checked, including, but not limited to, close, seek, flock, fork and exec.

  4. Programs running on behalf of someone else MUST use tainting; Untaining SHOULD be done by checking for allowed formats.

    Daemons listening to sockets (including, but not limited to CGI programs) and suid and sgid programs are potential security holes. Tainting can help securing your programs by tainting data coming from untrusted sources. But it's only useful if you untaint carefully: check for accepted formats.

  5. Programs MUST deal with signals appropriately.

    Signals can be sent to the program. There are default actions - but they are not always appropriate. If not, signal handlers need to be installed. Care should be taken since not everything is reentrant. Both pre-5.8.0 and post-5.8.0 have their own issues.

  6. Programs MUST deal with early termination appropriately.

    END blocks and __DIE__ handlers should be used if the program needs to clean up after itself, even if the program terminates unexpectedly - for instance due to a signal, an explicite die or a fatal error.

  7. Programs MUST have an exit value of 0 when running succesfully, and a non-0 exit value when there's a failure.

    Why break a good UNIX tradition? Different failures should have different exit values.

  8. Daemons SHOULD never write to STDOUT or STDERR but SHOULD use the syslog service to log messages. They should use an appropriate facility and appropriate priorities when logging messages.

    Daemons run with no controlling terminal, and usually its standard output and standard error disappear. The syslog service is a standard UNIX utility especially geared towards daemons with a logging need. It allows the system administration to determine what is logged, and where, without the need to modify the (running) program.

  9. Programs SHOULD use Getopt::Long to parse options. Programs MUST follow the POSIX standard for option parsing.

    Getopt::Long supports historical style arguments (single dash, single letter, with bundling), POSIX style, and GNU extensions. Programs should accept reasonable synonymes for option names.

  10. Interactive programs MUST print a usage message when called with wrong, incorrect or incomplete options or arguments.

    Users should know how to call the program.

  11. Programs SHOULD support the --help and --version options.

    --help should print a usage message and exit, while--version should the version number of the program.

  12. Code SHOULD have an exhaustive regression test suite.

    Regression tests help catch breakage of code. The regression tests should 'touch' all the code - that is, every piece of code should be executed when running the regression suite. All border should be checked. More tests is usually better than less test. Behaviour on invalid inputs needs to be tested as well.

  13. Code SHOULD be in source control.

    And a code source control tool will take care of keeping track of a history or changes log, version numbers and who made the most recent change(s).

  14. All database modifying statements MUST be wrapped inside a transaction.

    Your data is likely to be more important than the runtime or codesize of your program. Data integrety should be retained at all costs.

  15. Subroutines in standalone modules SHOULD perform argument checking and MUST NOT assume valid arguments are passed.

    Perl doesn't compile check the types of or even the number of arguments. You will have to do that yourself.

  16. Objects SHOULD NOT use data inheritance unless it is appropriate.

    This means that "normal" objects, where the attributes are stored inside anonymous hashes or arrays should not be used. Non-OO programs benefit from namespaces and strictness, why shouldn't objects? Use objects based on keying scalars, like fly-weight objects, or inside-out objects. You wouldn't use public attributes in Java all over the place either, would you?

  17. Comment SHOULD be brief and to the point.

    If you need lots of comments to explain your code, you may consider rewriting it. Subroutines that have a whole blob of comments describing arguments are return values are suspect. But do document invariants, pre- and postconditions, (mathematical) relationships, theorems, observations and other relevant things the code assumes. Variables with a broad scope might warrant comments too.

  18. POD SHOULD not be interleaved with the code, and is not an alternative for comments.

    Comments and POD have two different purposes. Comments are there for the programmer. The person who has to maintain the code. POD is there to create user documentation from. For the person using the code. POD should not be interleaved with the code because this makes it harder to find the code.

  19. Comments, POD and variable names MUST use English.

    English is the current Lingua Franca.

  20. Variables SHOULD have an as limited scope as is appropriate.

    "No global variables", but better. Just disallowing global variables means you can still have a loop variant with a file-wide scope. Limiting the scope of variables means that loop variants are only known in the body of the loop, temporary variables only in the current block, etc. But sometimes it's useful for a variable to be global, or have a file-wide scope.

  21. Variables with a small scope SHOULD have short names, variables with a broad scope SHOULD have descriptive names.

    $array_index_counter is silly; for (my $i = 0; $i < @array; $i ++) { .. } is perfect. But a variable that's used all over the place needs a descriptive name.

  22. Constants (or variables intended to be constant) SHOULD have names in all capitals, (with underscores separating words), so SHOULD IO handles. Package and class names SHOULD use title case, while other variables (including subroutines) SHOULD use lower case, words separated by underscores.

    This seems to be quite common in the Perl world.

  23. Custom delimiters SHOULD be tall and skinny.

    /, !, | and the four sets of braces are acceptable, #, @ and * are not. Thick delimiters take too much attention. An exception is made for:  q $Revision:$, because RCS and CVS scan for the dollars.

  24. Operators SHOULD be separated from their operands by whitespace, with a few exceptions.

    Whitespace increases readability. The exceptions are:

    • Unary +, -, \, ~ and !.
    • No whitespace between a comma and its left operand.

    Note that there is whitespace between ++ and -- and their operands, and between -> and its operands.

  25. There SHOULD be whitespace between an indentifier and its indices. There SHOULD be whitespace between successive indices.

    Taking an index is an operation as well, so there should be whitespace. Obviously, we cannot apply this rule in interpolative contexts.

  26. There SHOULD be whitespace between a subroutine name and its parameters, even if the parameters are surrounded by parens.

    Again, readability.

  27. There SHOULD NOT be whitespace after an opening parenthesis, or before a closing parenthesis. There SHOULD NOT be whitespace after an opening indexing bracket or brace, or before a closing indexing bracket or brace.

    That is: $array [$key], $hash {$key} and sub ($arg).

  28. The opening brace of a block SHOULD be on the same line as the keyword and the closing brace SHOULD align with the keyword, but short blocks are allowed to be on one line.

    This is K&R style bracing, except that we require it for subroutines as well. We do allow map {$_ * $_} @args to be on one line though.

  29. No cuddled elses or elsifs. But the while of a do { } while construct should be on the same line as the closing brace.

    It just looks better that way! ;-)

  30. Indents SHOULD be 4 spaces wide. Indents MUST NOT contain tabs.

    4 spaces seems to be an often used compromise between the need to make indents stand out, and not getting cornered. Tabs are evil.

  31. Lines MUST NOT exceed 80 characters.

    There is just no excuse for that. More than 80 characters means it will wrap in too many situations, leading to hard to read code.

  32. Align code vertically.

    This makes code look more pleasing, and it brings attention to the fact similar things are happening on close by lines. Example:

    my $var = 18; my $long_var = "Some text";
This is just a first draft. I've probably forgotten some rules.


Replies are listed 'Best First'.
Re: My coding guidelines
by mojotoad (Monsignor) on Nov 25, 2002 at 18:12 UTC
    Excellent list. It's not often you see style guidelines along with some informative prose as to the why of the guidelines.

    I was surprised by this part (24):

    Note that there is whitespace between ++ and -- and their operands, and between -> and its operands.

    What's the consensus on this one? I've always liked snug ++/-- and dereferencing:

    ++$wolverines; $rabbits--; $wombat->move(); $turtle = $menagerie->[4];

    As opposed to:

    ++ $wolverines; $rabbits --; $wombat -> move(); $turtle = $menagerie -> [4];

    For that matter, I've always made unary operators snug so that you can immediately find it's operand. This might be a holdout from C, but I do actually find the snug form more aesthetically pleasing.

    On a somewhat visually related note, the arrow form of the left-associative operator works well with whitespace, however:

    %myhash = ( lizard => 'gecko' );

    Whereas the equivalent comma operator is snug with the left operand.

    Thanks again for the list,

      Don't say always too fast. Are you sure you always snug unary operators? Including not?


        Since I never use 'not' in my code ... yes. ;)

        I take your point, however -- there are indeed exceptions. I think "english language" unary operators might be a whole class of exceptions since they parse (to us) as English rather than code.

        By the way, why is '->' a binary operator? From perldoc:

        "->" is an infix dereference operator, just as it is in C and C++. If the right side is either a ..., {...}, or a (...) subscript, then the left side must be either a hard or symbolic reference to an array, a hash, or a subroutine respectively. (Or technically speaking, a location capable of holding a hard reference, if it's an array or hash reference being used for assignment.) See perlreftut and perlref. Otherwise, the right side is a method name or a simple scalar variable containing either the method name or a subroutine reference, and the left side must be either an object (a blessed reference) or a class name (that is, a package name). See perlobj.

        Seems like dereferencing is dereferencing is dereferencing, no RHS required. Is there an association going on as well?


Re: My coding guidelines
by fruiture (Curate) on Nov 25, 2002 at 18:51 UTC

    I think it's no great use making guidelines in general. You'll have to agree on guidelines in one project and it's good to have concrete ideas. The actual decisions will depend on the group.

    Some of the things you mention are in my eyes means of sane programming (system calls, exit code). I don't think you can work seriously with someone who has to be told these things. Also, I'd never say "SHOULD". Say "MUST", so first of all there is no doubt: "Use strict! No exceptions. That's it." Someone who _really_ knows when to disable strictness, will do it. Someone who doesn't will struggle for a solution and ask for help and then someone else may tell him "This is the exception, you may disable strict here."

    So there are some general rules i'd make (and in general i agree with you):

    1. use strict! Always.
    2. use warnings! Always.
    3. Doing CGI? use! Always.
    4. Parsing commandline options? use Getopt::Long and Configure() it POSIXLY_CORRECT! Always.
    5. check system calls! Always.
    6. Use comments to comment, use POD to document. Don't mix these two.
    7. Avoid Globals!
    8. Globals are all variables with a scope that is larger (in the sourcecode) than an average screen.
    9. Name Globals wisely. Follow perlstyle rules
    10. Use spaces to make things clearer

    These are the rules I consider "general". Other things are matter of personal taste but should be equal inside a project:

    1. open the curly on same line as keyword. Indent next line.
    2. closing curly on single line, indented as deep as keyword of opening curly.
    3. curly-rule exceptions are special short blocks (map,grep,sort)
    4. No space between function name and opening parens.
    5. No space between array identifier and opening square bracket.
    6. No space between hash identifier and opeing curly

    Now finally the Tab-issue: I really think that one should use tabs, but only for indentation. Don't use tabs for alignement.

    if( $a > $b ){ T---return T---T---cond1 ? val1 : T---T---cond2 ? val2 : T---T--- default T---; }

    This way your alignement won't break in someone else's editor using a different tab-width, but logical indentation is still present. I am really convinced that this is a solution.

      Also, I'd never say "SHOULD". Say "MUST", so first of all there is no doubt: "Use strict! No exceptions.
      And I was just going to commend Abigail for using SHOULD for a lot of the guidelines.

      They are guidelines after all - and there are situations where this or that rule is not appropriate. The programmer should not have to go through hoops to code around the limitation imposed by the rule (but should provide appropriate reasoning why the rule should not be applied).


      I object to those "always" you are using in your list. You wouldn't be able to create useful modules like Exporter and Memoize, if you always had to use strict. Same with warnings. Perl generates warnings if it things the programmer makes an error. But sometimes, Perl is wrong. The right thing to do is turn warnings off, instead of having to go through hoops to satisfy Perl.

      As for point 3, I couldn't agree less. is a monster, doing far more than just CGI (come on, producing HTML shouldn't be a part of a module calling itself CGI). In the rare cases I do any CGI programming, I tend to avoid it like the plague.

      As for you tab issue, it's a good example why tabs should be avoided. Your example indicates a tabwidth of 4. Any idea how that's going to look like on a terminal with a default tabstop of 8? Sure, the left indents will all be lined up, but can you garantee none of the lines will exceed 80 characters? Any idea what's going to happen if one of the tabs is replaced by 4 spaces, and someone using tabstops of 8 is going to view it? It won't align correctly anymore.

      Tabs are evil. The savings in disk space and I/O might have been significant 30 years ago, but they aren't nowadays. With spaces, the indentation will always look the way the author intended it to be (given fixed width fonts). With tabs, it will only look right if all authors and viewers used the same tabstops.


        No, this is a misunderstanding: The rule says "you must", because it doesn't want to leave any doubts to somebody who doesn't know enough to break it with sense. I know that if i want to mess around in some symbol tables (e.g. Export or an Accessor Method Creator...) I will turn off strict 'refs' as well, and I want anybody else to do that then. But someone who isn't sure, must not break the rule.

        Same with There's a "must", because you should not question until you know better.

        Tabs: This is not about disk space, but it's no use disuccing this. Tabs aren't evil. :)

        To clarify: whom are you making these rules for? Do you need to tell yourself "use strict, you(=I) know when I may disable it!"? Such guidelines are usefull in two ways: a) for people who learn Perl, for them I wrote the "MUST". To say "SHOULD" tells them there are exceptions, but there are no exceptions they need to know about yet. b) For a project, to keep it all consistent and easier to make many people work on the same Code. There the "MUST" fits because an exception should be rare and known, because it is an exception. And: I wouldn't work with people who need to be told to use "strict" and when to disable it.


      Consering strict, I use the following rule:

      Always use strict. If you can't do something with strict, think about if you really need to do it without strict, and if you really need to, disable the part of strict locally and explain why this has to be done, e.g.
      if ( ... ) { # disabling strict refs temporary because of blabla..... no strict 'refs'; *{ $AUTOLOAD } = $codeReference; use strict 'refs'; }
      ++ Abigail-II

      Best regards,
      perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8."

        That's exactly right. use strict always does not mean you shouldn't no strict when appropriate :)
Re: My coding guidelines
by Courage (Parson) on Nov 25, 2002 at 19:36 UTC
    I am not agreed with item (25):
    There SHOULD be whitespace between an indentifier and its indices. There SHOULD be whitespace between successive indices.
    I see everywhere $array[$index] or $hash[$index] without any whitespaces and accustomed to this. IMHO Most perl core and CPAN modules do not satisfy that rule.

    I see no readability problems with such a very standard situation of array or hash indicies.

    I do agree on all other items though.

    Courage, the Cowardly Dog

      I know many people will disagree with rule 25. But they are my guidelines, and that's the way I code. Not just in Perl, but any language that I can remember programming in allows whitespace between an indentifier and its indexing operation. Including Python. (Unfortunally, Perl 6 will break decades of tradition).

      The reason is that the eye needs resting points. Whitespace makes it easier to divide a line of text into chunks and read it. As well as in natural languages as in code. $foo{bar}[17]{baz} is one big blob, and it's hard to divide it into its 4 chunks, specially when the subscripts are a bit more complex. We don't have the tendency to chain words together in English (unlike in for instance German). Why should we with code?


        Funny, I tend to put the whitespace inside the curlies and add extra arrows in that case, as in $foo{ $something_really_long }->{ and_rather($complicated)->{here} } which I find more helpful than $foo {$something_really_long} {and_rather($complicated)->{here}}

        Makeshifts last the longest.

Re: My coding guidelines
by grantm (Parson) on Nov 25, 2002 at 22:49 UTC
    10. Interactive programs MUST print a usage message when called with wrong, incorrect or incomplete options or arguments.

    I find the Pod::Usage module to be excellent for this purpose and always combine it with Getopt::Long.

    Number 24 seems weird to me also.

      I wonder what is so strange about rule 24. The perlstyle manual page says: space around most operators. Which is what rule 24 is saying, but it's listing the exceptions. The reason I put space between ++ and -- and its operands is that those operators are "large" (two characters).

      As for ->, it's a binary operator. I put whitespace around other binary operators, so why not ->? If you don't, you can get long blobs of text, without a resting place for the eyes. Perl should look like English, not German. ;-).


        But English is only my third language - while German is my second! :-) (Well, maybe you'd prefer me to code in Greek. :^) )

        Makeshifts last the longest.

Re: My coding guidelines
by derby (Abbot) on Nov 25, 2002 at 18:53 UTC
    Abigail-II, great list. I agree with most of them (rule #24 is just a bit odd for me). Also, I understand rule #16 but don't understand the reasoning (maybe that means I don't understand the rule). Any chance of talking you into expounding on that some more?


      #16 is about the fact that if you reference a non-existance hash key...nothing happens. Better to use the options detailed in the OP so that "mis"-references are trapped.


Re: My coding guidelines
by cLive ;-) (Prior) on Nov 25, 2002 at 18:56 UTC
    Re point 15:

    "Perl doesn't compile check the types of or even the number of arguments"

    Am I missing something, or isn't that what prototypes are for?


    cLive ;-)

      You are missing something, here are two examples:
      # prototypes only check arg format # i.e. (scalar/array/hash/code/glob and number of args) package Sample; sub stringify () { return ($_[0]->{data}) x $_[1] } package main; sub print_string($) { print $_[0]->stringify(1) } my $obj = bless ( {data=>"foo\n"}, 'Sample'); print_string($obj); # fine print_string(5); # uh oh... basic numbers don't have a stringify me +thod, # yet the arg passed the prototype since 5 is a sc +alar.
      And also:
      # methods don't check prototypes... at all. sub print_string_bad($) { print $_[0]->stringify() # Will pass the prototype, yet break since $_ +[1] will be # undef in stringify } print_string_bad($obj);

      Hence the reason for the general distaste for prototypes among the perl community. For most cases, they're pointless. They're only there so that you can call your own subs like perl-builtins.

      You can't use prototypes to have the compiler check whether an argument is an integer, or an object of a certain type. Prototypes are less useful than you might think at first, and they are sometimes bloody nasty. The following doesn't do what you want:
      sub foo ($$) {....} my @bar = (1, 2); foo @bar;
      &foo isn't called with two arguments - even while @bar has two elements. Remove the prototype, and it will work as expected.


Re: My coding guidelines
by DamnDirtyApe (Curate) on Nov 25, 2002 at 23:41 UTC

    I know it's been kicked around the monastery before, but I'd like to know Abigail-II's take on receiving arguments in subs. Do you tend to more towards $x = shift; $y = shift; ..., or ($x, $y) = @_;, or some combination of the two, depending on circumstances?

    Those who know that they are profound strive for clarity. Those who
    would like to seem profound to the crowd strive for obscurity.
                --Friedrich Nietzsche
      It depends. Sometimes I use shift, sometimes list assignment. And often I prefer named parameters - sometimes in combination with shifts.


      DamnDirtyApe - take a peek at this node of mine for some interesting insight on receiving arguments in subs.

      -- vek --
        Personally I never pass arrays to subs just array references or hashes. This save you from adding extra shifts or assigning extra variables when you add a value to be be passed in. So you don't have to  my $x=$_[0]; my $y=shift;.

        The same goes for returning values from subs, returning a list just causes hassle of typing out assignments for each value in the list.

Re: My coding guidelines
by Anonymous Monk on Nov 25, 2002 at 19:37 UTC
    18. POD SHOULD not be interleaved with the code, and is not an alternative for comments.

    Comments and POD have two different purposes. Comments are there for the programmer. The person who has to maintain the code. POD is there to create user documentation from. For the person using the code. POD should not be interleaved with the code because this makes it harder to find the code.

    There would be no need for this if Perl had multi-line comments. And please don't tell me about the Acme module. Or the infamous here-doc technique. This must change! This is so sad!

item 25 and Perl 5 DWIMary (Re: My coding guidelines)
by John M. Dlugosz (Monsignor) on Nov 26, 2002 at 23:29 UTC
    I read in Apocalypse 4,
    But I've come to the conclusion that I'd rather screw around (a little) with the "insignificant whitespace" rule than to require an extra unnatural delimiter. If we observe current practice, we note that 99% of the time, when people write a hash subscript they do so without any whitespace before it. And 99% of the time, when they write a block, they do put some whitespace in front of it. So we'll just dwim it using the whitespace.

    ... Therefore, we will make the rule that a left curly that has whitespace in front of it will never be interpreted as a subscript in Perl 6. (If you think this is totally bizarre thing to do, consider that this new approach is actually consistent with how Perl 5 already parses variables within interpolated strings.)

      I'm well aware of that. It's also the main reason I'm not interested in perl6 at all.


Re: My coding guidelines
by belg4mit (Prior) on Dec 01, 2002 at 06:14 UTC
    Re: 23
    I'm not sure if I've said this in a node before, but probably uttered in the CB... % is an EXCELLENT custom delimiter, namely for regexen. It looks like /, if you're using an acceptable font for coding, but avoids leaning toothpick syndrome.

    I'm not belgian but I play one on TV.

Re: My coding guidelines
by sauoq (Abbot) on Dec 01, 2002 at 05:05 UTC

    Excellent. ++Abigail once for the whole thing and once each for 30, 31, and 32.

    Of course, I don't follow all of those guidelines but most of the ones I don't are cosmetic.

    • I don't use whitespace between indices.
    • I cuddle my elses.
    • I don't use whitespace around the ->, ++ and -- operators.
    • I prefer either foo($bar) or foo( $bar ) to foo ($bar)
    • I use Getopt::Std and eschew long options. I do provide -h and -V for usage and version banners though.
    This is just a first draft. I've probably forgotten some rules.

    How about your guidelines for breaking long lines?

    "My two cents aren't worth a dime.";
Re: My coding guidelines
by hardburn (Abbot) on Nov 26, 2002 at 15:09 UTC

    POD SHOULD not be interleaved with the code, and is not an alternative for comments.

    Does this include POD put before subroutines? Such as:

    =item foo foo $arg1 Does foo =cut sub foo { ... } =item bar bar $foo Does bar =cut sub bar { ... }

    Variables with a small scope SHOULD have short names, variables with a broad scope SHOULD have descriptive names.

    Heh, there was a guy in my high school C++ class who declared all his counter variables as globals, and he used a different var for every loop. The entire bloody alphabet was at the top of every code file he wrote.

    Constants (or variables intended to be constant) . . .

    Is there any execuse for not doing "use constant ..."?

    No cuddled elses or elsifs . . .


    Indents SHOULD be 4 spaces wide. Indents MUST NOT contain tabs.

    Where's my flame thrower? :)

      Does this include POD put before subroutines?

      Yes it does. That was exactly what I was aiming it. Whenever I have to go through a file that alternates POD, subroutine, POD, subroutine, I find it very hard to find my way around the code.

      I also think that it's rare you want the documented function to appear in the same order in the manual page as that you happen to have them in the file.

      Is there any execuse for not doing "use constant ..."?

      Most certainly. It's easier to interpolate $CONSTANT than CONSTANT. And $CONSTANT will not fall victim to autoquoting (left of a fat arrow, or as a hash index), while CONSTANT might if you aren't careful.

      "Cuddled elses" is a term from perlstyle. It means:
      if (condition) { + code } else { # This else is cuddled. more code }
      This is K&R style. With uncuddled elses, you align the else with the if and the closing braces. There's a lot to say for K&R style though, it strengthens the fact that else doesn't start a new construct, but is part of the if, and it will save a line.


Re: My coding guidelines
by petral (Curate) on Nov 26, 2002 at 19:03 UTC
    Does anyone else do this?

          Prefer '<'  to  '>'.

          This makes numbers read from low to high, which somehow accords with the way we think:
              0 <= $val < 100   instead of:   100 > $val >= 0