<p>
I'd been thinking of doing a little blurb about Literate Programming
and a thread on commenting seemed an appropriate place to bring it
up.</p>
<p><b>Literate Perl ... who'da thunk it?</b>
</p>
<p>
It's easy to be Perl literate---just invest a lot of time
and hard work, read the camel start to finish (including the
index), code 'til you bleed, and get some corrective lenses.
Before you know it you'll be writing heavily punctuated, twisted
loops and chains that solve the world's problems in 13 lines of
code...and you'll know deep inside exactly why your particular
solution is the best possible solution and you'll be justifiably
proud of your accomplishments.
</p>
<p>
The next thing you'll find yourself experiencing is STCMDE. STCMDE
(pronounced: ess tee see em dee eee)
stands for Standard Time Cumulative Memory Dilution Effect and
occurs with alarming frequency among programmers who actually
find themselves revisiting code they wrote Some-Time-Ago (STA).
STA is a relative value representing the amount of time-passed
required to accumulate enough memory dilution to find yourself
saying "What the heck was I thinking when I wrote that!?" For
many, STA may be measured in weeks, months, or, for a very few,
even years. For myself, STA can often be measured in days or even
hours.
</p>
<p>
The first thing you might notice when reviewing code you wrote
STA is that your comments (if you used any) are somewhat
inadequate---you realize that you wrote these comments while
holding the overall design in your head in perfect clarity (you were
writing the code at the time after all), but now that your mental
model of the design has slipped into the abyss of STCMDE, you realize
that your little comments are about as helpful as a string around
your finger, or that unrecognizable name and phone number you found
scribbled on the back of a matchbook cover from Fred's Bar and Dance
Emporium where you vaguely recall spending some of last Saturday
night.
</p>
<p>
Of course, being Perl Literate, you can almost certainly decipher
and re-discover the brilliant algorithmic design you incorporated
into your $world->solve_problems() method. However, this takes
time and energy.
</p>
<p>
One simple solution to this problem is more verbose commenting. But
verbose commenting can clutter up the code and hurt readability,
and often even verbose comments for a particular subroutine don't
express design ideas --- when the solution is clear in our minds, we
tend to think of the implementation and design as being obviously
derivable from the code itself, and assume that we'll only need
reminders about the routine's interface and not its
implementation.
</p>
<p>
Another problem with verbose commenting is that it is limited to
text---expressing equations or diagrams is not always an easy
ASCII option, but such extras can often be helpful in documenting
algorithms, data structures, business rules, and data flow.
</p>
<p>
Of course, there can and does exist stellar code which contains
virtually no comments, but is readable---and grok-able---because
attention was paid to code design, layout, structure, variable
naming conventions and other details, and perhaps because the
complexity was not overly taxing. But even with the best exemplars of
such code, wouldn't you also like to read the author's own design
notes, diagrams, and other such paraphernalia (even if, or especially
if, that author is you)? Just because algorithm X is perfectly clear
and understandable doesn't tell you why X was chosen over the
seemingly better algorithm Y (does the author know about Y at all, or
is there some peculiar quirk that prevents its use in this
instance?).
</p>
<p>
Literate Programming (LP) is one possible answer. But what, then, is
Literate Programming? Well, Donald Knuth (who started it all) had
this enlightening bit to say: </p>
<blockquote>
Let us change our traditional attitude to the construction of
programs: Instead of imagining that our main task is to instruct a
<em>computer</em> what to do, let us concentrate rather on explaining
to <em>humans</em> what we want the computer to do. (Donald E. Knuth,
1984).
</blockquote>
<p>
And that sums up the point of LP in a nutshell -- we reverse the
paradigm of embedding documentation within the source code, and
instead embed the source code within the documentation.
</p>
<p>
Essentially, the practice of LP is to present your code in named
'chunks'. These chunks of code not only define the hierarchical
structure of the program, but also contain the actual source code of
the program. Think of it as named macros, or a templating system for
presenting source code. For example, I could start describing the
overall structure of a program and then define the "root" chunk (the
chunk that defines the top-level structure of the program):
</p>
<code>
Blah blah blah about the overall design, and the basic structure
of the program is as follows:
<<foo.pl>>=
#!/usr/bin/perl -w
use strict;
<<load modules>>
<<initialize data structures>>
<<accept user input>>
<<perform analysis>>
<<error checking>>
<<foo.pod>>
@
</code>
<p>
Now, in the above, the <code><<foo.pl>>=</code> token indicates
the start of a chunk definition (the chunk is named foo.pl). Within
that definition is a bit of code (shebang line, and strict) and
then references to other named chunks (note that with no trailing = sign,
those are merely references to chunks, not chunk definitions; in other
words, they are "includes" from elsewhere in the document).
</p>
<p>
Now I would begin describing each of those included chunks, in
whatever order makes the most sense to present them in, and defining
each chunk in the same way. Furthermore, any of those included chunks
could themselves also include yet further chunks at a slightly lower
level of detail. You can follow this approach from the top down,
bottom up (but that's generally less easy to follow), or some
combination of top-bottom-sideways presentation. So, the chunking
scheme gives you stepwise refinement from pseudo-code (the chunk
names) down to real code.
</p>
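<p>
To make the chunk mechanics concrete, here is a toy tangler in Perl. It
is only a sketch of the idea, not how noweb itself is implemented:
<code>parse_chunks</code> and <code>expand</code> are names I made up,
and it assumes a definition opens with <code><<name>>=</code> at the
start of a line, ends at a line beginning with <code>@</code>, and that
a reference sits on a line by itself.
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy tangler (illustrative only; parse_chunks and expand are made-up names).
# Assumes "<<name>>=" opens a chunk definition and a line starting with "@"
# closes it. No guard against chunks that reference each other in a cycle.

sub parse_chunks {
    my ($source) = @_;
    my (%chunks, $current);
    for my $line (split /\n/, $source) {
        if ($line =~ /^<<(.+?)>>=\s*$/) {
            $current = $1;                  # start (or continue) a definition
            $chunks{$current} //= [];
        }
        elsif ($line =~ /^@(\s|$)/) {
            undef $current;                 # back to documentation text
        }
        elsif (defined $current) {
            push @{ $chunks{$current} }, $line;
        }
    }
    return \%chunks;
}

sub expand {
    my ($chunks, $name, $indent) = @_;
    $indent //= '';
    die "undefined chunk <<$name>>\n" unless $chunks->{$name};
    my @out;
    for my $line (@{ $chunks->{$name} }) {
        if ($line =~ /^(\s*)<<(.+?)>>\s*$/) {
            # A reference: splice in the named chunk, keeping its indentation.
            push @out, expand($chunks, $2, $indent . $1);
        }
        else {
            push @out, $indent . $line;
        }
    }
    return @out;
}

# A tiny literate source: documentation lines interleaved with two chunks.
my $source = <<'END_NW';
The program loads its modules and then says hello.
<<hello.pl>>=
use strict;
<<load modules>>
print "hello\n";
@
Only one module is needed so far.
<<load modules>>=
use warnings;
@
END_NW

my $chunks = parse_chunks($source);
print "$_\n" for expand($chunks, 'hello.pl');
```

<p>
Run on the tiny embedded source, this prints the three lines of hello.pl
with the <code><<load modules>></code> reference replaced by its
definition. The documentation lines never reach the output, which is the
whole point of tangling.
</p>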
<p>
Another benefit is that this model isn't restricted to one program
per file -- you could have two root chunks (each representing a
separate extractable program) and present the programs in parallel.
Why might we do this? We might develop our test program right along
with our main program, and define chunks of test code in sequence
with the main code being tested. Similarly, we could also include a
complex test data file, and annotate it in chunks in sequence with
the code chunks that parse it (chunk of header data, chunk of header
parser, chunk of xyz data, chunk of xyz parser, etc.). In both of
these situations, the tests and/or the test data lie near the
relevant section of code in the same source file (surrounded by
whatever documentation you deem necessary), and are easy to keep in sync
(add to a given test/add to a given code chunk, add a new test/add a
new code chunk).
</p>
<p>
When we've written our source, we "tangle" out the root chunks (a
root chunk is simply any chunk that isn't referenced (or used) by
another chunk). The tangling process assembles all the code chunks
for a given root chunk and writes out the file (be it the main
program, the test program, or a test data file). Weaving is the
process of turning the original literate source into a typeset
document. Generally you write the literate source in a typesetting
markup language (such as LaTeX or HTML) allowing formatting control,
tables, included diagrams, or whatever. When you weave it, any chunk
indexing, identifier indexing, and cross-referencing is added
(depending on options and the particular LP tool you are using) and
then you either view the resulting file in a browser (HTML), or run
latex on the resulting file (to create DVI, PostScript, or PDF
output). Of course, you can also just use plain text for the
documentation and not bother with weaving at all (but then you lose
any cross-referencing and such that a woven document would give you).
So, the basic processes are:
</p>
<code>
           +--- tangle ---> foo.pl
           |
foo.nw ----+
           |
           +--- weave --+--> foo.tex --+--> latex foo ----> foo.dvi
                        |              |
                        |              +--> pdflatex foo -> foo.pdf
                        |
                        +--> foo.html
</code>
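<p>
The "root chunk" rule described above is easy to sketch in Perl. This is
just an illustration under my own assumptions: <code>find_roots</code>
is a made-up helper, and the <code>%chunks</code> hash is a hand-built
stand-in for whatever a real tool would collect from foo.nw (the foo.t
test program is likewise invented for the example).
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative only: find_roots is a made-up helper. A chunk is a root if
# no other chunk's body contains a <<reference>> to it.
sub find_roots {
    my ($chunks) = @_;
    my %used;
    for my $lines (values %$chunks) {
        for my $line (@$lines) {
            $used{$1} = 1 if $line =~ /^\s*<<(.+?)>>\s*$/;
        }
    }
    my @roots = sort { $a cmp $b } grep { !$used{$_} } keys %$chunks;
    return @roots;
}

# Hand-built stand-in for parsed chunks: a main program and its test
# program living side by side in one literate source.
my %chunks = (
    'foo.pl'           => [ 'use strict;', '<<load modules>>', '<<perform analysis>>' ],
    'foo.t'            => [ 'use Test::More;', '<<test analysis>>' ],
    'load modules'     => [ 'use warnings;' ],
    'perform analysis' => [ 'print "analyzing\n";' ],
    'test analysis'    => [ 'ok(1);', 'done_testing();' ],
);

print "root chunk: $_\n" for find_roots(\%chunks);
```

<p>
Here foo.pl and foo.t come out as the two roots, so each could be
tangled into its own file from the same source.
</p>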
<p>
What about debugging, you ask? It would be a pain to track down
problems in the tangled source and then have to locate where that
code is situated in the literate source file. Fortunately, good LP
tools help with this. For example, the
<a href="http://www.eecs.harvard.edu/~nr/noweb/">noweb</a> tool gives
the -L option which inserts <code>#line</code> directives into the
tangled source (Perl has line directives just like C) so that error
messages will point to the relevant line in the literate source
rather than the tangled source.
</p>
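<p>
A quick way to see what a line directive does in Perl, without involving
any LP tool: the snippet below evals a string that stands in for tangled
output. The file name foo.nw and the line number 120 are invented for
the demo.
</p>

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A stand-in for tangled output. The "# line" comment tells perl that the
# line right after it is line 120 of foo.nw (both values are made up).
my $tangled = <<'END_CODE';
# line 120 "foo.nw"
my $total = 0;
die "analysis failed";
END_CODE

eval $tangled;
print "error reported as: $@";
```

<p>
The die sits one line after the line numbered 120, so the message should
read "analysis failed at foo.nw line 121." rather than pointing into the
eval (or, in real life, into the tangled foo.pl).
</p>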
<p>
Literate Programming (LP) may not be the answer to your dreams of
readable source code. However, it does offer improvements
over verbose commenting. And while it certainly will not take the
place of well thought out design (why bother documenting poor design
anyway?), it does at least encourage inclusion of the design process
with the source code. Another area where LP can be a big win is in
teaching -- when the goal is to teach and explain via examples, often
an LP approach can be better than simple commenting or line-by-line
post analysis.
</p>
<p>
I certainly don't use LP all the time (perhaps not even as often as I
should) -- many scripts simply do not require such an approach, but
it is worth looking into when developing larger projects. A few links
to find out more about literate programming:
</p>
<ul>
<li><a href="http://www.eecs.harvard.edu/~nr/noweb/">
The noweb homepage.</a> Noweb is a language independent LP tool.
You can find links to projects and examples from there.
<li><a href="http://shelob.ce.ttu.edu/daves/lpfaq/faq.html">The LP FAQ</a>
<li> Mark-Jason Dominus has
<a href="http://www.perl.com/pub/a/tchrist/litprog.html">
an article</a> briefly highlighting LP as contrasted with POD.
<li><a href="http://winnipeg.pm.org/newsletter/">PUBcrawl.</a>
Once upon a time I thought a literate perl newsletter would be cool. I
twisted a few arms in my local PM group to try their hand at writing
literate Perl articles -- we created one test issue and PUBcrawl has
lain comatose ever since (contact me if you are interested in helping
revive it).
</ul>
<p><em>Hey, my 100th post!</em></p>