Beefy Boxes and Bandwidth Generously Provided by pair Networks
Welcome to the Monastery

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??
I think that principle components analysis is the wrong way to think about this problem.

First of all the analogy does not really carry. Principle components analysis depends on having some metric for how "similar" two vectors are which corresponds to the geometric "dot product". While many real world situations fit, and in many more you can fairly harmlessly just make one up, I don't think that code manages to fit this description very well.

But secondly, even if the analogy did carry, the basic problem is different. Principle component analysis is about taking a complex multi-dimensional data structure and summarizing most of the information with a small number of numbers. The remaining information is usually considered to be "noise" or otherwise irrelevant. But a program has to remain a full description.

Instead I think a good place to start thinking about this is Larry Wall's comment about Huffman coding in Apocalypse 3. That is an extremely important comment. As I indicated in Re (tilly) 3: Looking backwards to GO forwards, there is a connection between understanding well, and having mental models which are concise. And source-code is just a perfectly detailed mental model of how the program works, laid down in text.

As observing the results of Perl golf will show you, shortness is not the only consideration for well laid-out programs. However it is an important one.

So if laying out a program for conciseness matters, what does that tell us? Well basic information theory says a lot. In information theory, information is stated in terms of what could be said. The information in a signal is measured by how much it specified the overall message, that is how much it cut down the problem space of what you could be saying. This is a definition that depends more on what you could be saying more than what you are saying. Anyways from information theory, at perfect compression, every bit will carry just as much information about the overall message as any other bit. From a human point of view, some of those bits carry more important information. (The average color of a picture winter scene has more visual impact than the placement of an edge of a snowflake.) But the amount of information is evenly distributed.

And so it is with programming. Well-written source-code is a textual representation of a model that is good for thinking about the problem. It will therefore be fairly efficient in its representation (although the text will be inefficient in ways that reduce the amount of information a human needs to understand the code). Being efficient, functions will convey a similar amount of surprise, and the total surprise per block is likely to be fairly large.

In short, there will be a good compression in the following sense. A fixed human effort spend by a good programmer in trying to master the code, should be result in a relatively large portion of the system being understood. This is far from a compact textual representation. For instance the human eye finds overall shapes easy to follow, therefore it is good to have huge amounts of text be spent in allowing instant pattern recognition of the overall code structure. (What portion of your source-code is taken up with spaces whose purpose is to keep a consistent indentation/brace style?)

Of course, though, some of that code will be high-order design, and some will be minor details. In terms of how much information is passed, they may be similar. But the importance differs...

In reply to Re (tilly) 3: Maintainable code is the best code -- principal components by tilly
in thread Maintainable code is the best code by dragonchild

Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":

  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?

    What's my password?
    Create A New User
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others browsing the Monastery: (1)
    As of 2020-10-25 23:04 GMT
    Find Nodes?
      Voting Booth?
      My favourite web site is:

      Results (249 votes). Check out past polls.