http://qs321.pair.com?node_id=66258


in reply to Re: Re (tilly) 4 (disagree): Another commenting question,
in thread Another commenting question,

I unfortunately don't have the energy to give this thread the full attention it clearly deserves. However I have to say that there is a fundamental point I made. You complain that it would be great if good programmers did everything they do and also put in everything that you would like to see. You looked at that code and then you said I was a good programmer. I am trying to describe what has made me a better programmer - and the kind of things that can make for more good programmers...

Unfortunately programmers are people. And as people there are tradeoffs they have to make. I believe that the tradeoffs I am making with my style are worthwhile. The purpose of those guidelines is to make it possible for me to code like I do, and possible to keep the code clean after I am done with it and while I am maintaining it. Yes, it takes work for others to figure out what the code does. However it is my honest belief that with those stylistic guidelines I can get more done, and it will be less work for someone to pick up than if I coded in a different style.

Besides which, given the choice I would prefer to work at a higher level and bring co-workers up to that level rather than bringing myself down to the lowest common denominator. My decisions, while they may seem hard to learn when you see them on the net, are actually designed based in large part upon what I have and have not found that I could teach.

Page by page you might find a different style easier. But I am aiming to get more done in less pages and wind up with better organized code which is more easily tested, more easily proven correct, and more easily modified. I prove that it is easily modified by constantly modifying my own code. If I have trouble modifying and cleaning it up then there is a problem. I don't write gems. I write production code that I have to live with.

When you look at that code bear in mind that when I started I had a general idea of what I wanted to do, and a general idea what I wanted the API to look like. But there were going to be details to be filled in. In fact I added and removed functions, added, named, renamed, and destroyed variables and parameters. In short that is code that has been reconceptualized and rearranged before you ever saw it.

While I am writing it I am constantly refactoring it. Yes, it would be nice to have a beautiful collection of documentation on how it works, how to think about it, and what you want to look at to make changes. But if I tried to provide that commentary then I would be stuck with my first bad approach to everything because it is far too much work to change the documentation on how it is supposed to go. And if it is that much work for me to nail down my assumptions now, what will change when someone with a need makes the API a wrapper around a better module that can handle a series of related formats? I have no clue, but I know that change is not an unreasonable one to make, and I would far prefer that more formats were supported in that way than by cutting and pasting the code to 50 places.

What, you think that code is perfect and will only be built on? Uh, uh. If I have a need instead of duplicating that code elsewhere I am going to figure out how to generalize the idea, scoop out the body into something more general, and then leave a shell. And in that process the detailed comments and variables are going to go..where? I can't predict. I know I can't predict. I couldn't predict what would change while I was writing and I sure as heck can't predict my mind in 6 months.

So I am not bogging it down with clutter that will limit my options. If in context a variable is unclear then I will give it a better name. But if its scope is small and it is clear within the scope, then I won't. Let's take the 13 variables that I had that you didn't like:

  1. $lookup Defined on line 24, scope lasts to line 39. (I am only counting lines of code in scope, the closing brace is actually line 40. Big whoop.) Only used to line 32. You are right that it is poorly named. It is a reference to a hash and should be named $field_pos after the property of the object it is pulled from. But in fact I wrote extract before I had actually written bind_fields and when I settled on a name for the fields I already was using it and didn't rename it.
  2. @data Defined on line 26. Used to end of scope on line 39. This appears in a function called extract() that is documented to extract data out of the row for specified fields. This array holds that data. Longer names could be chosen but I don't find this one to be inappropriate to its scope.
  3. @allowed Defined on line 32. Scope closes on line 35. Considering that the second and final time it appears it is part of this string: "Valid fields are: (@allowed)\n" I find its meaning abundantly clear and highly doubt any good Perl programmer who understood English would have trouble...
  4. $q_sep Defined on line 61. Used to line 63. Would you prefer to see quotemetaed_sep as a name instead?
  5. $match_sep Defined on line 62, used on line 74. Through the whole program there is a field called sep, that appears in the documentation which is the one character separator between fields. That is a compiled RE that will, wonder of wonders actually match whatever sep currently is! The second place it appears may be read, "Try to match sep, and if you fail then...". If you have dealt with much of my code that will not appear to be a strange name.
  6. $expected Defined on line 79. Used on line 80. Those 2 lines generate and spit out a warning message. That variables holds (duh) what I expected to see. Given the context that should not be a problem to understand.

    Let me say more about this.

    The message would be a fatal error and would read something like: Expected ',' at foo.csv, line 25, char 23 which would be followed by a complete stack backtrace. If you opened foo.csv, went to line 25, and went to char 23 you would find, wonder of wonders, that a quoted fields had just ended on character 22, but you didn't have the expected divider right after that.

    Plus if you practice reading that style you will see that it is a pretty good comment. If on line 74 you fail to match $match_sep, and on line 75 you find you were not at the end of the line, then you failed to match $sep. Well then guess what $match_sep is supposed to match??

  7. $piece On line 88, in a function that is supposed to match a quoted field, I define $piece and then append stuff to it and return it. Would you guess that, just perhaps, $piece is the piece of the quoted field that we have discovered so far? (It is only a piece because we could go over a line break and need to fetch a new line, which is what we are promising that we will do that is not done by Text::CSV.)
  8. $req The comment says to see node 43323. That would be Handling named function arguments. $req is an anonymous array of required fields. See also the comment on line 110.
  9. $opt An anonymous array of optional fields. See the same places as for $req.
  10. $default An anonymous hash of defaults for fields. See the same places as for $req.
  11. @res The resulting values, in order.
  12. $sep The one character separator. As documented on line 235.
Now you claim these give no indication of what they do or why they were declared? I think we will need external opinions here because I emphatically disagree. If I am building an error message and I put what I expected to see in a variable named $expected, well reading that a day later, a week later, a month later, or a year later I would expect to see that variable contain information on what I expected to see! Likewise if I document a parameter to a function that I call "sep", and I have a function called "set_sep" which is also documented, and I have a field in the object which is likewise called "sep", then probably pretty much anything called "sep" is going to refer to whatever that documentation says.

If that isn't reasonable, what is?

Likewise I claim that my function names are supposed to indicate what things do, and leading underscores often indicate that they cannot be usefully called externally. You claim to have not heard of the underscore rule and ask if it is widely used. FYI it is widely used in the Perl community. For the first instance I turned up in the Perl documentation see perltoot and look for "underscore". The first module I found it in was LWP::UserAgent, which calls _need_proxy and _elem, defines the first itself and the other is from a parent module called LWP::MemberMixin.

And about get_row vs _get_row, you cynically ask if they do the same thing. Well what would you guess that they should do given the names? How about get a row? Astoundingly that is exactly what both of them do!

So why the difference? Why is one public and one private?

Well _get_row does not manage the objects state as desired, and depends on a variety of initializations having happened. Therefore _get_row will get a row but you can't call it directly. By contrast get_row gets a row but does not depend on actions having taken place.

If would be natural to guess about now that get_row is probably just implemented as a wrapper around _get_row that sets things up so that _get_row can be called and will work right. Amazingly enough this is how it actually works. Well you may wonder, why have a separate function? Well there are two reasons.

The first is that you do have this reasonably clear separation between the necessary initializations and the work of parsing a row from the data. When I spot such separations I like to create new functions.

The second is that, based on past experience in writing little parse engines, I know full well that it is a common and frustrating error to have accidental exit points in the parse loop that you didn't anticipate. Therefore I test that. There are two ways to test it. One is to introduce a flag. The other is to move the logic into a function from which you exit in the middle, and falling out of the loop is a fatal error. I find the second easier to understand...

This is stuff I have learned programming. There is more of it there - a lot more. Can I really be expected to document all of that? How much do you want me to say? I can probably fill a book, but no matter what I say I won't have covered all of the likely questions that a maintainance programmer would be likely to come up with. I would succeed in making it hard for me to fix and modify the ossified corpse of the code...

Now if you want to see a longer code example, take a look at Math::Fleximal. I wrote that today. It is conceptually more complex. Now the interesting thing about it is that I have never before tried to write anything like it. I just started knowing how to do arithmetic and tried to make it all work. I won't lie, my initial conception is not what I ended up with. The API is kind of rough, but not as rough as what I started coding to. The first time I went to write times I still thought that I was going to intertwine base conversions with multiplication. That didn't work, and my attempt wound up as part of set_value. Likewise I vaguely thought that dup and new were going to be the same thing. Nope. Not even. And I never dreamed how many times the phrase (shift)->dup() would appear. That wasn't planned, it just happened.

Now that is a rough draft. If you want to complain, go ahead. There are a lot of things that need fixing. I know that. I can probably produce a longer list of fixes needed than you can...