PerlMonks  

Re (tilly) 2: Efficiency in maintenance coding...

by tilly (Archbishop)
on Nov 15, 2001 at 02:31 UTC ( [id://125446]=note )


in reply to Re: Efficiency in maintenance coding...
in thread Efficiency in maintenance coding...

Were I trying to tout the advantages of Perl to a maintenance programmer who had been around the block, I would not put in comments that they would be sure to recognize as maintenance pitfalls! Instead I would go the other way:

    #! /usr/bin/perl -w
    use strict;

    # Create a frequency count of all words in all input files
    my %freq_count;
    while (defined(my $line = <>)) {
      while ($line =~ /(\w+)/g) {
        $freq_count{$1}++;
      }
    }

    # Print the frequency summary.
    foreach my $word (sort keys %freq_count) {
      print "$word:\t$freq_count{$word}\n";
    }

There. The looping constructs are all readily explainable, there are no hidden uses of $_, and no comments that will become wrong with time. I also removed a bug in the code that you wrote (which you copied unchanged from eduardo).

Kudos to the first person to figure out what the bug is.

Replies are listed 'Best First'.
Re(3): Efficiency in maintenance coding...
by FoxtrotUniform (Prior) on Nov 15, 2001 at 02:58 UTC

    tilly: Kudos to the first person to figure out what the bug is.

    I'm guessing split /\W/: this splits on each non-word character, but if there are several \Ws together (a comma followed by a space, for instance) it will split between them, creating a spurious "" word. The fix was to look for \w+ (although you might also say split /\W+/).

    Update: The above split-based "solution" introduces spurious "" words if a line (say) begins (or ends) with a \W. Looks like m/(\w+)/g is the Right Thing in this case.

    Update 2: Of course, split discards any empty trailing entries, so only the ones at the beginning of the line are a problem. (I'll get this eventually...)
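
    The three behaviours discussed in this post can be checked directly. This is a minimal sketch: the sample line is made up for illustration and is not taken from the original program.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample line: a leading space, a comma followed by a
    # space, and trailing punctuation.
    my $line = " Hello, world!";

    # split /\W/ breaks at every single non-word character, so the leading
    # space and the ", " pair each produce an empty string.
    my @single = split /\W/, $line;

    # split /\W+/ collapses runs of non-word characters, but the leading
    # empty field is still kept. Trailing empty fields are dropped by split.
    my @run = split /\W+/, $line;

    # Matching the words themselves never yields an empty string.
    my @match = $line =~ /(\w+)/g;

    print join('|', map { "[$_]" } @single), "\n";  # []|[Hello]|[]|[world]
    print join('|', map { "[$_]" } @run),    "\n";  # []|[Hello]|[world]
    print join('|', map { "[$_]" } @match),  "\n";  # [Hello]|[world]
    ```

    Only the match-based version comes back free of empty strings, which is consistent with the m/(\w+)/g loop in the rewritten script.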

    --
    :wq
      Pretty good...

      But your proposed fix only handles 95% of the problem. Why didn't I try fixing things that way? (ie What does your fix miss?)

      BTW so far this code example is not making the case for Perl being maintainable look very good... :-(

      UPDATE
      Your update is half-right. The half that is wrong is the most common misunderstanding I have encountered about how split behaves...

      UPDATE 2
      Eventually seems to have come. Modulo difficult questions about how the definition of a word ain't what you would expect. Consider a kudo delivered. :-)
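
      The "definition of a word" problem is easy to see with a contraction: \w does not match an apostrophe, so m/(\w+)/g counts "ain't" as two words. A minimal sketch follows. The second pattern is only one possible assumption about what should count as a word, not something proposed in this thread.

      ```perl
      use strict;
      use warnings;

      my $line = "It ain't what you would expect";

      # The pattern from the script: the apostrophe ends the match.
      my @plain = $line =~ /(\w+)/g;

      # Hypothetical alternative: allow apostrophes between runs of
      # word characters.
      my @with_apostrophe = $line =~ /(\w+(?:'\w+)*)/g;

      print join(',', @plain), "\n";            # It,ain,t,what,you,would,expect
      print join(',', @with_apostrophe), "\n";  # It,ain't,what,you,would,expect
      ```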

        95%? You expect to only have 20 words? ;) [Update: I was thinking of a "slurp" version -- and rechecking shows that the original Perl version had a much more serious bug...]

        BTW, I guess "ain't" ain't a word. (:

                - tye (but my friends call me "Tye")
Re (tilly) 3: Efficiency in maintenance coding...
by jynx (Priest) on Nov 15, 2001 at 04:26 UTC

    One last hurdle,

    What if you want to print out the list not alphabetically, but by how many occurrences of each word there are? The easiest way to do this would be an ST[1]. Is an ST easy to maintain?

    Correct me if i'm wrong but i believe that Java has a method of doing this immediately (which is probably why they used the Tree to print it) whereas Perl can do it readily, but it's harder to understand for the common Java programmer, not to mention a few Perl programmers. Who wins maintainability this time?

    jynx

    [1] Schwartzian Transform
    update - d'oh, i shouldn't post before my first cup of coffee. please disregard...

      Perl.

      Right now our Java solution hasn't figured out how to handle contractions yet. Perl is still ahead. Secondly, why break out the sledgehammer when you don't have to?


          foreach my $word (
            sort {
              $freq_count{$b} <=> $freq_count{$a} or $a cmp $b
            } keys %freq_count
          ) {
            print "$word:\t$freq_count{$word}\n";
          }

      Most Perl programmers should understand this. Doing the same in Java, well probably somewhere in their maze of classes is one that naturally sorts in exactly the order you want. Good luck finding it, and good luck for the average Java programmer realizing why you chose this way of doing things.

      And as for the maintainability of a Schwartzian Transform, it is a trick. If you think in a list-oriented way, or if you have ever really understood the transform, then maintaining it when you see one is pretty easy. I understand it, and I try to ensure that people I train are able to handle list-oriented thinking. So it isn't a problem for me. But YMMV on its maintainability.
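
      For readers who have not seen one, the transform is a map-sort-map pipeline read from the bottom up. The sketch below applies it to the frequency sort. The sample counts are made up for illustration and stand in for the real %freq_count.

      ```perl
      use strict;
      use warnings;

      # Hypothetical counts standing in for %freq_count from the script.
      my %freq_count = (the => 3, perl => 2, java => 2, monk => 1);

      my @sorted =
        map  { $_->[0] }                   # 3. unwrap: keep only the word
        sort { $b->[1] <=> $a->[1]         # 2. sort on the precomputed count,
                 or $a->[0] cmp $b->[0] }  #    breaking ties alphabetically
        map  { [ $_, $freq_count{$_} ] }   # 1. wrap: pair each word with its count
        keys %freq_count;

      print "$_:\t$freq_count{$_}\n" for @sorted;  # the, java, perl, monk
      ```

      Here the sort key is a plain hash lookup, so the transform buys little over a direct sort block. It tends to pay off only when the key is expensive to compute.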

      No Schwartzian Transform is required. Does Java let you easily sort ignoring case? How about by length?...


          sort { $freq{$a} <=> $freq{$b} } keys %freq

          sort { lc($a) cmp lc($b) || $a cmp $b } keys %freq

          sort {
            length($a) <=> length($b) || lc($a) cmp lc($b) || $a cmp $b
          } keys %freq

      You might find that an ST executes faster for that second case (and you could almost certainly speed it up with one of several techniques that are faster in Perl than an ST), but I doubt the speed gain would be worthwhile since lc() shouldn't be that slow.

      Speeding up the third case with an ST would be more difficult than using some other sort-speeding techniques (many of which have names that I don't recall). Though you'd have to have a whole lot of different words for the trade-off of sort speed for code complexity to be a "win" here, especially since we are trying to write very maintainable code.
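
      One caching technique that fits this description is commonly called the Orcish Maneuver ("or-cache"). Whether it is one of the techniques meant here is an assumption. A minimal sketch for the case-insensitive sort, using a made-up word list in place of keys %freq:

      ```perl
      use strict;
      use warnings;

      # Hypothetical word list standing in for keys %freq.
      my @words = qw(banana Apple cherry apple Banana);

      # Cache the lowercased form of each word so lc() runs once per word
      # instead of once per comparison. The //= operator needs Perl 5.10+.
      my %lc_of;
      my @sorted = sort {
          ( $lc_of{$a} //= lc $a ) cmp ( $lc_of{$b} //= lc $b )
            or $a cmp $b
      } @words;

      print "@sorted\n";  # Apple apple Banana banana cherry
      ```

      The comparison logic is unchanged from the plain sort block; only the lc() calls are memoized, so the extra complexity is one hash and one operator.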

              - tye (but my friends call me "Tye")
