comment on

So, the Java 1.4 documents are beginning to come out... and they are incredibly excited about the regular expression support and just how *easy* string processing is getting in java. As an example, here is the program the document suggests for creating a histogram of all of the words in a file:

 import java.io.*;
 import java.nio.*;
 import java.nio.channels.*;
 import java.nio.charset.*;
 import java.util.*;
 import java.util.regex.*;

 public class WordCount {
  public static void main(String args[]) throws 
  Exception { 
    String filename = args[0];

    // Map File from filename to byte buffer
    FileInputStream input = new 
    FileInputStream(filename);
    FileChannel channel = input.getChannel();
    int fileLength = (int)channel.size();
    MappedByteBuffer buffer = 
    channel.map(FileChannel.MAP_RO, 0, 
      fileLength);

    // Convert to character buffer
    Charset charset = Charset.forName("ISO-8859-1");
    CharsetDecoder decoder = charset.newDecoder();
    CharBuffer charBuffer = decoder.decode(buffer);

    // Create line pattern
    Pattern linePattern = Pattern.compile(".*$",
      Pattern.MULTILINE);

    // Create word pattern
    Pattern wordBreakPattern = 
    Pattern.compile("[{space}{punct}]");

    // Match line pattern to buffer
    Matcher lineMatcher = 
    linePattern.matcher(charBuffer);

    Map map = new TreeMap();
    Integer ONE = new Integer(1);

    // For each line
    while (lineMatcher.find()) {
      // Get line
      CharSequence line = lineMatcher.group();

      // Get array of words on line
      String words[] = wordBreakPattern.split(line);

      // For each word
      for (int i=0, n=words.length; i<n; i++) {
        if (words[i].length() > 0) {
          Integer frequency = 
          (Integer)map.get(words[i]);
          if (frequency == null) {
            frequency = ONE;
          } else {
            int value = frequency.intValue();
            frequency = new Integer(value + 1);
          }
          map.put(words[i], frequency);
        }
      }
    }
    System.out.println(map);
  }
 }
[download]

Ok... I don't know about you, but if I were a maintenence coder, and I was presented with this snippet, I don't think I'd know what to do! Cognitive psychology tells us that the human mind can hold on average 7 units of information at once... *this* particular program has *considerably* more than 7 logical atoms of information... thereby making it larger than can be held in the mind at one moment. So, let's look at a program that duplicates this functionality in say... perl. Now, I know that Perl isn't the end all be all language, but:

 #!/usr/bin/perl -w

 use strict;

 my %frequency = ();

 $frequency{$_}++ for (split /\W/, <>);
 print "$_: $frequency{$_}\n" for (keys %frequency);
[download]

This program now has variable declaration checking, handles multiple files at the command line, etc... due to use strict, and -w there is a relatively strong guarantee that I'm not making any of the "mistakes" that are common with "interpreted" VHLLs. (I know perl is not *really* interpreted, it's a hybrid, but people lump it in with the "interpreted" languages.) Now, tell me... is that not a *lot* easier to comprehend... and more importantly, if you were a maintenance coder... would you not prefer to have to understand these 2 lines of code, rather than the chunk of java? All language bigotry aside... and yes, Perl has some serious flaws... I'm beginning to see the beauty of VHLLs more and more and more every day. It's such a pleasure to be able to *express* my program, rather than dictate it.

In reply to Efficiency in maintenance coding... by eduardo

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Come for the quick hacks, stay for the epiphanies.
	PerlMonks