So, the Java 1.4 documents are beginning to come out... and their authors are incredibly
excited about the regular expression support and just how *easy* string processing
is getting in Java. As an example, here is the program the documentation suggests for
creating a histogram of all of the words in a file:
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.nio.charset.*;
import java.util.*;
import java.util.regex.*;

public class WordCount {
    public static void main(String args[]) throws Exception {
        String filename = args[0];

        // Map File from filename to a read-only byte buffer
        FileInputStream input = new FileInputStream(filename);
        FileChannel channel = input.getChannel();
        int fileLength = (int)channel.size();
        MappedByteBuffer buffer =
            channel.map(FileChannel.MapMode.READ_ONLY, 0, fileLength);

        // Convert to character buffer
        Charset charset = Charset.forName("ISO-8859-1");
        CharsetDecoder decoder = charset.newDecoder();
        CharBuffer charBuffer = decoder.decode(buffer);

        // Create line pattern
        Pattern linePattern = Pattern.compile(".*$", Pattern.MULTILINE);

        // Create word pattern: a single whitespace or punctuation character
        Pattern wordBreakPattern =
            Pattern.compile("[\\p{Space}\\p{Punct}]");

        // Match line pattern to buffer
        Matcher lineMatcher = linePattern.matcher(charBuffer);
        Map map = new TreeMap();
        Integer ONE = new Integer(1);

        // For each line
        while (lineMatcher.find()) {
            // Get line
            CharSequence line = lineMatcher.group();
            // Get array of words on line
            String words[] = wordBreakPattern.split(line);
            // For each word (skipping empty tokens between adjacent delimiters)
            for (int i=0, n=words.length; i<n; i++) {
                if (words[i].length() > 0) {
                    Integer frequency = (Integer)map.get(words[i]);
                    if (frequency == null) {
                        frequency = ONE;
                    } else {
                        int value = frequency.intValue();
                        frequency = new Integer(value + 1);
                    }
                    map.put(words[i], frequency);
                }
            }
        }
        System.out.println(map);
    }
}
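Part of the bulk comes from defensive bookkeeping such as the `words[i].length() > 0` test. The minimal sketch below, assuming the word-break class is written as `[\\p{Space}\\p{Punct}]` (the ASCII POSIX-style classes in the released `java.util.regex`), shows why that test is there: splitting on a *single* delimiter character leaves an empty string wherever two delimiters sit next to each other, such as a comma followed by a space. The class `SplitDemo`, its `split` helper, and the sample sentence are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    // Matches exactly one ASCII whitespace or punctuation character
    static final Pattern BREAK = Pattern.compile("[\\p{Space}\\p{Punct}]");
    // Matches a run of one or more such characters
    static final Pattern BREAK_RUN = Pattern.compile("[\\p{Space}\\p{Punct}]+");

    // Hypothetical helper: split a line with either pattern
    static String[] split(String line, boolean collapseRuns) {
        return (collapseRuns ? BREAK_RUN : BREAK).split(line);
    }

    public static void main(String[] args) {
        // ", " is two adjacent delimiters, so an empty token sits between them
        System.out.println(Arrays.toString(split("Hello, world", false))); // [Hello, , world]
        // With '+', the comma and the space form a single break
        System.out.println(Arrays.toString(split("Hello, world", true)));  // [Hello, world]
    }
}
```

Even with the `+` quantifier, a line that *starts* with a delimiter still yields one leading empty string, so some kind of length check stays necessary either way.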
Ok... I don't know about you, but if I were a maintenance coder and I were presented
with this snippet, I don't think I'd know what to do! Cognitive psychology tells us that
the human mind can hold, on average, about 7 units of information at once... *this* particular
program has *considerably* more than 7 logical atoms of information, which
makes it larger than can be held in the mind at one moment. So, let's look at a
program that duplicates this functionality in, say... Perl. Now, I know that Perl isn't the
end-all, be-all language, but:
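#!/usr/bin/perl -w
use strict;
my %frequency;
# Read every line of every file named on the command line (or STDIN)
while (<>) {
    # Split on runs of non-word characters and skip empty leading fields
    $frequency{$_}++ for grep { length } split /\W+/;
}
print "$_: $frequency{$_}\n" for sort keys %frequency;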
This program has variable declaration checking and handles multiple files on the
command line. Thanks to use strict and -w, there is a relatively strong guarantee that
I'm not making any of the "mistakes" that are common with "interpreted" very high level languages (VHLLs). (I know Perl is not *really* interpreted; it's a hybrid, but people lump it in with the "interpreted" languages.)
Now, tell me... is that not a *lot* easier to comprehend? And, more importantly, if you
were a maintenance coder, wouldn't you rather have to understand these few lines of code than the chunk of Java? All language bigotry aside (and yes, Perl has some serious flaws), I'm beginning to see the beauty of VHLLs more and more every day. It's such a pleasure to be able to *express* my program, rather than dictate it.