PerlMonks  

So, the Java 1.4 documents are beginning to come out, and they are incredibly excited about the regular expression support and just how *easy* string processing is getting in Java. As an example, here is the program the documentation suggests for creating a histogram of all the words in a file:

    import java.io.*;
    import java.nio.*;
    import java.nio.channels.*;
    import java.nio.charset.*;
    import java.util.*;
    import java.util.regex.*;

    public class WordCount {
      public static void main(String args[]) throws Exception {
        String filename = args[0];

        // Map File from filename to byte buffer
        FileInputStream input = new FileInputStream(filename);
        FileChannel channel = input.getChannel();
        int fileLength = (int)channel.size();
        MappedByteBuffer buffer =
          channel.map(FileChannel.MAP_RO, 0, fileLength);

        // Convert to character buffer
        Charset charset = Charset.forName("ISO-8859-1");
        CharsetDecoder decoder = charset.newDecoder();
        CharBuffer charBuffer = decoder.decode(buffer);

        // Create line pattern
        Pattern linePattern = Pattern.compile(".*$", Pattern.MULTILINE);

        // Create word pattern
        Pattern wordBreakPattern = Pattern.compile("[{space}{punct}]");

        // Match line pattern to buffer
        Matcher lineMatcher = linePattern.matcher(charBuffer);

        Map map = new TreeMap();
        Integer ONE = new Integer(1);

        // For each line
        while (lineMatcher.find()) {

          // Get line
          CharSequence line = lineMatcher.group();

          // Get array of words on line
          String words[] = wordBreakPattern.split(line);

          // For each word
          for (int i=0, n=words.length; i<n; i++) {
            if (words[i].length() > 0) {
              Integer frequency = (Integer)map.get(words[i]);
              if (frequency == null) {
                frequency = ONE;
              } else {
                int value = frequency.intValue();
                frequency = new Integer(value + 1);
              }
              map.put(words[i], frequency);
            }
          }
        }
        System.out.println(map);
      }
    }
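For readers who want to run the counting step in isolation, here is a minimal sketch of just the split-and-count loop, without the NIO file mapping. It is not the documentation's code: the class name `WordCountSketch` and the method `count` are hypothetical, it assumes a modern JDK (generics and `Map.merge`, neither of which exists in Java 1.4), and it assumes the `\p{Space}` / `\p{Punct}` spelling of the character classes in place of the `{space}` / `{punct}` form quoted above.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the quoted documentation.
public class WordCountSketch {
    // Assumption: POSIX-style classes written as \p{Space} and \p{Punct}.
    // The trailing + treats a run of separators as a single break.
    private static final Pattern WORD_BREAK =
        Pattern.compile("[\\p{Space}\\p{Punct}]+");

    // Count how often each word occurs in the given text.
    static Map<String, Integer> count(String text) {
        // TreeMap keeps the words sorted, as in the quoted program.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : WORD_BREAK.split(text)) {
            // split can yield an empty leading field; skip it.
            if (!word.isEmpty()) {
                // Insert 1 for a new word, otherwise add 1 to the old count.
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("the cat, the hat"));
    }
}
```

Even with newer library calls doing the null check and the boxing, the shape is the same as the original loop: split on word breaks, skip empty fields, bump a counter in a sorted map.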

OK... I don't know about you, but if I were a maintenance coder and I was presented with this snippet, I don't think I'd know what to do! Cognitive psychology tells us that the human mind can hold, on average, about seven units of information at once. *This* particular program has *considerably* more than seven logical atoms of information, which makes it larger than can be held in the mind at one moment. So, let's look at a program that duplicates this functionality in, say, Perl. Now, I know that Perl isn't the end-all, be-all language, but:

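    #!/usr/bin/perl -w
    use strict;

    my %frequency = ();
    # split takes its second argument in scalar context, so a bare <> there
    # would read only one line; map over <> to see every line of every file.
    $frequency{$_}++ for grep { length } map { split /\W+/ } <>;
    print "$_: $frequency{$_}\n" for (keys %frequency);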

This program now has variable declaration checking, handles multiple files on the command line, etc. Thanks to use strict and -w, there is a relatively strong guarantee that I'm not making any of the "mistakes" that are common with "interpreted" VHLLs. (I know Perl is not *really* interpreted, it's a hybrid, but people lump it in with the "interpreted" languages.) Now, tell me... is that not a *lot* easier to comprehend? And more importantly, if you were a maintenance coder, would you not prefer to have to understand these 2 lines of code rather than the chunk of Java? All language bigotry aside (and yes, Perl has some serious flaws), I'm beginning to see the beauty of VHLLs more and more every day. It's such a pleasure to be able to *express* my program, rather than dictate it.

In reply to Efficiency in maintenance coding... by eduardo
