

in reply to Re: Perl Destroys Interview Question
in thread Perl Destroys Interview Question

Your solution also breaks down if there is punctuation in the file (OS: HP-UX 11.0).

File

This is a test file. How many unique words are in this file? Do you know? Does the file contain more than ten words?

Results

1
1 a
1 are
1 contain
1 do
1 does
1 file
1 file.
1 file?
1 how
1 in
1 is
1 know?
1 many
1 more
1 ten
1 test
1 than
1 the
2 this
1 unique
1 words
1 words?
1 you

Update: Changed the test file.

Re: Re: Re: Perl Destroys Interview Question
by redsquirrel (Hermit) on Jan 13, 2004 at 20:00 UTC
    Here are the requirements I was given...

    Program Purpose

    The goal of the program is to count the occurrences of all words in a file, and write this count into a new file.

    Requirements

    • The input file will contain 1 word per line (lines will be terminated by the newline character), and the file will contain an arbitrary number of lines.
    • The file will be terminated by an end of file character.
    • The word count must be case insensitive, as there may be varying case throughout the file.
    • The output file must write each word once, and include the number of occurrences of that word on the same line.
    • The lines in the output file must be sorted in ascending order.
    Sample Input:
    Chicago
    Paris
    chicago
    London
    red
    blue
    Green
    Red
    REd
    london
    
    Sample output:
    blue;1
    Chicago;2
    Green;1
    London;2
    Paris;1
    red;3
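
One possible reading of these requirements, as a minimal sketch rather than the solution anyone in the thread actually submitted: count on the lowercased word, remember the first spelling seen so the output keeps casing like the sample does, and sort on the lowercased key. The name count_words is hypothetical, and skipping blank lines is an assumption the spec does not state.

```perl
use strict;
use warnings;

# count_words() is a hypothetical helper name, not taken from the thread.
# Input: one word per line. Output: "word;count" strings, sorted.
sub count_words {
    my @lines = @_;
    my (%count, %first_seen);
    for my $word (@lines) {
        chomp $word;
        next if $word eq '';    # assumption: blank lines are ignored
        my $key = lc $word;     # case-insensitive counting
        # Keep the first spelling seen, as the sample output appears to do.
        $first_seen{$key} = $word unless exists $first_seen{$key};
        $count{$key}++;
    }
    # Sort on the lowercased key so "blue" sorts before "Chicago".
    return map { "$first_seen{$_};$count{$_}" } sort keys %count;
}

my @sample = qw(Chicago Paris chicago London red blue Green Red REd london);
print "$_\n" for count_words(@sample);
```

Run on the sample input, this prints the six lines shown in the sample output. Reading from and writing to real files is left out to keep the sketch short.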
    
      So your original solution works for the narrow scope of the requirements. It fails if the requirement that there is one word per line is changed. This explains perfectly why the questions above arose about lines versus words -- according to the spec, they can be considered the same.

      Now only one question remains. Do you code to exactly match a questionable spec? Or, more to the point, wouldn't it be better to code something that satisfies the exact spec and also behaves correctly if the questionable part of the spec changes?

      I think that, when possible, a restrictively narrow spec should be answered with a more general solution which works for the spec at hand and for likely future changes. In some instances, the likely future cases are hard to determine. In this one they are not. In the spirit of a job interview, I'd like to see either both ways implemented, or a comment in the code saying that one way was chosen over the other because of the nature of the spec.
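
As an illustration of that idea, here is a hedged sketch in which one-word-per-line is just a special case of whitespace-separated words. The name tally_tokens is hypothetical, and the "non-whitespace" definition of a word is an assumption, so "file." and "file?" would still count as different words.

```perl
use strict;
use warnings;

# tally_tokens() is a hypothetical name used only for this sketch.
# Assumption: a "word" is any run of non-whitespace characters.
sub tally_tokens {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        # split ' ' splits on runs of whitespace and discards leading
        # empty fields, so blank lines contribute nothing.
        $count{ lc $_ }++ for split ' ', $line;
    }
    return \%count;
}

# The spec as given: one word per line.
my $per_line = tally_tokens("Red\n", "red\n", "blue\n");
# The relaxed case: several words on a line.
my $per_word = tally_tokens("Red red blue\n", "BLUE\n");

printf "%s;%d\n", $_, $per_line->{$_} for sort keys %$per_line;
printf "%s;%d\n", $_, $per_word->{$_} for sort keys %$per_word;
```

A comment like the one above the sub is the kind of note that records which reading of the spec was chosen and why.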

      Of course, redsquirrel, since you already went above and beyond what the question asked, it wouldn't be fair to complain that you didn't do even more work. I'm just making points about more general cases again. ;-)

      Come to think of it, it seems that much of my life as a programmer, and even much of my life besides programming (and probably because of habits learned from programming) is about making solutions which already work for one case more general. I think this is probably a goal of a large percentage of programming effort overall.

      Update: fixed a tpyo.



      Christopher E. Stith

        How does this square with XP (Extreme Programming)? I'm not too familiar with XP myself, nor do I have any experience with it, but to my understanding you shouldn't write the general solution if you want to be an extreme programmer. The reasoning is that the spec is likely to change in a way that even your general solution wouldn't handle. Then you've solved something that wasn't your problem to solve, and hence done unnecessary work. Planning for the future would be handled by other techniques (such as keeping the code well refactored).

        This is how I interpret the XP philosophy. I may very well be wrong. Please correct me if so.

        ihb

Re: Perl Destroys Interview Question
by Abigail-II (Bishop) on Jan 13, 2004 at 16:59 UTC
    That just depends on how a word is defined. Which the OP didn't. And considering the suggestions for how to fix the OP's solution (split with no arguments, or -a without a -F), I wasn't the only one taking the not uncommon "non-whitespace" definition.

    But I'd like to see the version you would write during a job interview. Make sure you take into account punctuation, Unicode and words like O'Reilly and home-brew.

    Abigail
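
For readers wondering what such a version might look like, here is one hedged sketch, not an answer given in the thread. It assumes a word is a run of Unicode letters or digits, optionally joined to further runs by a single apostrophe or hyphen. The name words_in is hypothetical, and the definition is deliberately narrow: it ignores curly apostrophes, leading or trailing apostrophes, and plenty of other real-world cases.

```perl
use strict;
use warnings;
use utf8;                               # this source contains a non-ASCII literal
binmode STDOUT, ':encoding(UTF-8)';

# words_in() is a hypothetical helper. Assumed definition of a word:
# letters/digits, with single internal apostrophes or hyphens allowed.
sub words_in {
    my ($text) = @_;
    # In list context, a /g match with no capture groups returns
    # every matched substring.
    return $text =~ /[\p{L}\p{N}]+(?:['-][\p{L}\p{N}]+)*/g;
}

my @words = words_in("O'Reilly sells home-brew guides. Is the café open?");
print join('|', map { lc($_) } @words), "\n";
```

Under this definition "O'Reilly" and "home-brew" each stay whole, while the trailing "." and "?" are dropped. Whether that is the right behavior depends entirely on how a word is defined, which is the point being made above.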