Re: Perl Destroys Interview Question
by Abigail-II (Bishop) on Jan 13, 2004 at 02:26 UTC
Of course, your Perl solution (which is incorrect, as it counts lines, not words) takes more than 5 times the lines a shell
solution would take:
cat words.dat | tr 'A-Z ' 'a-z\012' | sort | uniq -c
I'd like to point out that for some problems, other solutions
are more suited than Perl.
Abigail
Amen to that! The more languages I learn, the more I can see the strengths and weaknesses of each language.
Your solution also breaks down if there is punctuation in the file. (OS HP-UX 11.0)
File
This is a test file. How many unique words are in this
file? Do you know? Does the file contain more than
ten words?
Results
1
1 a
1 are
1 contain
1 do
1 does
1 file
1 file.
1 file?
1 how
1 in
1 is
1 know?
1 many
1 more
1 ten
1 test
1 than
1 the
2 this
1 unique
1 words
1 words?
1 you
Update: Changed the test file.
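One way around the punctuation problem, offered as a sketch rather than as a correction to the pipeline above: turn every run of non-letters into a single newline before folding case. This assumes words are plain ASCII letters, so something like "don't" would be split in two. The function name `count_words` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count case-insensitive words read from stdin.
count_words() {
    # 1. -c complements the set and -s squeezes runs, so every run of
    #    non-letters (spaces, punctuation, newlines) becomes one newline.
    # 2. Fold upper case to lower case.
    # 3. Drop the empty line that leading punctuation would leave behind.
    # 4. Sort so identical words are adjacent, then count them.
    tr -cs 'A-Za-z' '\n' |
        tr 'A-Z' 'a-z' |
        grep -v '^$' |
        sort |
        uniq -c
}
```

Called as `count_words < words.dat`, this should report "file", "file." and "file?" as a single word.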
Here are the requirements I was given...
Program Purpose
The goal of the program is to count the occurrences of all words in a file, and write this count into a new file.
Requirements
- The input file will contain 1 word per line (lines will be terminated by the newline character), and the file will contain an arbitrary number of lines.
- The file will be terminated by an end of file character.
- The word count must be case insensitive, as there may be varying case throughout the file.
- The output file must write each word once, and include the number of occurrences of that word on the same line.
- The lines in the output file must be sorted in ascending order.
Sample Input:
Chicago
Paris
chicago
London
red
blue
Green
Red
REd
london
Sample output:
blue;1
Chicago;2
Green;1
London;2
Paris;1
red;3
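For comparison, here is a minimal shell sketch of those requirements. Two things are read off the sample rather than stated in the requirements, so treat them as assumptions: each word is printed with the spelling of its first occurrence, and the sort ignores case. The function name `word_counts` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads one word per line from the file named in $1
# and prints "word;count" lines, sorted without regard to case.
word_counts() {
    awk '
        $0 == "" { next }                          # skip blank lines
        {
            key = tolower($0)                      # case-insensitive key
            if (!(key in first)) first[key] = $0   # remember first spelling
            count[key]++
        }
        END { for (k in count) print first[k] ";" count[k] }
    ' "$1" | LC_ALL=C sort -f                      # -f folds case while sorting
}
```

Something like `word_counts words.dat > counts.txt` would then write the result to a new file, as the first requirement asks.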
Re: Perl Destroys Interview Question
by Zaxo (Archbishop) on Jan 12, 2004 at 23:23 UTC
What mr_mischief says, which can be fixed by replacement with $words{$_}++ for split; (no chomp needed). Also, I'd prefer an output loop which didn't construct a potentially long list of keys to iterate. Something like this,
while ($_ = each %words) {
print $_, ';', $words{$_}, $/;
}
I like to name my hashes singular for their values, not their keys. That makes the doc-suggested pronunciation work - $count{'foo'} is "count of foo" and so on.
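Putting those suggestions together might look roughly like the sketch below. The shell wrapper `count_with_each` is a made-up name, the `lc` is added here for the case-insensitive requirement, and it assumes a `perl` binary is on the PATH. Note that `each` returns pairs in no particular order, so sorted output still needs a separate step.

```shell
#!/bin/sh
# Hypothetical wrapper; assumes a perl binary is on PATH.
count_with_each() {
    perl -e '
        my %count;                          # named for its values
        while (<STDIN>) {
            $count{lc $_}++ for split;      # split drops whitespace, so no chomp
        }
        # each hands back one key/value pair at a time, so no full
        # list of keys is built; the output order is arbitrary.
        while (my ($word, $n) = each %count) {
            print "$word;$n\n";
        }
    '
}
```

Usage would be along the lines of `count_with_each < words.dat | sort -f`.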
I agree, I like the name %count better than %words. The hash (Map) in my Java solution was named wordCount.
Re: Perl Destroys Interview Question
by mr_mischief (Monsignor) on Jan 12, 2004 at 22:56 UTC
This doesn't count words in a file. It almost counts unique lines in a file. What it actually does is list each unique line in a file and the number of times it occurs. This is useful in some situations, and I'm sure it's quicker to do in Perl than in Java. It's hardly a case-insensitive word count.
Is this exactly the code you submitted to solve their problem, or did you retype this from memory?
I copy/pasted this code. I didn't re-type it. Why do you ask?
Re: Perl Destroys Interview Question
by Anonymous Monk on Jan 12, 2004 at 23:00 UTC
While you were at it you should have used strict, or else a one liner would have served the purpose just the same.
perl -lane '$w{lc $_}++ for @F;END{print "$_;$w{$_}" for sort keys %w}' text.txt
Re: Perl Destroys Interview Question
by LAI (Hermit) on Jan 12, 2004 at 22:53 UTC
Well done, redsquirrel. This seems to point out what most Java-Perl holy wars miss: that for certain applications Perl is far more useful than Java. (and, by extension, vice-versa.)