amittleider has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I've been working diligently on an assignment to write a Perl function that counts how many times each letter (a-z and A-Z) appears in a file named by a command-line argument. I have to say, try as I might, I've had no success.
Here are two attempts; the first is commented out (but I think it is very close!):
#sub countChar() {
# open (DAT, "@ARGV");
# print "Character count\n";
# while ($line = <DAT>){
# my @line_words = split (//, $line);
# foreach my $char (@line_char){
# if ($charCount{$char}){
# $charCount{$char}++;
# }else {
# $charCount{$char}=1;
# }
# }
# }
# foreach $char (keys %charCount) {
# print "$char => $charCount{$char}\n";
# }
# close(DAT, "@ARGV");
#}
#foreach $char (keys %charCount) {
# print "$char => $charCount{$char}\n";
#}
sub countChar() {
open (DAT, "@ARGV");
print "Character count\n";
while ($line = <DAT>){
do (@word = split (/\W/, $line));
foreach $word (keys %charCount){
do (@letter = split (/\w+/, $word);
$letter = (keys %charcount)}
if ($charCount){$char}){
$charCount{$char}++;
}else {
$charCount{$char}=1;
}
foreach $char (keys %charCount) {
print "$char => $charCount{$char}\n";
}
close(DAT, "@ARGV");
}
}
Thanks for any/all comments!
AJ
Re: Hash to count characters
by jwkrahn (Abbot) on Aug 12, 2010 at 03:51 UTC
open (DAT, "@ARGV");
"@ARGV" is short for join( $", @ARGV ). If you just want the first argument from the command line then use $ARGV[0] instead. You should really be using the three argument form of open and you should always verify that the file opened correctly, so:
open my $DAT, '<', $ARGV[ 0 ] or die "Cannot open '$ARGV[0]': $!";
while ($line = <DAT>){
do (@word = split (/\W/, $line));
foreach $word (keys %charCount){
do (@letter = split (/\w+/, $word);
$letter = (keys %charcount)}
if ($charCount){$char}){
$charCount{$char}++;
}else {
$charCount{$char}=1;
}
Since you say that you only want letters, you need something like this:
my %charCount;
while ( my $line = <$DAT> ) {
my @letters = $line =~ /[a-zA-Z]/g;
foreach my $char ( @letters ) {
$charCount{ $char }++;
}
}
foreach my $char ( keys %charCount ) {
print "$char => $charCount{$char}\n";
}
close(DAT, "@ARGV");
close only accepts one argument, the filehandle that was previously opened.
close $DAT;
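Putting jwkrahn's pieces together (three-argument open, a lexical filehandle, the letter match, and a one-argument close) gives roughly the following. This is a sketch: the sub name count_letters is hypothetical, and the counting is wrapped in a sub that takes an already-open filehandle only so it can be tried without a real file.

```perl
use strict;
use warnings;

# Count the letters a-z and A-Z read from an already-open filehandle.
sub count_letters {
    my ($fh) = @_;
    my %charCount;
    while ( my $line = <$fh> ) {
        # In list context, //g returns every letter on the line.
        $charCount{$_}++ for $line =~ /[a-zA-Z]/g;
    }
    return \%charCount;
}

# Only touch the filesystem when a file name was actually given.
if (@ARGV) {
    open my $DAT, '<', $ARGV[0] or die "Cannot open '$ARGV[0]': $!";
    my $counts = count_letters($DAT);
    close $DAT;
    print "$_ => $counts->{$_}\n" for sort keys %$counts;
}
```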
Re: Hash to count characters
by nvivek (Vicar) on Aug 12, 2010 at 03:27 UTC
Your first attempt is correct except that you need to change @line_char to @line_words: you split the line and store all the characters in @line_words, not in @line_char. One more suggestion: whenever you write a program, put the following in your code.
use strict;
use warnings;
Both pragmas help you find problems in your program. For example, with strict in effect, using a scalar, array, or hash that you haven't declared is a compile-time error, and warnings will flag things like the use of an uninitialized value.
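To see why strict matters here: the first attempt declares @line_words but loops over @line_char, which without strict is just an empty global array, so the loop silently does nothing. The sketch below compiles a hypothetical reduction of that code with eval and shows strict rejecting it instead.

```perl
use strict;
use warnings;

# A cut-down version of the first attempt: @line_words is declared, @line_char is not.
my $code = q{
    use strict;
    my @line_words = split //, "abc";
    my $n = 0;
    $n++ for @line_char;   # typo: should be @line_words
    $n;
};

# A string eval that fails to compile returns undef and leaves the error in $@.
my $result = eval $code;
print defined $result ? "compiled\n" : "strict caught it: $@";
```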
Thanks a lot for your responses! nvivek's fix worked; however, there is one slight bug: the output includes spaces and newline characters, which are unwanted. I tried changing the regex to /\w+/, since that matches only runs of alphanumeric characters plus underscores, but that produces empty output.
I just don't understand why a // regex produces characters but /\w+/ produces nothing.
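A likely explanation, assuming the only change was replacing split(//, $line) with split(/\w+/, $line): split treats its pattern as the separator and returns what lies between the matches, while a match with /g returns the matches themselves. With /\w+/ as the separator the letters are thrown away, and only whitespace, punctuation, and possibly an empty string are left to count, which prints as what looks like blank output. The sample string below is made up for illustration.

```perl
use strict;
use warnings;

my $line = "ab cd\n";

# split's pattern describes the SEPARATORS; the pieces between them are kept.
my @pieces = split /\w+/, $line;     # ("", " ", "\n")

# A match with /g in list context returns what the pattern MATCHES.
my @letters = $line =~ /\w/g;        # ("a", "b", "c", "d")

# split // has an empty separator, so every character comes back, " " and "\n" included.
my @chars = split //, $line;         # ("a", "b", " ", "c", "d", "\n")

print scalar(@pieces), " pieces from split; matched letters: @letters\n";
```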
amittleider:
Regarding the unwanted items in your report: There are three general ways to approach it:
- Remove unwanted characters before counting,
- Delete them after counting but before reporting, or
- Delete or ignore them during the report.
Each method has situations where it is better than the others, but frequently any of them are good enough. Examples:
# Case 1: don't count unwanted characters
for my $char(@letters) {
++$charCount{$char} if $char =~ /[a-zA-Z]/;
}
# Case 2: delete unwanted characters
my %t;
$t{$_}=$charCount{$_} for grep {/[a-zA-Z]/} keys %charCount;
%charCount=%t;
# Case 3: ignore unwanted items during report
for my $char (sort keys %charCount) {
next unless $char =~ /[a-zA-Z]/;
# print report entry
}
...roboticus
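A self-contained version of the three cases, run on a hypothetical sample string, might look like this. Case 2 here swaps in a delete on a hash slice instead of building a temporary hash; either way, only keys matching /[a-zA-Z]/ survive.

```perl
use strict;
use warnings;

# Hypothetical sample input, one character per element.
my @letters = split //, "Hi, Bob!\n";

# Case 1: only count letters.
my %case1;
for my $char (@letters) {
    ++$case1{$char} if $char =~ /[a-zA-Z]/;
}

# Case 2: count everything, then delete the unwanted keys with a hash slice.
my %case2;
$case2{$_}++ for @letters;
delete @case2{ grep { !/[a-zA-Z]/ } keys %case2 };

# Case 3: count everything, but skip unwanted keys while reporting.
my %case3;
$case3{$_}++ for @letters;
for my $char (sort keys %case3) {
    next unless $char =~ /[a-zA-Z]/;
    print "$char => $case3{$char}\n";
}
```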
Re: Hash to count characters
by dasgar (Priest) on Aug 12, 2010 at 05:48 UTC
Both nvivek and jwkrahn gave you good tips on correcting your code while staying with your algorithm. However, I had a different route to get the character counts in a file. Instead of breaking the data down into words and then breaking it down further into characters, I say break down the data into the characters from the start.
I'll give you a hint at what I'm thinking about. Consider the following lines of code:
my $line = "This is sample data simulating a line from a file.";
my (@chars) = ($line =~ m/([st])/gi);
What you'll end up with is an array whose elements are [T s s s t s t], which are the s's and t's from the variable $line. If you combine that with a hash, you should be able to accomplish what you want to do.
Since you said that this was an assignment, this sounds like something you're doing for a class. That's why I'm just giving hints rather than saying "Here's the code to do your assignment.", which won't be much help for future assignments and tests.
If you really, really want to see code, check out my scratchpad. Just keep in mind that if you copy my stuff verbatim, your teacher/instructor will probably realize it's not your code, since it won't match your coding style and might use things that haven't been covered yet.
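For anyone who has already worked the exercise, combining that match with a hash, still limited to the [st] class from the example, could look like the sketch below. The counts follow from the sample line: one T, four s, and two t.

```perl
use strict;
use warnings;

my $line = "This is sample data simulating a line from a file.";

# Same match as above: every s or t, case-insensitively, as one list,
# with each matched character used as a hash key.
my %count;
$count{$_}++ for $line =~ m/([st])/gi;

print "$_ => $count{$_}\n" for sort keys %count;
```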
Re: Hash to count characters
by JavaFan (Canon) on Aug 12, 2010 at 09:05 UTC
perl -0777E '$s{$_}++ for split//,<>; say "$_ ", $s{$_}||0 for "a".."z", "A".."Z"' your-data-file
I would count all characters, and at the end only display the characters you are interested in.
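For readers less used to one-liners: -0777 makes Perl read each file as a single string, -E is like -e but also enables features such as say, and $s{$_}||0 prints 0 for letters that never occur. An expanded script form might look like this; the letter_report sub is hypothetical and exists only so the logic can be called on a string.

```perl
use strict;
use warnings;

# Count every character, then report only a-z and A-Z (0 for letters never seen).
sub letter_report {
    my ($text) = @_;
    my %s;
    $s{$_}++ for split //, $text;
    return map { my $letter = $_; [ $letter, $s{$letter} || 0 ] } 'a' .. 'z', 'A' .. 'Z';
}

if (@ARGV) {
    local $/;              # undef $/ means slurp mode, like -0777
    my $text = <> // '';   # whole contents of the first file named on the command line
    print "$_->[0] $_->[1]\n" for letter_report($text);
}
```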
Re: Hash to count characters
by FunkyMonk (Chancellor) on Aug 12, 2010 at 09:22 UTC
if ($charCount{$char}){
$charCount{$char}++;
}else {
$charCount{$char}=1;
}
Perl will happily increment an undefined variable. In other words, the block above does exactly the same as just
$charCount{$char}++;
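A quick way to check this, even with warnings enabled, is to capture any warnings while incrementing fresh hash keys. The sketch below reports zero warnings, since ++ on an undefined value is exempt from the "uninitialized value" warning.

```perl
use strict;
use warnings;

# Collect any warnings instead of printing them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my %charCount;
$charCount{$_}++ for qw(a b a);    # each key's first ++ starts from undef

print "a => $charCount{a}, b => $charCount{b}\n";
print scalar(@warnings), " warning(s)\n";
```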
Re: Hash to count characters
by roboticus (Chancellor) on Aug 12, 2010 at 12:26 UTC
amittleider:
Just for grins, here's another way to do it:
#!/usr/bin/perl
use strict;
use warnings;
my %charCount;
my $corpus = join('', <DATA>);
$corpus =~ tr/A-Z/a-z/; # Map uppercase to lowercase
$corpus =~ tr/a-z//cd; # Delete all but lowercase
$charCount{$_}++ for split //, $corpus;
for (sort keys %charCount) {
print "$_ : $charCount{$_}\n";
}
__DATA__
Now is the time for all good men to come to the aid of their party.
The quick red fox jumped over the lazy brown dog.
The warrior swings the +6 axe at the orcs standing in front of him.
...roboticus
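Since the assignment reads the file named on the command line rather than __DATA__, the same tr/// approach could be wrapped in a sub and fed from <>. This is a sketch; the sub name count_lowercased_letters is hypothetical.

```perl
use strict;
use warnings;

# Same tr/// idea as above, as a sub that takes the text as an argument.
sub count_lowercased_letters {
    my ($corpus) = @_;
    (my $letters = $corpus) =~ tr/A-Z/a-z/;   # fold uppercase to lowercase
    $letters =~ tr/a-z//cd;                   # delete everything that isn't a-z
    my %charCount;
    $charCount{$_}++ for split //, $letters;
    return \%charCount;
}

if (@ARGV) {
    # <> in list context reads every line of every file named on the command line.
    my $counts = count_lowercased_letters( join '', <> );
    print "$_ : $counts->{$_}\n" for sort keys %$counts;
}
```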
Whoa! So many great ideas so fast. You monks really are lifesavers!
Here's the final working code! (I'll be sure to use strict and warnings in the future!)
print "Counting from @ARGV \n";
&countWords();
&countChar();
sub countWords() {
open DAT, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
print "Word Count\n";
while($line = <DAT>){
my @line_words = grep { length } split(/\W+/, $line); # skip empty fields
foreach my $word (@line_words){
if ($wordCount{$word}){
$wordCount{$word}++;
}else {
$wordCount{$word}=1;
}
}
}
close(DAT);
for $word (sort keys %wordCount) {
print "$word => $wordCount{$word}\n";
}
}
sub countChar() {
open DAT, '<', $ARGV[0] or die "Can't open $ARGV[0]: $!";
print "Character count\n";
while ($line = <DAT>){
my @line_words = split (//, $line);
foreach my $char (@line_words){
if ($charCount{$char}){
$charCount{$char}++;
}else {
$charCount{$char}=1;
}
}
}
for $char (sort keys %charCount) {
next unless $char =~ /[a-zA-Z]/;
print "$char => $charCount{$char}\n";
}
close(DAT);
}
<3<3
AJ
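For comparison, a version of the final program that runs cleanly under strict and warnings might look like the following. It is a sketch, not AJ's code: the sub names are hypothetical, words are taken as runs of \w characters (so no empty "words" are counted), and each count reopens the file with a three-argument open.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count words (runs of \w characters) read from a filehandle.
sub count_words {
    my ($fh) = @_;
    my %wordCount;
    while ( my $line = <$fh> ) {
        $wordCount{$_}++ for $line =~ /(\w+)/g;
    }
    return \%wordCount;
}

# Count the letters a-z and A-Z read from a filehandle.
sub count_chars {
    my ($fh) = @_;
    my %charCount;
    while ( my $line = <$fh> ) {
        $charCount{$_}++ for $line =~ /([a-zA-Z])/g;
    }
    return \%charCount;
}

# Print a count hash as "key => count", sorted by key.
sub report {
    my ( $title, $counts ) = @_;
    print "$title\n";
    print "$_ => $counts->{$_}\n" for sort keys %$counts;
}

if (@ARGV) {
    my $file = $ARGV[0];
    for my $job ( [ 'Word count', \&count_words ], [ 'Character count', \&count_chars ] ) {
        my ( $title, $counter ) = @$job;
        open my $fh, '<', $file or die "Can't open '$file': $!";
        report( $title, $counter->($fh) );
        close $fh;
    }
}
```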