Hash to count characters

amittleider has asked for the wisdom of the Perl Monks concerning the following question:

Re: Hash to count characters by jwkrahn (Abbot) on Aug 12, 2010 at 03:51 UTC
`open (DAT, "@ARGV");`

`"@ARGV"` is short for `join( $", @ARGV )`. If you just want the first argument from the command line, use `$ARGV[0]` instead. You should really be using the three-argument form of open, and you should always verify that the file opened correctly, so:

`open my $DAT, '<', $ARGV[ 0 ] or die "Cannot open '$ARGV[0]' $!";`

`while ($line = <DAT>){ do (@word = split (/\W/, $line)); foreach $word (keys %charCount){ do (@letter = split (/\w+/, $word); $letter = (keys %charcount)} if ($charCount){$char}){ $charCount{$char}++; }else { $charCount{$char}=1; }`

Since you say that you only want letters, you need something like this:

`my %charCount; while ( my $line = <$DAT> ) { my @letters = $line =~ /[a-zA-Z]/g; foreach my $char ( @letters ) { $charCount{ $char }++; } } foreach my $char ( keys %charCount ) { print "$char => $charCount{$char}\n"; }`

`close(DAT, "@ARGV");`

`close` only accepts one argument, the filehandle that was previously opened.

`close $DAT;`
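To see why `"@ARGV"` is the wrong thing to pass to open, here is a minimal sketch of what array interpolation produces. The array `@args` and the file names in it are hypothetical stand-ins for a real command line.

```perl
use strict;
use warnings;

# Hypothetical argument list standing in for the real @ARGV.
my @args = ('notes.txt', 'extra.txt');

# An array interpolated in double quotes is joined with $",
# which is a single space by default.
my $joined = "@args";

# Indexing picks out exactly one argument.
my $first = $args[0];

print "interpolated: '$joined'\n";
print "first only:   '$first'\n";
```

With a single argument the two forms happen to agree, which is why the original open appeared to work; with two or more, the interpolated form names a file that most likely does not exist.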
Re: Hash to count characters by nvivek (Vicar) on Aug 12, 2010 at 03:27 UTC
Your first attempt is correct, but you need to change `@line_char` to `@line_words`, because you split the line and store all the characters in the `@line_words` array, not in `@line_char`.

One more suggestion: whenever you write a program, put the following in your code.

`use strict; use warnings;`

Both pragmas help you catch problems in your program. If you use a scalar, array, or hash without declaring it, `use strict` reports it as a compile-time error, and `use warnings` flags many other likely mistakes at run time.
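As a rough illustration of what `use strict` catches, the sketch below compiles a one-line snippet at run time with a string `eval`, so the failure can be inspected instead of stopping the script. The variable name `$typo_count` is made up for the example and is deliberately never declared.

```perl
use strict;
use warnings;

# The string eval is compiled under the same "use strict" as this file,
# so the undeclared $typo_count is rejected when the snippet is compiled.
my $ok    = eval q{ $typo_count = 1; 1 };
my $error = $@;

if ($ok) {
    print "snippet compiled and ran\n";
}
else {
    print "strict rejected the snippet: $error";
}
```

In an ordinary script the same mistake stops compilation before any line runs, which is usually how a misspelled variable such as `@line_char` gets noticed.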
Re^2: Hash to count characters by amittleider (Initiate) on Aug 12, 2010 at 05:46 UTC
Thanks a lot for your responses! nvivek's post worked; however, there is just one slight bug. This will produce an output that includes spaces and newline characters, which are unwanted. I tried to change the regex to `/\w+/`, because this says that there will be only alphanumeric strings plus underscores, but this produces an empty output. I just don't understand why it would produce characters with a `//` regex, but nothing with `/\w+/`.
Re^3: Hash to count characters by roboticus (Chancellor) on Aug 12, 2010 at 11:54 UTC
amittleider: Regarding the unwanted items in your report, there are three general ways to approach it: remove unwanted characters before counting, delete them after counting but before reporting, or delete or ignore them during the report. Each method has situations where it is better than the others, but frequently any of them is good enough. Examples:

`# Case 1: don't count unwanted characters`
`for my $char (@letters) { ++$charCount{$char} if $char =~ /[a-zA-Z]/; }`
`# Case 2: delete unwanted characters`
`my %t; $t{$_} = $charCount{$_} for grep { /[a-zA-Z]/ } keys %charCount; %charCount = %t;`
`# Case 3: ignore unwanted items during report`
`for my $char (sort keys %charCount) {`
`next unless $char =~ /[a-zA-Z]/;`
`# print report entry`
`}`

...roboticus
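On the question of why `/\w+/` produced nothing while `//` produced characters: split treats its pattern as the separator, not as the thing to keep. The sketch below compares the two on a made-up sample line; the variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "Hi there\n";    # hypothetical sample line

# With /\w+/ as the separator, the words themselves are thrown away
# and only the text between them is returned.
my @between = split /\w+/, $line;

# An empty pattern matches between every pair of characters,
# so each character becomes its own element.
my @chars = split //, $line;

# Keep only the pieces that contain a letter, as the report filter does.
my @letters_from_between = grep { /[a-zA-Z]/ } @between;
my @letters_from_chars   = grep { /[a-zA-Z]/ } @chars;

print scalar(@letters_from_between), " letters via /\\w+/\n";
print scalar(@letters_from_chars),   " letters via //\n";
```

Splitting on `/\w+/` leaves only spaces, newlines, and punctuation, and the letter filter in the report then discards all of those, hence the empty output. To keep the letters instead of discarding them, match them with a regex in list context rather than splitting on them.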
Re: Hash to count characters by dasgar (Priest) on Aug 12, 2010 at 05:48 UTC
Both nvivek and jwkrahn gave you good tips on correcting your code while staying with your algorithm. However, I had a different route to get the character counts in a file. Instead of breaking the data down into words and then breaking it down further into characters, I say break down the data into the characters from the start.

I'll give you a hint at what I'm thinking about. Consider the following lines of code:

`my $line = "This is sample data simulating a line from a file."; my (@chars) = ($line =~ m/([st])/gi);`

What you'll end up with is an array whose elements are `[T s s s t s t]`, which are the s's and t's from the variable `$line`. If you combine that with a hash, you should be able to accomplish what you want to do.

Since you said that this was an assignment, this sounds like something you're doing for a class. That's why I'm just giving hints rather than saying "Here's the code to do your assignment.", which won't be much help for future assignments and tests. If you really, really want to see code, check out my scratchpad. Just keep in mind that if you copy my stuff verbatim, your teacher/instructor will probably realize that it's not your code, since it won't match your code style and might use stuff that has not been covered yet.
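One possible way to combine the hint above with a hash is sketched here. It is not the code from the scratchpad; folding case with `lc` so that `T` and `t` share a key is an assumption made for the example, as is the hash name `%count`.

```perl
use strict;
use warnings;

my $line  = "This is sample data simulating a line from a file.";
my @chars = $line =~ m/([st])/gi;    # every s or t, in either case

# Tally each captured character. lc() folds case so 'T' and 't' are
# counted together (an assumption; drop it to count them separately).
my %count;
$count{ lc $_ }++ for @chars;

print "$_ => $count{$_}\n" for sort keys %count;
```

Widening the character class to `[a-zA-Z]` gives a count for every letter on the line.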
Re: Hash to count characters by JavaFan (Canon) on Aug 12, 2010 at 09:05 UTC
As a one-liner:

`perl -0777E '$s{$_}++ for split//,<>; say "$_ ", $s{$_}||0 for "a".."z", "A".."Z"' your-data-file`

I would count all characters, and at the end only display the characters you are interested in.
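The same count-everything, filter-on-display idea can be written out as a script. This is only a sketch: the string `$text` is a made-up stand-in for the slurped file contents, and `%seen` is an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the whole file read in one go.
my $text = "Hello, World!\n";

# Count every character, including punctuation and whitespace.
my %seen;
$seen{$_}++ for split //, $text;

# Report only the letters; "|| 0" supplies a zero for letters
# that never appeared, without adding them to the hash.
for my $letter ('a' .. 'z', 'A' .. 'Z') {
    my $n = $seen{$letter} || 0;
    print "$letter $n\n";
}
```

Because the report loop walks a fixed list of letters rather than the hash keys, unwanted characters never need to be removed from the hash at all.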
Re: Hash to count characters by FunkyMonk (Chancellor) on Aug 12, 2010 at 09:22 UTC
`if ($charCount{$char}){ $charCount{$char}++; }else { $charCount{$char}=1; }`

Perl will happily increment an undefined variable. In other words, the block above does exactly the same as just

`$charCount{$char}++;`
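A small sketch of that behaviour, with a warning handler installed to show that the increment stays quiet even under `use warnings`. The key `x` and the `@warnings` array are illustrative.

```perl
use strict;
use warnings;

# Collect any warnings instead of printing them, so we can check for them.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my %charCount;

# The key does not exist yet; ++ treats the missing value as 0.
$charCount{x}++;
$charCount{x}++;

print "x => $charCount{x}\n";
print "warnings raised: ", scalar(@warnings), "\n";
```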
Re: Hash to count characters by roboticus (Chancellor) on Aug 12, 2010 at 12:26 UTC
amittleider: Just for grins, here's another way to do it:

`#!/usr/bin/perl`
`use strict;`
`use warnings;`
`my %charCount;`
`my $corpus = join('', <DATA>);`
`$corpus =~ tr/A-Z/a-z/d; # Map uppercase to lowercase`
`$corpus =~ tr/a-z//cd; # Delete all but lowercase`
`$charCount{$_}++ for split //, $corpus;`
`for (sort keys %charCount) { print "$_ : $charCount{$_}\n"; }`
`__DATA__`
`Now is the time for all good men to come to the aid of their party.`
`The quick red fox jumped over the lazy brown dog.`
`The warrior swings the +6 axe at the orcs standing in front of him.`

...roboticus
Re^2: Hash to count characters by Anonymous Monk on Aug 12, 2010 at 20:11 UTC
Whoa! So many great ideas so fast. You monks really are lifesavers! Here's the final working code! (I'll be sure to use strict and warnings in the future!)

`print "Counting from @ARGV \n"; &countWords(); &countChar();`

`sub countWords() { open DAT, "< @ARGV[0]" or die "Can't open @ARGV : $!"; print "Word Count\n"; while($line = <DAT>){ my @line_words = split(/\W/, $line); foreach my $word (@line_words){ if ($wordCount{$word}){ $wordCount{$word}++; }else { $wordCount{$word}=1; } } } close(DAT); for $word (sort keys %wordCount) { print "$word => $wordCount{$word}\n"; } }`

`sub countChar() { open DAT, "< @ARGV[0]" or die "Can't open @ARGV : $!"; print "Character count\n"; while ($line = <DAT>){ my @line_words = split (//, $line); foreach my $char (@line_words){ if ($charCount{$char}){ $charCount{$char}++; }else { $charCount{$char}=1; } } } for $char (sort keys %charCount) { next unless $char =~ /[a-zA-Z]/; print "$char => $charCount{$char}\n"; } close(DAT); }`

<3<3 AJ

