Rewrite Program Using Arrays

perl.j has asked for the wisdom of the Perl Monks concerning the following question:

So I have this text file:

Bob Joe Steve
Joe Bob Mark
Joe Steve Fred
Fred Fred Joe
[download]

And I am reading and counting the names with this program:

use 5.12.4;
use strict;
use warnings;

my %wordcount;

open(FILE, "names.txt") or die "Could not open file: $!"; 

while(my $line = <FILE>) {
    chomp($line);
    my @words = split(' ', $line);
    foreach my $word(@words) {
        $wordcount{$word} += 1;
    }
}

sub hashDescend {
   $wordcount{$b} <=> $wordcount{$a};
}

foreach my $key (sort hashDescend (keys(%wordcount)) {
    print "Word: $key \t\t\t\t\t\t Count: " . ($wordcount{$key}) . "\n
+";
}
[download]

But now I want to change it up. I am going to be running this program in a huge repeating loop, so speed counts. So I want to make this faster.

I was thinking about making one big @globalarray and a @temparray and pushing from the @temparray into the global one. I am thinking this will make it a very tiny bit faster, but anything will do at this point. So how can I change my program as described, but still have the same output?

--perl.j

Comment on Rewrite Program Using Arrays Select or Download Code

Replies are listed 'Best First'.
Re: Rewrite Program Using Arrays by JavaFan (Canon) on Mar 25, 2012 at 16:39 UTC
So I want to make this faster. Then don't sort. Really. The rest of your program is essentially linear. Sorting is N log N. That's the bottleneck of your program. Any speedup you'll do in the rest of the program are pointless if your data size increases: the sort will dominate.	[reply]
Re^2: Rewrite Program Using Arrays by mbethke (Hermit) on Mar 25, 2012 at 17:51 UTC
If he needs his output sorted though? Certainly the built-in sort won't be slower than using some external sort later? Then again, the n in sort's O(n) is the number of word forms which doesn't increase linearly with corpus size. Some law for it can probably be derived from Zipf's Law, I suspect it's something like log(corpus_size). So the sort should take an average of log(n)*log(log(n)) over corpus size which is not that bad.	[reply]
Re^2: Rewrite Program Using Arrays by perl.j (Pilgrim) on Mar 25, 2012 at 20:38 UTC
Thanks for the advice, but I just want to clear this up. I am not comfortable using the push and pop functions, so this is also going to be a learning experience. I understand that there are a lot of things I can do to speed the process, but I want to see what this will do still... The problem is, I don't know how to do it. --perl.j	[reply]
Re^3: Rewrite Program Using Arrays by aaron_baugher (Curate) on Mar 26, 2012 at 05:21 UTC
There is no use for push/pop here, though. This is a pretty straightforward problem: counting the occurrences of words in a file. That means putting the words in a hash as the keys and incrementing the value when a word is seen. That's what you've done, so you're already using the best algorithm. You can gain some speed by cutting out any unnecessary assignments, as I showed with the benchmarks in my other post; but you certainly won't gain anything by using a less straightforward algorithm or by shuffling your words around through extra arrays. There really aren't "a lot of things you can do to speed the process." Building a hash and sorting on the values is as fast as it's gonna get, unless you want to rewrite it in C. Aaron B. My Woefully Neglected Blog, where I occasionally mention Perl.	[reply]
Re: Rewrite Program Using Arrays by mbethke (Hermit) on Mar 25, 2012 at 16:28 UTC
Pushing data around from one array to the other is very unlikely to make things faster, probably the opposite. One thing you could so is write `foreach my $word (split $line)` to avoid the extra array, but this is something I suspect the Perl compiler might optimize all by itself already, you have top try whether it makes a difference. hashDescend() as a simple code block would probably be slightly faster, but other than that there's little room for optimization as far as I can see. Is this really too slow for your job?	[reply] [d/l]
Re^2: Rewrite Program Using Arrays by perl.j (Pilgrim) on Mar 25, 2012 at 20:54 UTC
I know it might not make it faster, but it's still a learning experience... Can you tell me how to do it? --perl.j	[reply]
Re^3: Rewrite Program Using Arrays by aaron_baugher (Curate) on Mar 26, 2012 at 00:35 UTC
Well, you could get rid of your loop by using map, but my benchmark shows no gain there. Eliminating some intermediate variables does speed things up, though. If you get rid of `@words` and `$word` completely, you'll gain some time, regardless of which looping method you use: #!/usr/bin/env perl use Modern::Perl; use Benchmark qw(:all); # create a line with words and spaces to split on my $line = ''; $line .= ('a', 'b', 'c', 'd', ' ')[rand(5)] for (1..1000); cmpthese( 100000, { 'loop with vars' => \&loopwithvars, 'loop sans vars' => \&loopsansvars, 'with map' => \&withmap, }); sub loopwithvars { my %wordcount; my @words = split ' ', $line; for my $word (@words){ $wordcount{$word}++; } } sub loopsansvars { my %wordcount; for (split ' ', $line){ $wordcount{$_}++; } } sub withmap { my %wordcount; map { $wordcount{$_}++ } split ' ', $line; } ###### results ###### Rate loop with vars with map loop sans vars loop with vars 11338/s -- -15% -17% with map 13369/s 18% -- -2% loop sans vars 13624/s 20% 2% -- [download] Aaron B. My Woefully Neglected Blog, where I occasionally mention Perl.	[reply] [d/l] [select]
Re^4: Rewrite Program Using Arrays by ikegami (Patriarch) on Mar 27, 2012 at 20:54 UTC
Re^3: Rewrite Program Using Arrays by mbethke (Hermit) on Mar 25, 2012 at 23:43 UTC
Hm, I suppose you meant something like this: `my @globalarray; foreach my $filename (@ARGV) { open my $fh, '<', $filename or die "$filename: $!"; my @temparray = <$fh>; push @globalarray, @temparray; close $fh; }` [download] Guess you could learn the use of "push", that's about the only thing. It's used to append a scalar or a whole list to the end of an array. "pop" does the opposite and takes one scalar off. shift and unshift work the same, just on the start of the array.	[reply] [d/l]
Re: Rewrite Program Using Arrays by Anonymous Monk on Mar 25, 2012 at 16:26 UTC
But now I want to change it up. I am going to be running this program in a huge repeating loop, so speed counts. So I want to make this faster. And do what, what will this new program do?	[reply]
Re: Rewrite Program Using Arrays by educated_foo (Vicar) on Mar 25, 2012 at 18:05 UTC
Why did you put this useless bit of superstition at the top? `use 5.12.4;` One way to likely make your program faster would be to use an older Perl, like 5.6. But your superstition prevents that.	[reply] [d/l]
Re^2: Rewrite Program Using Arrays by chromatic (Archbishop) on Mar 26, 2012 at 00:46 UTC
How about to document the version of Perl on which he tested it?	[reply]
Re^3: Rewrite Program Using Arrays by Jenda (Abbot) on Mar 26, 2012 at 09:45 UTC
In that case I'd prefer to see that in the documentation or a comment. By "using" a version number you are not merely saying "I've tested this only with this particular version." What you are effectively saying is more like "I believe this will not work with anything older than X." If I see "use 5.12.4;" I assume there was something wrong with even 5.12.3, something that prevents the code to work correctly. So probably I would not bother trying to run it under 5.10.x. On the other hand with a comment about the version tested under, I would try it and only if I end up getting an error I'd take the possible version differences into account. Jenda Enoch was right! Enjoy the last years of Rome.	[reply]
Re^4: Rewrite Program Using Arrays by chromatic (Archbishop) on Mar 26, 2012 at 16:14 UTC
Re^4: Rewrite Program Using Arrays by perl.j (Pilgrim) on Mar 26, 2012 at 10:23 UTC
Re^5: Rewrite Program Using Arrays by JavaFan (Canon) on Mar 26, 2012 at 10:54 UTC
Some notes below your chosen depth have not been shown here
Re^2: Rewrite Program Using Arrays by Anonymous Monk on Mar 25, 2012 at 19:03 UTC
say/state/switch, duh	[reply]
Re^3: Rewrite Program Using Arrays by JavaFan (Canon) on Mar 25, 2012 at 19:11 UTC
I don't think the OP is using any of them.	[reply]
Re^4: Rewrite Program Using Arrays by Anonymous Monk on Mar 25, 2012 at 21:55 UTC
Re^5: Rewrite Program Using Arrays by JavaFan (Canon) on Mar 26, 2012 at 08:44 UTC

Back to Seekers of Perl Wisdom