Inefficient code in Camel book?

cmilfo has asked for the wisdom of the Perl Monks concerning the following question:

On page 18 of the "Camel" book is the following example (Question at bottom):

#!/usr/bin/perl

open(GRADES, "grades") or die "Can't open grades: $!\n";
while ($line = <GRADES>) {
    ($student, $grade) = split(" ", $line);
    $grades{$student} .= $grade . " ";
}

foreach $student (sort keys %grades) {
    $scores = 0;
    $total = 0;
    @grades = split(" ", $grades{$student});
    foreach $grade (@grades) {
        $total += $grade;
        $scores++;
    }
    $average = $total / $scores;
    print "$student: $grades{$student}\tAverage: $average\n";
}
[download]

I benchmarked this example against the same example, only changing the @grades assignment line and the foreach $grade line as shown in the following code:

#!/usr/bin/perl

open(GRADES, "grades") or die "Can't open grades: $!\n";
while ($line = <GRADES>) {
    ($student, $grade) = split(" ", $line);
    $grades{$student} .= $grade . " ";
}

foreach $student (sort keys %grades) {
    $scores = 0;
    $total = 0;
    foreach $grade (split(" ", $grades{$student})) {
        $total += $grade;
        $scores++;
    }
    $average = $total / $scores;
    print "$student: $grades{$student}\tAverage: $average\n";
}
[download]

As you can see, removing the @grades assignment and loop are the only changes. Below is the benchmark I performed (100,000 iterations) on a file containing 9 grade entries. Correct me if I am wrong, the 7% performance increase is from avoiding the array assignment. However, I watched the memory usage via a top -d1 command (btw: Is there a module to model memory usage?), and the second routine uses 8-20K more memory (1508-1524K vs. 1528-1532K).

Benchmark: timing 100000 iterations of from_book, from_book2...
 from_book: 60 wallclock secs (41.60 usr +  4.34 sys = 45.94 CPU) @ 21
+76.75/s (n=100000)
from_book2: 55 wallclock secs (39.09 usr +  3.81 sys = 42.90 CPU) @ 23
+31.00/s (n=100000)
             Rate  from_book from_book2
from_book  2177/s         --        -7%
from_book2 2331/s         7%         --
[download]

So, I copied the grades flat file 1000 times into a grades.big flat file; the file contains the same 9 entries used above, 1000 times each. When running 1000 iterations of the big file (100,000 was taking a long time), the following benchmark comes out. Memory differences on this are about 36K, (1856K vs. 1892K).

Benchmark: timing 1000 iterations of from_book, from_book2...
 from_book: 278 wallclock secs (211.20 usr +  0.68 sys = 211.88 CPU) @
+  4.72/s (n=1000)
from_book2: 261 wallclock secs (200.35 usr +  0.56 sys = 200.91 CPU) @
+  4.98/s (n=1000)
             Rate  from_book from_book2
from_book  4.72/s         --        -5%
from_book2 4.98/s         5%         --
[download]

The question I would like to ask the PerlMonks is why is there more memory usage when the split is implicit in the foreach loop?

Thank you much,
Casey

Edit Masem 2002-01-23 - Changed title from "Page 18 of the Camel"

Comment on Inefficient code in Camel book? Select or Download Code

Replies are listed 'Best First'.
Re: Page 18 of the Camel by shotgunefx (Parson) on Jan 22, 2002 at 14:02 UTC
Maybe it's getting localized? The split isn't an array but a list. So it might be keeping two copies of the data. Don't know enough about perlguts yet but I would think it something to that effect. Plus the difference grows larger with a larger dataset. My guess anyway. -Lee "To be civilized is to deny one's nature."	[reply]
Re^2: Page 18 of the Camel by Aristotle (Chancellor) on Jan 22, 2002 at 17:43 UTC
I would have to agree here - `@grades` is a global array, so it gets created and allocated a single time. The list in the `foreach` on the other hand behaves more like `my @grades` would. I don't know much about perlguts either, but maybe that would be worth adding as a third benchmark to check? Makeshifts last the longest.	[reply]
Re: Page 18 of the Camel by belg4mit (Prior) on Jan 22, 2002 at 12:36 UTC
Without being too vague, speed often comes at the expense of memory. UPDATE: Consider Memoize. `-- perl -pe "s/\b;([st])/'\1/mg"`	[reply]
Re: Re: Page 18 of the Camel by cmilfo (Hermit) on Jan 22, 2002 at 12:47 UTC
Yes, speed comes at the expense of memory. And yes again, I could use Memoize to solve the memory problems. Maybe my question is better asked as, "what is different about how the array is stored versus how the implicit split is stored that causes memory usage to increase?" It's the same amount of data, how does the storage differ?	[reply]
Re: Re: Re: Page 18 of the Camel by belg4mit (Prior) on Jan 22, 2002 at 13:05 UTC
I wasn't saying use Memoize at all. Consider it simply meant look at is an example of the trade-off between cycles and bytes. Trying the original form with a scoped `@grades` might also be interesting. `-- perl -pe "s/\b;([st])/'\1/mg"`	[reply]


Perl: the Markov chain saw
	PerlMonks