Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Inefficient code in Camel book?

by cmilfo (Hermit)
on Jan 22, 2002 at 12:28 UTC ( [id://140616]=perlquestion: print w/replies, xml ) Need Help??

cmilfo has asked for the wisdom of the Perl Monks concerning the following question:

On page 18 of the "Camel" book is the following example (Question at bottom):
#!/usr/bin/perl open(GRADES, "grades") or die "Can't open grades: $!\n"; while ($line = <GRADES>) { ($student, $grade) = split(" ", $line); $grades{$student} .= $grade . " "; } foreach $student (sort keys %grades) { $scores = 0; $total = 0; @grades = split(" ", $grades{$student}); foreach $grade (@grades) { $total += $grade; $scores++; } $average = $total / $scores; print "$student: $grades{$student}\tAverage: $average\n"; }

I benchmarked this example against the same example, only changing the @grades assignment line and the foreach $grade line as shown in the following code:
#!/usr/bin/perl open(GRADES, "grades") or die "Can't open grades: $!\n"; while ($line = <GRADES>) { ($student, $grade) = split(" ", $line); $grades{$student} .= $grade . " "; } foreach $student (sort keys %grades) { $scores = 0; $total = 0; foreach $grade (split(" ", $grades{$student})) { $total += $grade; $scores++; } $average = $total / $scores; print "$student: $grades{$student}\tAverage: $average\n"; }

As you can see, removing the @grades assignment and loop are the only changes. Below is the benchmark I performed (100,000 iterations) on a file containing 9 grade entries. Correct me if I am wrong, the 7% performance increase is from avoiding the array assignment. However, I watched the memory usage via a top -d1 command (btw: Is there a module to model memory usage?), and the second routine uses 8-20K more memory (1508-1524K vs. 1528-1532K).

Benchmark: timing 100000 iterations of from_book, from_book2... from_book: 60 wallclock secs (41.60 usr + 4.34 sys = 45.94 CPU) @ 21 +76.75/s (n=100000) from_book2: 55 wallclock secs (39.09 usr + 3.81 sys = 42.90 CPU) @ 23 +31.00/s (n=100000) Rate from_book from_book2 from_book 2177/s -- -7% from_book2 2331/s 7% --


So, I copied the grades flat file 1000 times into a grades.big flat file; the file contains the same 9 entries used above, 1000 times each. When running 1000 iterations of the big file (100,000 was taking a long time), the following benchmark comes out. Memory differences on this are about 36K, (1856K vs. 1892K).

Benchmark: timing 1000 iterations of from_book, from_book2... from_book: 278 wallclock secs (211.20 usr + 0.68 sys = 211.88 CPU) @ + 4.72/s (n=1000) from_book2: 261 wallclock secs (200.35 usr + 0.56 sys = 200.91 CPU) @ + 4.98/s (n=1000) Rate from_book from_book2 from_book 4.72/s -- -5% from_book2 4.98/s 5% --


The question I would like to ask the PerlMonks is why is there more memory usage when the split is implicit in the foreach loop?

Thank you much,
Casey

Edit Masem 2002-01-23 - Changed title from "Page 18 of the Camel"

Replies are listed 'Best First'.
Re: Page 18 of the Camel
by shotgunefx (Parson) on Jan 22, 2002 at 14:02 UTC
    Maybe it's getting localized? The split isn't an array but a list. So it might be keeping two copies of the data.

    Don't know enough about perlguts yet but I would think it something to that effect. Plus the difference grows larger with a larger dataset. My guess anyway.

    -Lee

    "To be civilized is to deny one's nature."
      I would have to agree here - @grades is a global array, so it gets created and allocated a single time. The list in the foreach on the other hand behaves more like my @grades would. I don't know much about perlguts either, but maybe that would be worth adding as a third benchmark to check?

      Makeshifts last the longest.

Re: Page 18 of the Camel
by belg4mit (Prior) on Jan 22, 2002 at 12:36 UTC
    Without being too vague, speed often comes at the expense of memory.

    UPDATE: Consider Memoize.

    --
    perl -pe "s/\b;([st])/'\1/mg"

      Yes, speed comes at the expense of memory. And yes again, I could use Memoize to solve the memory problems.

      Maybe my question is better asked as, "what is different about how the array is stored versus how the implicit split is stored that causes memory usage to increase?"

      It's the same amount of data, how does the storage differ?
        I wasn't saying use Memoize at all. Consider it simply meant look at is an example of the trade-off between cycles and bytes.

        Trying the original form with a scoped @grades might also be interesting.

        --
        perl -pe "s/\b;([st])/'\1/mg"

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://140616]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (3)
As of 2024-04-16 20:01 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found