http://qs321.pair.com?node_id=751041

tlm has asked for the wisdom of the Perl Monks concerning the following question:

Here's a question for the Perl-guts hackers in the audience.

I'm in the process of optimizing some Perl code, and for this purpose I availed myself of Inline::C. The resulting code is indeed faster, but it uses significantly more memory than the original version. Upon further investigation I narrowed the problem to a marked difference in size between the arrays of arrays (AoAs) generated by Perl and those generated by the C code. The following script illustrates the problem:

################################################################
# test_aoa.pl
################################################################
use warnings FATAL => 'all';
no warnings 'once';
use strict;

use Inline 'C' => Config => OPTIMIZE => '-O2';
use Inline 'C';

use Time::HiRes 'gettimeofday';

my $START = my $ELAPSED = microseconds();

my $MAKE_AOA = !!( shift @ARGV ) ? \&make_aoa_c : \&make_aoa;

my $BASELINE = mem_size();

for ( 1..5 ) {
  start();
  my $table = $MAKE_AOA->( 1000, 1000 );
  $ELAPSED = elapsed();
  printf "%d: %d (%d us)\n", $_, mem_size() - $BASELINE, $ELAPSED;
}

sub mem_size {
  chomp( my $size = `ps -o rss= -p $$` );
  return $size + 0;
}

sub make_aoa {
  my ( $n_rows, $n_cols ) = @_;
  return [ map [ ( 'foo' ) x $n_cols ], 1..$n_rows ];
}

sub microseconds {
  my ( $sec, $microsec ) = gettimeofday();
  return 1E6 * $sec + $microsec;
}

sub start {
  $START = microseconds();
}

sub elapsed {
  return $START ? microseconds() - $START : 0;
}

__END__

__C__

/* get_mortalspace comes from "Extending and Embedding Perl"
   by Jenness and Cozens, p. 242 */
static void * get_mortalspace ( size_t nbytes ) {
  SV * mortal;
  mortal = sv_2mortal( NEWSV(0, nbytes ) );
  return (void *) SvPVX( mortal );
}

SV *make_aoa_c( int n_rows, int n_cols ) {
  int i;
  int n_items = n_rows * n_cols;
  char *foo = "foo";
  SV **table;
  SV **row_ptr;

  table = ( SV ** ) get_mortalspace( n_rows * sizeof *table );
  row_ptr = ( SV ** ) get_mortalspace( n_items * sizeof *row_ptr );

  for ( i = 0; i < n_rows; i++ ) {
    int j;
    SV **row = row_ptr;
    for ( j = 0; j < n_cols; ++j ) {
      row[ j ] = sv_2mortal( newSVpv( foo, 0 ) );
      ++row_ptr;
    }

    {
      AV *av = ( AV * ) sv_2mortal( av_make( ( I32 ) n_cols, row ) );
      table[ i ] = sv_2mortal( newRV( ( SV * ) av ) );
    }
  }

  return newRV( sv_2mortal( ( SV * ) av_make( ( I32 ) n_rows, table ) ) );
}

When given a "false" argument, the script uses the pure Perl function make_aoa to generate an AoA; otherwise it uses the C function make_aoa_c.
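To make "false" concrete: the script applies Perl's ordinary truthiness rules to the first command-line argument. The following is a minimal sketch of that test in isolation; picks_c_version is a hypothetical helper introduced only for illustration and does not appear in test_aoa.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of test_aoa.pl): applies the same
# !!( shift ... ) test the script applies to @ARGV.
sub picks_c_version {
    my @args = @_;
    return !!( shift @args ) ? 1 : 0;
}

# '0', the empty string, and a missing argument are false in Perl;
# any other string, including '00', is true.
for my $case ( [], ['0'], [''], ['1'], ['00'] ) {
    my $label = @$case ? "'$case->[0]'" : '(no argument)';
    printf "%-15s -> %s\n", $label,
        picks_c_version(@$case) ? 'make_aoa_c' : 'make_aoa';
}
```

So running the script with no argument at all also selects the pure Perl make_aoa.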

When I execute this script, this is what I get:

% perl test_aoa.pl 0
1: 78696 (192169 us)
2: 78700 (211009 us)
3: 78700 (159965 us)
4: 78700 (160709 us)
5: 78700 (167226 us)
% perl test_aoa.pl 1
1: 125752 (356172 us)
2: 133572 (310640 us)
3: 133572 (263302 us)
4: 133572 (265807 us)
5: 133572 (267763 us)

As you can see, the AoA generated by make_aoa_c is almost twice the size of the one generated by make_aoa (about 1.7x, to be more precise).
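For reference, the 1.7x figure follows from the steady-state numbers in the output above (runs 2 through 5). The sketch below redoes that arithmetic. The per-element estimate rests on an assumption that is not stated in the post, namely that ps reports rss in units of 1024 bytes.

```perl
use strict;
use warnings;

# Steady-state RSS growth over baseline, copied from the output above.
my $rss_perl = 78700;     # make_aoa   (pure Perl)
my $rss_c    = 133572;    # make_aoa_c (Inline::C)

# Ratio of the two process-size increases.
my $ratio = $rss_c / $rss_perl;

# ASSUMPTION: ps reports rss in KiB. Under that assumption, this is the
# extra memory per element of the 1000 x 1000 table.
my $extra_bytes_per_element = ( $rss_c - $rss_perl ) * 1024 / ( 1000 * 1000 );

printf "ratio: %.2f\n", $ratio;
printf "extra bytes per element (if rss is in KiB): %.1f\n",
    $extra_bytes_per_element;
```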

To add insult to injury, my snazzy Inline::C code is much slower too. Fortunately, this is the case only in this little test script. In the real application I'm working on, the move to Inline::C did make a big difference. Still, I would also love to know why my C code is so much slower...

Anyway, after all these years and much reading on the subject, I remain as mystified as ever by the Perl internals, so I'm sure my code in make_aoa_c is doing something pretty clueless. Any words of wisdom would be appreciated.

the lowliest monk