Getting real code coverage by instrumenting paths

The following is an expansion and reiteration of a lightning talk at YAPC::NA 2007. Data is available if you follow the link.

I had a problem at work where I had many millions of mostly similar inputs to our search engine and wanted to winnow them down to a set that would exercise our code. We're refactoring the input and validation part of our core search engine. It's a complex piece of code and is too large and complex to be able to manually generate input to fully exercise it.

We settled on the idea of taking representative samples of real searches. The problem becomes a choice of which inputs to select and how far into our logs to search for them. Our first attempts produced mostly satisfactory results except that different sample sets produced different coverage statistics according to Devel::Cover. One afternoon, we had the idea that it would be nice to be able to evaluate each input and keep only the ones that extended our coverage.

This was purely pie in the sky dreaming til I remembered there was something about tracing perl's runloop in the O'Reilly book Perl Hacks. This is Hack #82 entitled "Watch Perl execute individual instructions" in chapter eight. FWIW, I wrote Hack #83 "Write your own warnings" that follows immediately. Send hate mail to jjore at cpan.org.

The module Runops::Trace hadn't ever actually been uploaded to CPAN so I took it upon myself to do this. I had to track the thing down on chromatic's blog. Good thing the URL to the source still worked! It turns out it required only a few minor fixes and I had it running.

The current state is that if you run the function checksum_code_path on your function, you get a MD5 hash of the real path executed by perl. Here's a quick example copied from slides used for YAPC::NA 2007's lightning talk:

sub foo {
  join ' ',
    ( $_[0] < .5 ? 'a' : 'b' ),
    ( $_[1] < .5 ? 'c' : 'd' ),
    ( $_[2] < .5 ? 'e' : 'f' ),
}
[download]

Given 1000 random triplets, I was able to winnow that down to 8 observed code paths.

use Runops::Trace 'checksum_code_path';
for ( 1 .. 1000 ) {
  my @args = ( rand(), rand(), rand() );
  $md5 = checksum_code_path( sub { $results = foo( @args ) } );
  next if $seen{ $md5 } ++;

  push @great, \ @args;
}
__END__
0.16 0.53 0.84 -> b c e
0.03 0.29 0.43 -> b d f
0.30 0.45 0.74 -> b d e
0.58 0.25 0.91 -> a d e
0.60 0.02 0.11 -> a d f
0.47 0.52 0.33 -> b c f
0.51 0.53 0.63 -> a c e
0.59 0.53 0.05 -> a c f
[download]

I was even able to gain some data about the relative frequencies of inputs that would trigger the various code paths. Looking at the data, it ought to be obvious that just picking some amount of random data could easily fail to fully exercise the function.

MD5 checksum for codepath       : count
3486706209c7e59208c012e9fb50a4a8: 65
44a40eaf7295843c7b98c513acbd30cd: 39
49f80e6c0a251c6097828935336b8369: 5
117dd819263a4a0b05ee740190369556: 71
bacb29e1bf8fd4f00d89a1aa1759b18b: 392
ee75b4954053f18a33f00589f2dff817: 28
8a17266d3f92737fe3154c2ee3acd0b0: 7
775a10dbb2d9ce52259680e901999d66: 393
[download]

I found that sifting 1.3 million inputs yielded only 189 inputs that would contribute to code coverage in a way that would be detectable by Devel::Cover. The important thing is, I found an additional twenty-two thousand inputs that exercised unique pathways. This additional set represented inputs that do not influence Devel::Cover but do exercise the code in new and unique ways. You ought to know that Devel::Cover is only able to tell whether you've reached code or not. This new method can tell you about code that is reached in combination. Perhaps your functions foo() and bar() together do something different than if foo() is called multiple times or bar() none at all.

The above image shows the rate of return for trolling the log file. I only acquired 60% of my observed Devel::Cover influencing inputs after 22 thousand inputs. While I wasn't finding much that was new out past several hundred thousand, there were a few. I was still uncovering dead code. The other interesting finding is that I continued to find unique code paths at a constant rate as I walked through the log. About 1.7% of all inputs exercise something in unique ways.

I also generated data on the length of the code paths. Almost everything was 70 thousand opcodes long in duration or less. There were a few things that took radically longer. I haven't looked for the inputs that trigger the egregious code paths but it's possible to find them

sub codelength {
    my $count = 0;
    Runops::Trace::trace_function(
        sub { ++ $count },
        $_[0],
    );
    return $count;
}
[download]

Lastly, I also extracted data on how frequently called the code paths were. Here's a useless little animation.

⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Comment on Getting real code coverage by instrumenting paths Select or Download Code

Back to Meditations