PerlMonks  

Being more Assert-ive with Perl

by stvn (Monsignor)
on Sep 17, 2004 at 14:03 UTC

I like my code to "work" the way I expect it to, but I also like it to "not work" the way I expect it to as well.

When I test my modules, I test with correctly formed input and assure that it behaves as it is expected (and documented) to do. I also test with incorrectly formed input and assure that it behaves as expected (and documented) as well. I want to be sure that if I am expecting you to pass a hash-ref, that I will never accept an array-ref in its place. It is not enough to have perl complain the first time I try to do $arg->{"key"} on an array ref. I want to catch things prior to that, and not allow any part of my methods to be run unless it receives the proper input.

This is a very small part of an idea called Design By Contract. Design By Contract came from the language Eiffel, in which a compiler option can enable and disable the checking of pre- and post-conditions for methods as well as class invariants. There is a perl implementation of Design By Contract called Class::Contract, and one for just subroutines in Sub::Assert. Both require either a wrapping of or a controlled generation of subroutines and methods, and surely have some serious overhead involved. Personally, I very much like the idea of Design By Contract, because it helps me feel confident that my code works as I expect it to, but I don't like the idea of sacrificing performance for confidence. Which brings me to the point of my meditation: I developed a style of coding which allows me to do efficient pre- and post-condition checking on my methods.

One of my first attempts at this was to implement it with assert_* subroutines. However, this creates the overhead of an extra subroutine call and I didn't like how that slowed things down. I then tried just doing normal if/unless blocks, but I found that my conditions were easily mistaken for program logic, and I really wanted them to be more distinct since they really were not part of the algorithm itself. Statement modifiers were another option, but they too got easily confused with the program text, and the conditional always ended up at the end of the line rather than the beginning, and I wanted the conditionals to be the main focus when reading the code. I finally came upon the use of raw boolean expressions, using the || operator as a short-circuit. Now this is no revolutionary perl idiom; we have all seen and used it for things like opening files.

    open(FILE, "<file.dat") || die "cannot open file";
Its simplicity and elegance had always made this idiom a favorite of mine, so I thought, why not adapt it for pre- and post-conditions? Here is an example:
    sub insertQuarter {
        my ($coin) = @_;
        (defined($coin) && $coin == 25) || die "You must insert 25 cents";
        initializeGame();
        startGame();
    }
Surely our $coin must be defined and it must be equal to 25 (some liberties taken in the simplification here). If not, then we cannot continue with our subroutine since our input is bad. Personally I choose to throw an exception, which some might find harsh, but it can just as easily be re-coded in a different way:
    # just return a false value and allow the
    # calling code to deal with the false return
    (defined($coin) && $coin == 25) || return 0;

    # warn the user of the wrong input and return
    # false, leaving the calling routine dealing
    # with things
    (defined($coin) && $coin == 25) || (warn("You must insert 25 cents"), return 0);

    # use a custom error handler which can record
    # call stack information, set error variable,
    # and return a specific value.
    (defined($coin) && $coin == 25) || return errorHandler("You must insert 25 cents");
There are any number of combinations available here, and all of which enjoy what I see as two important benefits to this style; speed and readability.

Readability to me is obvious, our condition is simply a boolean expression which can be read from left to right with relative ease (for those who read left to right of course). The conditional is first in the statement, and is made more distinct by being a bare expression and starting with a paren and therefore separating it visually from the function's body code. Of course I realize this observation might be a little subjective, and many may not agree with me, however the second quality, speed, I can prove.

To help prove this, I created a benchmark script which tests the various ways in which pre-conditions can be checked.

    #!/usr/bin/perl

    use strict;
    use warnings;

    use Benchmark qw(:all);

    sub IfBlocks {
        my ($test) = @_;
        if ($test < 50) {
            die "Test is wrong";
        }
        return $test;
    }

    sub UnlessBlocks {
        my ($test) = @_;
        unless ($test > 50) {
            die "Test is wrong";
        }
        return $test;
    }

    sub IfStatement {
        my ($test) = @_;
        die "Test is wrong" if ($test < 50);
        return $test;
    }

    sub UnlessStatement {
        my ($test) = @_;
        die "Test is wrong" unless ($test > 50);
        return $test;
    }

    sub Assert {
        my ($test) = @_;
        ($test < 50) || die "Test is wrong";
        return $test;
    }

    sub Assert2 {
        my ($test) = @_;
        ($test > 50) && die "Test is wrong";
        return $test;
    }

    my @nums = map { ((rand() * 100) % 50) } (0 .. 50);

    cmpthese(10000, {
        'IfBlocks'        => sub { eval { IfBlocks($_) }        for (@nums) },
        'UnlessBlocks'    => sub { eval { UnlessBlocks($_) }    for (@nums) },
        'IfStatement'     => sub { eval { IfStatement($_) }     for (@nums) },
        'UnlessStatement' => sub { eval { UnlessStatement($_) } for (@nums) },
        'Assert'          => sub { eval { Assert($_) }          for (@nums) },
        'Assert2'         => sub { eval { Assert2($_) }         for (@nums) },
    });
I then ran this script many times to assure that I got a decent sampling of tests. The results of one of these test runs are below:
    Benchmark: timing 10000 iterations of Assert, Assert2, IfBlocks, IfStatement, UnlessBlocks, UnlessStatement...
             Assert:  4 wallclock secs ( 2.38 usr +  0.02 sys =  2.40 CPU) @ 4166.67/s (n=10000)
            Assert2:  4 wallclock secs ( 2.36 usr +  0.03 sys =  2.39 CPU) @ 4184.10/s (n=10000)
           IfBlocks: 13 wallclock secs ( 7.82 usr +  0.13 sys =  7.95 CPU) @ 1257.86/s (n=10000)
        IfStatement: 13 wallclock secs ( 7.69 usr +  0.01 sys =  7.70 CPU) @ 1298.70/s (n=10000)
       UnlessBlocks: 12 wallclock secs (10.68 usr +  0.80 sys = 11.48 CPU) @  871.08/s (n=10000)
    UnlessStatement: 11 wallclock secs (11.67 usr +  0.05 sys = 11.72 CPU) @  853.24/s (n=10000)

                      Rate UnlessStatement UnlessBlocks IfBlocks IfStatement Assert Assert2
    UnlessStatement  853/s              --          -2%     -32%        -34%   -80%    -80%
    UnlessBlocks     871/s              2%           --     -31%        -33%   -79%    -79%
    IfBlocks        1258/s             47%          44%       --         -3%   -70%    -70%
    IfStatement     1299/s             52%          49%       3%          --   -69%    -69%
    Assert          4167/s            388%         378%     231%        221%     --     -0%
    Assert2         4184/s            390%         380%     233%        222%     0%      --
As you can see, using either || or && as a short-circuit operator is significantly faster than all the other options, including statement modifiers: more than 3 times faster than the if variants and almost 5 times faster than the unless variants. I think the results speak for themselves.
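One caveat when reading these numbers: in the script as posted, every value in @nums is below 50, so the four if/unless subs die on every call, while Assert and Assert2 never die. The comparison therefore also measures the cost of die inside eval, not only the check. A like-for-like run might look like the sketch below. It assumes one shared rule ("$test must be below 50"), and the sub names and the iteration count are chosen here for illustration only. No results are claimed for it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical rule shared by every variant: $test must be below 50.
sub IfBlock {
    my ($test) = @_;
    if ($test >= 50) { die "Test is wrong" }
    return $test;
}

sub UnlessBlock {
    my ($test) = @_;
    unless ($test < 50) { die "Test is wrong" }
    return $test;
}

sub IfStatement     { my ($test) = @_; die "Test is wrong" if $test >= 50;    return $test }
sub UnlessStatement { my ($test) = @_; die "Test is wrong" unless $test < 50; return $test }
sub OrAssert        { my ($test) = @_; ($test < 50)  || die "Test is wrong";  return $test }
sub AndAssert       { my ($test) = @_; ($test >= 50) && die "Test is wrong";  return $test }

# Integers in 0..49: every call passes, so only the check itself is timed.
my @nums = map { int(rand(50)) } (0 .. 50);

# The iteration count is arbitrary; raise it for steadier numbers.
cmpthese(5000, {
    'IfBlock'         => sub { eval { IfBlock($_) }         for (@nums) },
    'UnlessBlock'     => sub { eval { UnlessBlock($_) }     for (@nums) },
    'IfStatement'     => sub { eval { IfStatement($_) }     for (@nums) },
    'UnlessStatement' => sub { eval { UnlessStatement($_) } for (@nums) },
    'OrAssert'        => sub { eval { OrAssert($_) }        for (@nums) },
    'AndAssert'       => sub { eval { AndAssert($_) }       for (@nums) },
});
```

Changing @nums so that every value is 50 or above would time the failing path for all six variants instead.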

I have been using this style for a few years now, and have found it not only very easy to maintain, but also a form of self-documentation for my methods. The guidelines for acceptability of my arguments are directly codified, and properly written die or warn messages can go a long way. You can even use this to substitute for the lack of type-checked arguments in perl, like this:

    sub onlyAcceptFooObjects {
        my ($foo) = @_;
        (defined($foo) && ref($foo) && $foo->isa("Foo"))
            || die "This method only accepts Foo objects";
        # ...
    }
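One edge case in the check above: an unblessed reference such as {} passes ref($foo), and the ->isa call then dies with perl's own "Can't call method" error rather than the intended message. A minimal variant, assuming Scalar::Util's blessed is acceptable and using a throwaway Foo class defined here only for the demonstration:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Throwaway class, defined only so the example runs on its own.
{
    package Foo;
    sub new { return bless {}, shift }
}

sub onlyAcceptFooObjects {
    my ($foo) = @_;
    # blessed() returns undef for non-references and for unblessed
    # references, so ->isa is only ever called on a real object.
    (blessed($foo) && $foo->isa("Foo"))
        || die "This method only accepts Foo objects";
    return ref($foo);
}

print onlyAcceptFooObjects(Foo->new), "\n";    # prints "Foo"

eval { onlyAcceptFooObjects({}) };
print "rejected: $@";                          # our message, not perl's
```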

And of course, you don't have to confine this to just pre-conditions; anywhere you need to make an assertion before you can continue your code, this style works. I have many times used this to simplify a complex conditional by performing pre-flight checks on variables before entering the conditional code. This serves to reduce the need to include some edge cases in my conditional logic, therefore making it simpler.

This style may not be for all, but I know that for me, it has given me the benefits of some of the concepts found in things like Design By Contract, without the heavy overhead that sometimes comes with them.

-stvn

UPDATE: minor code change/fix thanks to itub.

Replies are listed 'Best First'.
Re: Being more Assert-ive with Perl
by itub (Priest) on Sep 17, 2004 at 15:07 UTC
    Minor nitpick:

    (defined($coin) && $coin == 25) || warn("You must insert 25 cents"), return 0;

    Should be either

    (defined($coin) && $coin == 25) or warn("You must insert 25 cents"), return 0;

    or

    (defined($coin) && $coin == 25) || (warn("You must insert 25 cents"), return 0);

    Otherwise it will always return. I personally always use or for these kinds of idioms, as it is more likely to have the precedence I want.
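The difference comes from precedence: || binds tighter than the comma, while or binds looser. A small sketch of both forms (the sub names are made up for this demonstration, and warnings are silenced only to keep the output short):

```perl
use strict;
use warnings;

# Parsed as ((cond) || warn(...)), return 0;
# so the return runs no matter what the condition says.
sub checkWithDoublePipe {
    my ($coin) = @_;
    (defined($coin) && $coin == 25) || warn("You must insert 25 cents"), return 0;
    return 1;
}

# Parsed as (cond) or (warn(...), return 0);
# so the return only runs when the condition is false.
sub checkWithOr {
    my ($coin) = @_;
    (defined($coin) && $coin == 25) or warn("You must insert 25 cents"), return 0;
    return 1;
}

local $SIG{__WARN__} = sub { };    # keep the demonstration quiet

print "|| with a valid coin: ", checkWithDoublePipe(25), "\n";    # 0
print "or with a valid coin: ", checkWithOr(25), "\n";            # 1
print "or with a bad coin:   ", checkWithOr(10), "\n";            # 0
```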

      Many thanks, I updated the OP too.

      -stvn
Re: Being more Assert-ive with Perl
by ambrus (Abbot) on Sep 17, 2004 at 14:28 UTC
Re: Being more Assert-ive with Perl
by kappa (Chaplain) on Sep 17, 2004 at 14:59 UTC
    It is not enough to have perl complain the first time I try to do $arg->{"key"} on an array ref. I want to catch things prior to that, and not allow any part of my methods to be run unless it receives the proper input.

    Design By Contract does not seem to solve the problem, it only makes language translator complain a little earlier, does it? We need to test the code with both correct and incorrect input values in any case.

      Design By Contract does not seem to solve the problem, it only makes language translator complain a little earlier, does it?

      Yes, but nothing can solve that problem really, since it's "user error". I forgot where I heard it, but I have always liked the quote: "Complain early and often" when speaking about error handling.

      We need to test the code with both correct and incorrect input values in any case.

      I agree. My point is that you can efficiently create those tests using this style and not have the overhead of full blown conditional blocks.

      -stvn

        Hm. Imagine a tool that takes all our declared contracts and composes a huge test suite that tries to test whether each contract is respected in our code, say, by feeding a series of valid and invalid data and turning off all runtime contract checks.

        E.g. I declare that my complex compute_graviton_phasor_coefficient sub wants to take a positive prime number as argument. Now I'd like to get a test that will run it against random numbers and check that it dies in the right places.

        And, putting dreams aside, I'm going now to read about specification testing and try Test::LectroTest :)
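A hand-rolled version of that idea, without any testing framework, could look roughly like this. Everything here is an assumption made for illustration: the body of compute_graviton_phasor_coefficient is a placeholder, and is_positive_prime and contract_violations are hypothetical helpers. In a real suite the oracle would be written independently of the sub's own precondition.

```perl
use strict;
use warnings;

# Oracle for the declared contract: plain trial division.
sub is_positive_prime {
    my ($n) = @_;
    return 0 unless defined $n && $n =~ /^\d+$/ && $n >= 2;
    for (my $d = 2; $d * $d <= $n; $d++) {
        return 0 if $n % $d == 0;
    }
    return 1;
}

# Placeholder standing in for the real sub; only the precondition matters.
sub compute_graviton_phasor_coefficient {
    my ($n) = @_;
    is_positive_prime($n) || die "argument must be a positive prime";
    return $n * 2;    # placeholder result
}

# Feed random integers and collect every input where "the sub survived"
# disagrees with "the contract says the input is valid".
sub contract_violations {
    my ($trials) = @_;
    my @violations;
    for (1 .. $trials) {
        my $n = int(rand(200)) - 50;    # integers in -50 .. 149
        my $survived = eval { compute_graviton_phasor_coefficient($n); 1 } ? 1 : 0;
        push @violations, $n if $survived != is_positive_prime($n);
    }
    return @violations;
}

my @violations = contract_violations(1000);
print @violations ? "violations: @violations\n" : "contract respected\n";
```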

Re: Being more Assert-ive with Perl
by DrHyde (Prior) on Sep 17, 2004 at 16:04 UTC
    I want to be sure that if I am expecting you to pass a hash-ref, that I will never accept an array-ref in its place.

    Here's one that will make your brane hurt ...

    my $arrayref = [qw(i like pie)]; bless($arrayref, 'HASH');
    If I pass $arrayref to a sub that expects a hashref, how will you tell that it is not what you expect?

    Dealing with this correctly is on my to-do list for Data::Compare. It will probably involve lots of eval evil.

      Yeah, this is a hard one, but IMO, if you do such things, you should expect programs to die horrible deaths because of it, and the blame falls on your code, not the module's parameter checking code.

      This is one of the important points of Design By Contract (and all contracts for that matter): you must abide by the provisions of the contract, and if you try to subvert the contract, there should be repercussions.

      -stvn

      Perhaps instead of using ref you should check out Scalar::Util's reftype?

      antirice    
      The first rule of Perl club is - use Perl
      The
      ith rule of Perl club is - follow rule i - 1 for i > 1

        The only real solution to this problem is to see if you can use something as the type of ref you want:
        sub hashref {
            eval { \%{$_[0]} }
        }

        sub myfunc {
            hashref($_[0]) or die "myfunc expected a hashref";
        }
        reftype won't handle all cases (e.g. deref-overloading or perl4-style glob passing).
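To make the three approaches in this sub-thread concrete, here is a small comparison on DrHyde's array blessed into 'HASH'. The helper is_hashlike is a simplified, hypothetical take on the eval idea above: it first insists on a reference (so it will not autovivify an undefined argument), which also means it deliberately gives up the glob-passing case.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

my $arrayref = bless [qw(i like pie)], 'HASH';

# Hypothetical helper: true only if the argument can really be
# dereferenced as a hash.
sub is_hashlike {
    my ($thing) = @_;
    return 0 unless ref $thing;
    return eval { my $h = \%{$thing}; 1 } ? 1 : 0;
}

print "ref:         ", ref($arrayref), "\n";           # HASH  (fooled by the bless)
print "reftype:     ", reftype($arrayref), "\n";       # ARRAY (looks past the bless)
print "is_hashlike: ", is_hashlike($arrayref), "\n";   # 0
print "real hash:   ", is_hashlike({}), "\n";          # 1
```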
Re: Being more Assert-ive with Perl
by water (Deacon) on Sep 18, 2004 at 08:08 UTC
    This post is more for newcomers, than a response to stvn.

    Yes, use assertions. Use boatloads of them.

     "assertions++" x 1000. Assertions rock. You will be surprised at how many bugs they detect.

    And if you can (risk, speed, etc), leave them in and turned on in production code (with an appropriate handler, of course). (After all, these are assertions, and thus should NEVER happen, right?)

    but...

    Unless you're writing time-critical software in perl -- a game, a controller, etc (and one could ask, "errr, why in perl, then?" if that is indeed the case) -- then

    1. The program run time will very likely be dominated by IO, usually disk, db, network, web fetches, etc, and thus the speed difference between  assert (defined $foo, 'defined foo') and  defined $foo || die 'defined foo'; should be irrelevant in real code
    2. Machine time is cheap and Programmer time is expensive -- what is clearest? Which is easiest for others to read?
    3. Making assert a sub makes it Very Easy to change the assert behavior of the whole code base (logging, responding to errors, correct shutdown in case of Really Bad Errors, etc), whereas the in-line || doesn't give you those options.
    Last point aside, IMHO, it really doesn't matter if a shop opts for  assert vs.  ||, as long as the gang uses the idiom consistently. I personally like  assert, because to me it is more clear, and more flexible, but I also prefer Pepsi to Coke -- many of these things are just individual preference.

    And as for ensuring an entire code base meets explicit coding standards -- well, this is great use of perl testing. just File::Find::Rule or whatever and  Test::More the whole source tree and apply regexps to every module and program, such that code that doesn't meet standard fails the test and 'breaks the build'.

    From what I've read here, stvn is a far more advanced coder than I. This is not a criticism of his post or his method, just a caution to the less-experienced: Premature optimization is the root of all evil.

    Don't worry about speed difference until you know your program is running too slowly, then rationally determine what is creating the bottleneck. And I'd bet you lunch that, unless you have contrived code, the bottleneck won't be your assertions. No way.

    "assertions++" x 1000; "coding_standards++" x 1000; "'worrying about speed of assert vs. ||' -- ";
      water,

      Excellent points, all of them, however I do disagree with a few of them.

      And if you can (risk, speed, etc), leave them in and turned on in production code (with an appropriate handler, of course). (After all, these are assertions, and thus should NEVER happen, right?)

      This I do not disagree with, but instead strongly agree with. After all, if one of your pre-conditions is that you get passed a connected database handle, or a writeable filehandle, would you not want to check that at runtime as well? Assertions and Contracts are not just for debugging, they are ways in which you can make your code more reliable and robust.

      The program run time will very likely be dominated by IO, usually disk, db, network, web fetches, etc, and thus the speed difference between  assert (defined $foo, 'defined foo') and  defined $foo || die 'defined foo'; should be irrelevant in real code

      This is true, assertions will almost never be your bottleneck, but personally I have assertions just about everywhere, which includes methods and functions which need to get called inside tight loops, as well as at time critical portions of code.

      For instance I have a part of a reporting app which spends a lot of time doing a DB query. Once I get the results back (up to 30,000+ rows) I need to loop through them all and calculate various values. My query has already taken a long time, so anything more I do, I want it to be fast so that the user doesn't think something's wrong. I use these fast assertions inside the calculation routines and loops to make sure my values are always valid before I start. Doing this has the added benefit of keeping my calculation code simple and fast: since it never gets an edge case, it doesn't need to handle them.

      I guess my point is that, yes, IO/Network/DB stuff can take a while and certainly be the likeliest bottleneck, but sometimes this means your post-processing code needs to be that much faster.

      Machine time is cheap and Programmer time is expensive -- what is clearest? Which is easiest for others to read?

      Agreed, but IMHO, my way is clearer :)

      You say TOmato, I say TomAto. It's all a matter of style.

      Making assert a sub makes it Very Easy to change the assert behavior of the whole code base (logging, responding to errors, correct shutdown in case of Really Bad Errors, etc), whereas the in-line || doesn't give you those options

      Not so, a modification of the example above will show that it can be done:

      sub insertQuarter {
          my ($coin) = @_;
          (defined($coin) && $coin == 25) || errorHandler("You must insert 25 cents");
          initializeGame();
          startGame();
      }
      Where the code for errorHandler can be changed to do just about anything (log, die, warn, etc etc etc). IMO this is just as flexible as an assert subroutine. It is also faster. Here is some code which benchmarks not only a raw OR against an assert sub, but also benchmarks using the OR with the errorHandler, as well as an example of no-op versions of both errorHandler and assert.

      As you can see from the results, using a basic assert subroutine is about 34% slower than the raw OR. When you add the flexibility of an errorHandler subroutine on the other end of OR, you lose only 1% (which is surely insignificant), and gain the same flexibility an assert subroutine would have. Even when you turn off assertions and make both the assert and errorHandler routines into no-ops, the assert version is still 21% slower (which really is useless overhead since much of that is likely just the call to assert).

      Again, I like my assertions to be on at runtime, and I put them everywhere, so for me, these performance gains are a nice thing to have. But even if you don't need to worry about performance, I see little gain in flexibility over using an assert sub.
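As a rough illustration of the swappable errorHandler described above, here is one way it might be arranged. The policy variable, the in-memory log and the return values are all assumptions made for this sketch, and it uses the "|| return errorHandler(...)" form so that the sub stops when the handler does not die.

```perl
use strict;
use warnings;

# Hypothetical code-base-wide switch: 'die', 'warn' or 'log'.
our $ASSERT_POLICY = 'die';
our @ASSERT_LOG;

# Central place to change what a failed assertion does.
sub errorHandler {
    my ($message) = @_;
    push @ASSERT_LOG, $message;                       # always record it
    die "$message\n"  if $ASSERT_POLICY eq 'die';
    warn "$message\n" if $ASSERT_POLICY eq 'warn';
    return 0;                                         # false for the caller
}

sub insertQuarter {
    my ($coin) = @_;
    (defined($coin) && $coin == 25)
        || return errorHandler("You must insert 25 cents");
    return "game started";    # stands in for initializeGame(); startGame();
}

$ASSERT_POLICY = 'log';
print insertQuarter(25), "\n";                     # game started
print insertQuarter(10), "\n";                     # 0
print "logged: ", scalar(@ASSERT_LOG), "\n";       # logged: 1
```

The handler is only called when the condition fails, so the passing path still costs nothing beyond the boolean test.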

      This is not a criticism of his post or his method, just a caution to the less-experienced: Premature optimization is the root of all evil.

      Yes, quite true, but would you not want to make sure you're always using the sharpest knife in the drawer?

      I do not mean this as a means of premature optimization, but as a style, which IMO, is clear and readable, but also has the added benefit of being faster than many of the other options out there.

      But then again, I hate Pepsi, and I hate Coke,... gimme some Mountain Dew anyday ;-)

      -stvn
Re: Being more Assert-ive with Perl
by autarch (Hermit) on Sep 18, 2004 at 03:27 UTC

    You should check out Params::Validate. It's much less verbose than what you're doing, and it's coded in XS so it's reasonably fast, though perhaps not as fast as your hard-coded tests. Your first example would look like this:

    my ($quarter) = validate_pos(
        @_,
        {
            type      => SCALAR,
            callbacks => { 'must be 25' => sub { $_[0] == 25 } },
        }
    );

    I think that consistently using this module makes your code even more declarative.

      It's much less verbose than what you're doing,...

      Personally I don't see that. I have looked at Params::Validate before, and I didn't much care for its style of parameter handling. Even if this is in XS, I would think it wouldn't be faster than a raw boolean. Already in your example there is a subroutine call (validate_pos) as well as the construction and analysis of a nested hash ref (which is surely pretty fast if in XS), and then the creation of a code-ref (which I assume brings along the overhead of creating a closure, although I may be wrong in this). This seems to me to be a lot of overhead just to get to the point at which things can be checked.

      I think that consistently using this module makes your code even more declarative.

      I disagree, I think the more declarative approach is to use the basic boolean expression. But that is likely just a matter of personal preference/style.

      -stvn

        >> It's much less verbose than what you're doing,...

        > Personally I don't see that. I have looked at Params::Validate before, and I didn't much care for its style of parameter handling.

        When you're validating only one parameter, the framework takes up more space than the declarative parts. But as you start adding more parameters the framework pieces grow at a much slower rate than the declarative parts. But with your code, you'll be adding another "or die" for each parameter.

        Anyway, I don't think you have to use it, but I think your optimizations here are seriously premature. You should be using some sort of module for this stuff, not hand-coding it over and over and over, especially since you say you use it everywhere. You could use a source filter if you're really dead set on maximum speed.

      I decided to benchmark this and see what the difference was. Here is the script...

      #!/usr/bin/perl

      use strict;
      use warnings;

      use Benchmark qw(:all);
      use Params::Validate qw(:all);

      sub ParamsValidateAssert {
          my ($test) = validate_pos(
              @_,
              {
                  type      => SCALAR,
                  callbacks => { 'must be less than 50' => sub { $_[0] < 50 } },
              }
          );
          return $test;
      }

      sub OrAssert {
          my ($test) = @_;
          ($test < 50) || die "Test is wrong";
          return $test;
      }

      my @nums = map { ((rand() * 100) % 50) } (0 .. 100);

      cmpthese(10_000, {
          'ParamsValidateAssert' => sub { eval { ParamsValidateAssert($_) } for (@nums) },
          'OrAssert'             => sub { eval { OrAssert($_) }             for (@nums) },
      });
      Here are the results...
      Benchmark: timing 10000 iterations of OrAssert, ParamsValidateAssert...
                  OrAssert:  8 wallclock secs ( 7.02 usr +  0.01 sys =  7.03 CPU) @ 1422.48/s (n=10000)
      ParamsValidateAssert: 97 wallclock secs (87.88 usr +  0.38 sys = 88.26 CPU) @ 113.30/s (n=10000)

                             Rate ParamsValidateAssert OrAssert
      ParamsValidateAssert  113/s                   --     -92%
      OrAssert             1422/s                1155%       --
      Using OR is more than 12 times faster here (1155%). Again, as I said, I think my style is less verbose and more readable, so I favor it because of that as well.

      But as water points out below, it should not completely be about the speed. I am sure that Params::Validate has a number of wheels I would not want to have to re-invent when I hand-code OR based assertions. Looking over the docs for Params::Validate, I see where this could really be a useful tool for data validation, not just of subroutine parameters, but of web queries and such. As you say, it can be quite declarative and I can see how you could build some very sophisticated data validation with it. However, I do think it is overkill for subroutine params and post/pre-condition checking.

      -stvn
Re: Being more Assert-ive with Perl
by Zed_Lopez (Chaplain) on Sep 18, 2004 at 00:28 UTC

    You might like to look at Carp::Assert. Its assert function is liable to be slower than your test or die constructs, and it doesn't allow you to customize your die statements (unless you use its longer affirm routine, which is bound to be substantially slower.) I find it nice and readable, though. If I have time, I'll benchmark it later.

    An idea you might like to steal from it is to append if DEBUG to your assertions, where DEBUG is a constant that you can set to 1 in development/testing and to 0 in production. That way, you have all of the benefits of assertions in development, and none of the overhead in production (along with none of the benefits.)
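Combined with the || style from the original post, the if DEBUG idea might be sketched like this. It uses a plain constant rather than Carp::Assert itself, and the ASSERT_DEBUG environment variable is a name invented for this example. When the constant is false, perl can discard the guarded statement at compile time, so the check costs nothing at runtime.

```perl
use strict;
use warnings;

# Hypothetical switch: set ASSERT_DEBUG=1 in the environment to enable checks.
use constant DEBUG => $ENV{ASSERT_DEBUG} ? 1 : 0;

sub insertQuarter {
    my ($coin) = @_;
    # Only checked when DEBUG is true.
    ((defined($coin) && $coin == 25) || die "You must insert 25 cents") if DEBUG;
    return "game started";
}

print "assertions are ", (DEBUG ? "on" : "off"), "\n";
print insertQuarter(25), "\n";    # game started
```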

      If I have time, I'll benchmark it later.

      I was already benchmarking for the other responses, so I did it for you.

      Here is the code

      #!/usr/bin/perl

      use strict;
      use warnings;

      use Benchmark qw(:all);
      use Carp::Assert;

      sub CarpAssert {
          my ($test) = @_;
          assert($test < 50) if DEBUG;
          return $test;
      }

      sub OrAssert {
          my ($test) = @_;
          ($test < 50) || die "Test is wrong";
          return $test;
      }

      my @nums = map { ((rand() * 100) % 50) } (0 .. 100);

      cmpthese(10_000, {
          'CarpAssert' => sub { eval { CarpAssert($_) } for (@nums) },
          'OrAssert'   => sub { eval { OrAssert($_) }   for (@nums) },
      });
      Here are the results of this:
      Benchmark: timing 10000 iterations of CarpAssert, OrAssert...
      CarpAssert: 11 wallclock secs ( 9.92 usr +  0.04 sys =  9.96 CPU) @ 1004.02/s (n=10000)
        OrAssert:  8 wallclock secs ( 7.14 usr +  0.03 sys =  7.17 CPU) @ 1394.70/s (n=10000)

                   Rate CarpAssert OrAssert
      CarpAssert 1004/s         --     -28%
      OrAssert   1395/s        39%       --
      I then tried turning off Carp::Assert, and it was only 3% faster than the OR assertions.
      Benchmark: timing 10000 iterations of CarpAssert, OrAssert...
      CarpAssert:  9 wallclock secs ( 6.50 usr +  0.03 sys =  6.53 CPU) @ 1531.39/s (n=10000)
        OrAssert: 10 wallclock secs ( 6.72 usr +  0.02 sys =  6.74 CPU) @ 1483.68/s (n=10000)

                   Rate   OrAssert CarpAssert
      OrAssert   1484/s         --        -3%
      CarpAssert 1531/s         3%         --
      I then changed the OR assertion to use a no-op error handler (as if they were off). Here is the modified sub
      sub noOpError {}

      sub OrAssert {
          my ($test) = @_;
          ($test < 50) || noOpError("Test is wrong");
          return $test;
      }
      And surprisingly it slowed OR down
      Benchmark: timing 10000 iterations of CarpAssert, OrAssert...
      CarpAssert:  6 wallclock secs ( 6.54 usr +  0.00 sys =  6.54 CPU) @ 1529.05/s (n=10000)
        OrAssert:  7 wallclock secs ( 7.11 usr +  0.02 sys =  7.13 CPU) @ 1402.52/s (n=10000)

                   Rate   OrAssert CarpAssert
      OrAssert   1403/s         --        -8%
      CarpAssert 1529/s         9%         --

      An idea you might like to steal from it is to append if DEBUG to your assertions, where DEBUG is a constant that you can set to 1 in development/testing and to 0 in production. That way, you have all of the benefits of assertions in development, and none of the overhead in production (along with none of the benefits.)

      Personally I do not like to turn off my assertions, I also see my debug statements as different from assertions. Besides IMO, having the if DEBUG at the end of the statement would really ruin the readability of it all.

      -stvn
Re: Being more Assert-ive with Perl
by muba (Priest) on Sep 23, 2004 at 15:51 UTC
    Nice :) It is readable indeed, but it could even be better. This is just a small tip, I do not want to offend you in any way, but I think
        # warn the user of the wrong input and return
        # false, leaving the calling routine dealing
        # with things
        (defined($coin) && $coin == 25) || (warn("You must insert 25 cents"), return 0);
    is better written as
        # warn the user of the wrong input and return
        # false, leaving the calling routine dealing
        # with things
        (defined($coin) && $coin == 25) || return do {warn("You must insert 25 cents"); 0};
    IMHO, that would even be more readable




    "2b"||!"2b";$$_="the question"
      MUBA,

      No offense taken at all. I usually use die or throw an exception object. So the examples with warn and such are ones I made up when writing this. I am sure they could be improved, and I agree the do does make it more readable.

      -stvn
