Reference assessment techniques and how they fail

or, Ball Bearings in the Trail Mix.

Once upon a time, there were three data types, and they had three sigils. Anything with a "$" in front was a scalar. Anything with a "@" in front was an array. Anything with a "%" in front was a hash. If you wanted to know what something was, you looked at the sigil.

Then came references. References are scalars, so they wear the "$" sigil as if there's nothing more to them than a bank account balance or a line of text. References, however, can be very complicated.

There are several things you might want to know about a reference.

Is this a reference?
Is this a blessed reference?
What class is this reference in?
What underlying type is it?
How can it be dereferenced?

I'm going to look at these questions, how to answer them, and how a reference might foil attempts to answer them (through code that Should Not Be Written).

The following code put together will form a single mighty Test::More script you can use to check all these things yourself. At the top of that script would be:

use strict;
use warnings;

use Test::More 'tests' => 34;
use Scalar::Util qw( reftype blessed );

Is this a reference? Is this a blessed reference?

The usual way to check reference-ness is with ref. We write if ( ref $possible_reference ) ... and proceed accordingly. Likewise, Scalar::Util::blessed can tell you if a reference is blessed.

This fails if the reference has been blessed into package '0' like so:

my $package_0 = bless {}, '0';
ok( ! ref $package_0, '! ref $package_0' );
ok( ! blessed $package_0, '! blessed $package_0' );
is( reftype $package_0, ref {}, 'reftype $package_0' );

Note that this won't fool reftype, but reftype can't tell you if the reference is blessed. The reason package '0' causes this problem is that the string '0' is interpreted as false in a boolean context. Luckily, neither function returns '0' to mean false: ref returns the empty string for a non-reference, and blessed returns undef for anything unblessed. You can test for those values explicitly to get the right answer.

ok( ref $package_0 ne '', 'ref $package_0 ne ""' );
ok( defined blessed $package_0, 'defined blessed $package_0' );

Also, you're unlikely to run into any real object in package '0' because package 0 is a syntax error, and sub 0::foo {} will say Illegal declaration of anonymous subroutine. In short, a reference in package '0' can't do anything beyond what the reference itself does.

What class is this reference in?

If you've determined that the reference is blessed, then ref or Scalar::Util::blessed can tell you what class it is.

Maybe you want to know if the reference is a certain class or a subclass of it. For this, there's UNIVERSAL::isa. Because it's in the UNIVERSAL package, the isa method is available on every blessed reference, so you can say $blessed_ref->isa( 'Class::Foo' ) to see if the $blessed_ref is a "Class::Foo".

package Super1;
package Super2;
package Sub;

@Sub::ISA = qw( Super1 Super2 );

package main;

my $sub_class = bless {}, 'Sub';
foreach my $class ( qw( Sub Super1 Super2 HASH ) ) {
    ok( $sub_class->isa( $class ), "\$sub_class->isa( '$class' )" );
}

Note that UNIVERSAL::isa will also tell you about the underlying type of the reference ($sub_class->isa( 'HASH' )). This can be faked easily enough:

my $pretend_array = bless {}, 'ARRAY';
is( ref $pretend_array, ref [], 'ref of fake array looks real' );
ok( UNIVERSAL::isa( $pretend_array, 'ARRAY' ),
    'UNIVERSAL::isa thinks it is an array' );
ok( $pretend_array->isa( 'ARRAY' ), '$pretend_array->isa( "ARRAY" )' );

However, because isa is an instance method invocation, the class can also override it. Not only that, you can redefine UNIVERSAL::isa outright and intercept checks on anything.

package Void;
sub void_sub { 'void sub' }
package Empty;
sub empty_sub { 'empty sub' }
package Nothing;

@Nothing::ISA = qw( Void Empty );

sub isa { 0 }

my $uni_isa;
BEGIN { $uni_isa = \&UNIVERSAL::isa; }
{   
    no warnings 'redefine';  # Subroutine UNIVERSAL::isa redefined

    sub UNIVERSAL::isa {
        ref $_[0] eq __PACKAGE__ ? 0 : goto &$uni_isa;
    }
}
package main;

my @refs = ( ref {}, ref [], ref \do{my $x}, ref sub {}, ref qr// );
my $nothing = bless {}, 'Nothing';
foreach my $reftype ( @refs ) {
    ok( ! UNIVERSAL::isa( $nothing, $reftype ),
        "\$nothing is not $reftype" );
    ok( ! $nothing->isa( $reftype ), "! \$nothing->isa( $reftype )" );
}
ok( UNIVERSAL::isa( [], 'ARRAY' ), 'isa still works' );
ok( ! UNIVERSAL::isa( {}, 'ARRAY' ), 'isa still works (negation)' );
is( $nothing->void_sub(), 'void sub', 'method dispatch still works' );

To get around that kind of trickery, one can examine the package's @ISA array directly. As a package variable, it's globally accessible.

my @nothing_isa;
{
    no strict 'refs';
    @nothing_isa = @{ ref( $nothing ) . '::ISA' };
}
foreach my $class ( qw( Empty Void ) ) {
    ok( grep( $_ eq $class, @nothing_isa ),
        "\$nothing is a '$class' according to \@ISA" );
}

This works regardless of what isa says, and it can discover parent classes if you don't already know what they are. One could break this by tying @ISA to something that lies based on caller, but that seems to break method dispatch. A serious implementation would also have to recursively check the @ISA array of the packages found.
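The recursive check mentioned above might look something like the following. This is only a sketch: isa_ancestors is a hypothetical helper, the Child/Parent/Grandparent classes are invented for the example, and it does a simple depth-first walk rather than reproducing Perl's actual method resolution order.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical helper: collect every ancestor reachable through @ISA,
# depth first, without ever calling ->isa.
sub isa_ancestors {
    my ( $class, $seen ) = @_;
    $seen ||= {};
    my @found;
    no strict 'refs';    # needed for the symbolic @{"${class}::ISA"} lookup
    foreach my $parent ( @{ $class . '::ISA' } ) {
        next if $seen->{$parent}++;    # skip repeats and guard against cycles
        push @found, $parent, isa_ancestors( $parent, $seen );
    }
    return @found;
}

# Illustrative class tree (names made up for this example).
@Grandparent::ISA = ();
@Parent::ISA      = ('Grandparent');
@Child::ISA       = ('Parent');
sub Child::isa { 0 }    # a lying isa, as in the Nothing example

my $child     = bless {}, 'Child';
my @ancestors = isa_ancestors( blessed $child );
print "@ancestors\n";
```

Because the walk never calls isa, the lying Child::isa has no effect on what it finds.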

The downside of examining @ISA yourself is that the object in question might have a really good reason for overriding its isa method, and this technique explicitly ignores that.

I think if all you want to know is whether some reference is a member of some class, $r->isa('Some::Class') is the way to go. That's what it's there for, after all, and it really only breaks when someone tries to break it.
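One practical wrinkle with the method form is that calling ->isa on an unblessed reference dies, so it helps to check blessed first. A minimal sketch of that guard follows; object_isa, Circle, and Shape are names invented for this example.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical helper: true only if $r is an object whose class
# claims to be (or to inherit from) $class.
sub object_isa {
    my ( $r, $class ) = @_;
    return 0 unless defined blessed($r);    # plain refs and non-refs
    return $r->isa($class) ? 1 : 0;         # respects any overridden isa
}

# Illustrative classes (made up for this example).
@Circle::ISA = ('Shape');
my $circle = bless {}, 'Circle';

print object_isa( $circle,  'Shape' ), "\n";    # object in a subclass
print object_isa( {},       'Shape' ), "\n";    # unblessed reference
print object_isa( 'Circle', 'Shape' ), "\n";    # plain string, not an object
```

Since this still goes through the object's own isa, it honors a class that overrides isa on purpose, for better or worse.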

What underlying type is this reference?

You can use ref for unblessed references, but for a blessed reference it returns the class name instead of the underlying type, and by itself it can't tell you whether the reference is blessed.

If you want to know whether it's a particular type of reference, UNIVERSAL::isa can help, but see earlier examples for how to fool it too.

In Scalar::Util, there's reftype specifically to pull out the underlying type. I have yet to see a way to completely fool reftype, but see "How can I tell if an object is based on a regex?" (The short version: blessed regular expressions show as scalars.)
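As a small illustration of the difference between ref and reftype on blessed references, here is a sketch using a made-up class name (Some::Class) and three common underlying types. Regular expressions are deliberately left out, for the reason noted above.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype );

# Three objects in the same (made-up) class, each built on a different type.
my %objects = (
    hash_based  => bless( {},        'Some::Class' ),
    array_based => bless( [],        'Some::Class' ),
    code_based  => bless( sub { 1 }, 'Some::Class' ),
);

foreach my $name ( sort keys %objects ) {
    my $obj = $objects{$name};
    # ref reports the class; reftype reports what is underneath it
    printf "%-11s ref=%s reftype=%s\n", $name, ref($obj), reftype($obj);
}
```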

I suspect that most of the time, this is not what one really wants to know anyway. Knowing the type tells you the syntax for how to access the type (i.e., how to dereference it), but it may be possible to access it in ways different from what its type would suggest. A better question is the next one...

How can this be dereferenced?

Since an object can be overloaded to provide reference-like behavior different from its own reference type, it's possible that you can @$obj even though reftype $obj eq 'HASH'. You may want to know, regardless of the underlying type, how a reference can be used. Most of the time, I don't care how an object is implemented as long as I can use the interface I'm expecting.

One method I've seen is just to try it and see if it dies.

package NotHash;
use overload '%{}' => sub { {} };
sub new { bless [], shift }

package main;
my $not_hash = NotHash->new();
ok( ref $not_hash, '$not_hash is a reference' );
is( reftype $not_hash, reftype [],
    '$not_hash is an array reference' );
# Useless use of a variable in void context
ok( eval { %{$not_hash}; 1 },
    '$not_hash can be dereferenced as a hash' );

Note that this technique can't detect a code reference because it would call the referenced sub. For that, you're back to UNIVERSAL::isa( $obj, ref sub {} ). See "Re^5: Is this DBM::Deep behavior, or something with tie/bless? (ref)" for further discussion of this and other ramifications of using eval this way. Also, the test implicitly calls methods inside the overloaded object, which may have unintended side effects. In other words, this technique doesn't just examine the reference; it prods it to see what it does.

A reference that knows it might be tested this way could foil the test by checking whether it's in an eval.

package EvilHash;
use overload '%{}' => \&conditional_reference;
sub conditional_reference {
    my @call = caller(1);
    if ( @call && $call[3] eq '(eval)' ) {
        die 'no testing';
    }
    return { a => 1 };
}
sub new { bless [], shift }

package main;
my $evil_hash = EvilHash->new();
ok( ref $evil_hash, '$evil_hash is a reference' );
is( reftype $evil_hash, reftype [],
    '$evil_hash is an array reference' );
# Useless use of a variable in void context
ok( ! eval { %{$evil_hash}; 1 },
    '$evil_hash will not be dereferenced inside eval' );
ok( scalar %{$evil_hash},
    '$evil_hash allows dereference outside eval' );

Another way is examined in "Is it a hashref" vs "Can I use it like a hashref?" This uses a combination of methods from Scalar::Util and overload to determine what the object can do. The vulnerability of that test is that it contains a catalog of reference types. If more types are added later (with support for overloading), the test could go stale.
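To give a rough idea of what such a test can look like, here is a minimal sketch of my own, not the code from that node. The helper can_deref_as, the %deref_op catalog, and the Demo::NotHash class are all assumptions made for illustration. It only looks at the reference, using reftype and overload::Method, so it never calls the overloaded code and it can answer for code references too.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );
use overload ();

# Assumed catalog mapping reftype names to overload keys. It is
# incomplete on purpose and would need updating if more types appear.
my %deref_op = (
    SCALAR => '${}',
    ARRAY  => '@{}',
    HASH   => '%{}',
    CODE   => '&{}',
    GLOB   => '*{}',
);

# Hypothetical helper: can $thing be dereferenced as $type?
sub can_deref_as {
    my ( $thing, $type ) = @_;
    return 0 unless ref($thing) ne '';
    my $op = $deref_op{$type} or return 0;
    # An object may overload a dereference its underlying type lacks
    return 1
        if defined( blessed($thing) ) && overload::Method( $thing, $op );
    return reftype($thing) eq $type ? 1 : 0;
}

package Demo::NotHash;    # array-based, but overloads %{}
use overload '%{}' => sub { +{} };
sub new { bless [], shift }

package main;

my $obj = Demo::NotHash->new();
print can_deref_as( $obj, 'HASH' ),  "\n";    # via the overload
print can_deref_as( $obj, 'ARRAY' ), "\n";    # via the underlying type
print can_deref_as( $obj, 'CODE' ),  "\n";    # neither
```

The hard-coded %deref_op table is exactly the kind of catalog that could go stale, so treat it as a starting point rather than a complete list.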

Summary and recommendations

Here are my brief recommendations based on the discussion above. Note that I don't think these are the most maintainable or elegant, merely the most reliable.

Say there is a reference $r.

Is this a reference? ref $r ne ''
Is this a blessed reference? defined blessed $r
What class is this reference in? Use blessed $r to see what class it's in. Use $r->isa('Some::Class') to see if it is some specific class, or examine its @ISA directly (and recursively) to see what its parents are when you don't know.
What underlying type is it? reftype $r, but don't expect to find a regex.
How can it be dereferenced? See "Is it a hashref" vs "Can I use it like a hashref?" for a collection of tests using Scalar::Util and overload.

Thanks to blokhead, planetscape, ikegami, and tye for valuable feedback before I posted this.
