Reference assessment techniques and how they fail

by kyle (Abbot)
on Feb 16, 2008 at 21:06 UTC ( [id://668351]=perlmeditation )

or, Ball Bearings in the Trail Mix.

Once upon a time, there were three data types, and they had three sigils. Anything with a "$" in front was a scalar. Anything with a "@" in front was an array. Anything with a "%" in front was a hash. If you want to know what it is, look at the sigil.

Then came references. References are scalars, so they wear the "$" sigil as if there's nothing more to them than a bank account balance, or a line of text. References, however, can be very very complicated.

There are several things you might want to know about a reference.

  • Is this a reference?
  • Is this a blessed reference?
  • What class is this reference in?
  • What underlying type is it?
  • How can it be dereferenced?

I'm going to look at these questions, how to answer them, and how a reference might foil attempts to answer them (through code that Should Not Be Written).

The following code put together will form a single mighty Test::More script you can use to check all these things yourself. At the top of that script would be:

    use strict;
    use warnings;
    use Test::More 'tests' => 34;
    use Scalar::Util qw( reftype blessed );

Is this a reference? Is this a blessed reference?

The usual way to check reference-ness is with ref. We write if ( ref $possible_reference ) ... and proceed accordingly. Likewise, Scalar::Util::blessed can tell you if a reference is blessed.

This fails if the reference has been blessed into package '0' like so:

    my $package_0 = bless {}, '0';

    ok( ! ref $package_0, '! ref $package_0' );
    ok( ! blessed $package_0, '! blessed $package_0' );
    is( reftype $package_0, ref {}, 'reftype $package_0' );

Note that this won't fool reftype, but reftype can't tell you if the reference is blessed. The reason package '0' causes this problem is that the string '0' is interpreted as false in a boolean context. Luckily, zero isn't what these functions normally return to mean false, so you can test them more explicitly to get the right answer.

    ok( ref $package_0 ne '', 'ref $package_0 ne ""' );
    ok( defined blessed $package_0, 'defined blessed $package_0' );

Also, you're unlikely to run into any real object in package '0' because package 0 is a syntax error, and sub 0::foo {} will say Illegal declaration of anonymous subroutine. In short, a reference in package '0' can't do anything beyond what the reference itself does.

What class is this reference?

If you've determined that the reference is blessed, then ref or Scalar::Util::blessed can tell you what class it is.

Maybe you want to know if the reference is a certain class or a subclass of it. For this, there's UNIVERSAL::isa. Because it's in the UNIVERSAL package, the isa method is available on every blessed reference, so you can say $blessed_ref->isa( 'Class::Foo' ) to see if the $blessed_ref is a "Class::Foo".

    package Super1;
    package Super2;
    package Sub;
    @Sub::ISA = qw( Super1 Super2 );

    package main;

    my $sub_class = bless {}, 'Sub';

    foreach my $class ( qw( Sub Super1 Super2 HASH ) ) {
        ok( $sub_class->isa( $class ), "\$sub_class->isa( '$class' )" );
    }

Note that UNIVERSAL::isa will also tell you about the underlying type of the reference ($sub_class->isa( 'HASH' )). This can be faked easily enough:

    my $pretend_array = bless {}, 'ARRAY';

    is( ref $pretend_array, ref [], 'ref of fake array looks real' );
    ok( UNIVERSAL::isa( $pretend_array, 'ARRAY' ), 'UNIVERSAL::isa thinks it is an array' );
    ok( $pretend_array->isa( 'ARRAY' ), '$pretend_array->isa( "ARRAY" )' );

However, because isa is an instance method invocation, the class can also override it. Not only that, you can redefine UNIVERSAL::isa outright and intercept checks on anything.

    package Void;
    sub void_sub { 'void sub' }

    package Empty;
    sub empty_sub { 'empty sub' }

    package Nothing;
    @Nothing::ISA = qw( Void Empty );
    sub isa { 0 }

    my $uni_isa;
    BEGIN { $uni_isa = \&UNIVERSAL::isa; }
    {
        no warnings 'redefine'; # Subroutine UNIVERSAL::isa redefined
        sub UNIVERSAL::isa {
            ref $_[0] eq __PACKAGE__ ? 0 : goto &$uni_isa;
        }
    }

    package main;

    my @refs = ( ref {}, ref [], ref \do{my $x}, ref sub {}, ref qr// );
    my $nothing = bless {}, 'Nothing';

    foreach my $reftype ( @refs ) {
        ok( ! UNIVERSAL::isa( $nothing, $reftype ), "\$nothing is not $reftype" );
        ok( ! $nothing->isa( $reftype ), "! \$nothing->isa( $reftype )" );
    }

    ok( UNIVERSAL::isa( [], 'ARRAY' ), 'isa still works' );
    ok( ! UNIVERSAL::isa( {}, 'ARRAY' ), 'isa still works (negation)' );
    is( $nothing->void_sub(), 'void sub', 'method dispatch still works' );

To get around that kind of trickery, one can examine the package's @ISA array directly. As a package variable, it's globally accessible.

    my @nothing_isa;
    {
        no strict 'refs';
        @nothing_isa = @{ ref( $nothing ) . '::ISA' };
    }

    foreach my $class ( qw( Empty Void ) ) {
        ok( grep( $_ eq $class, @nothing_isa ), "\$nothing is a '$class' according to \@ISA" );
    }

This works regardless of what isa says, and it can discover parent classes if you don't already know what they are. One could break this by tying @ISA to something that lies based on caller, but that seems to break method dispatch. A serious implementation would also have to recursively check the @ISA array of the packages found.
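The recursive check could look something like the following. This is only a minimal sketch: the helper name ancestors_of and the Demo:: classes are made up for illustration, and it does a plain depth-first walk, which is an assumption about what "recursively check" should mean here.

```perl
use strict;
use warnings;

# Hypothetical helper: collect every ancestor of $class by reading @ISA
# directly, without ever calling ->isa or UNIVERSAL::isa.
sub ancestors_of {
    my ( $class, $seen ) = @_;
    $seen ||= {};

    my @parents = do {
        no strict 'refs';
        @{ $class . '::ISA' };
    };

    my @found;
    foreach my $parent (@parents) {
        next if $seen->{$parent}++;    # skip repeats (diamonds, cycles)
        push @found, $parent, ancestors_of( $parent, $seen );
    }
    return @found;
}

# Illustrative hierarchy, not part of the test script above.
@Demo::Base::ISA  = ();
@Demo::Mixin::ISA = ();
@Demo::Mid::ISA   = ('Demo::Base');
@Demo::Leaf::ISA  = ( 'Demo::Mid', 'Demo::Mixin' );

print join( ', ', ancestors_of('Demo::Leaf') ), "\n";
# prints: Demo::Mid, Demo::Base, Demo::Mixin
```

On Perl 5.10 and later, mro::get_linear_isa( $class ) returns an array reference holding a similar list (including the class itself) in method resolution order, which may be preferable to hand-rolled recursion.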

The down side of examining @ISA yourself is that the object in question might have a really good reason for overriding its isa method, and this technique explicitly ignores that.

I think if all you want to know is whether some reference is a member of some class, $r->isa('Some::Class') is the way to go. That's what it's there for, after all, and it really only breaks when someone tries to break it.

What underlying type is this reference?

You can use ref for unblessed references, but it won't work for blessed references, and it can't tell you whether the reference is blessed.

If you want to know whether it's a particular type of reference, UNIVERSAL::isa can help, but see earlier examples for how to fool it too.

In Scalar::Util, there's reftype specifically to pull out the underlying type. I have yet to see a way to completely fool reftype, but see How can I tell if an object is based on a regex? (The short version: blessed regular expressions show as scalars.)
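For the regex case specifically, the core re module on Perl 5.10 and later exports is_regexp, which is documented as not being confused by blessing or overloading. A small sketch, assuming a Perl new enough to have it; the class name Demo::Disguised is hypothetical, and what reftype prints here depends on the Perl version.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype );
use re qw( is_regexp );

my $disguised = bless qr/abc/, 'Demo::Disguised';    # a real regex, reblessed
my $impostor  = bless {}, 'Regexp';                  # a hash blessed into 'Regexp'

foreach my $case ( [ disguised => $disguised ], [ impostor => $impostor ] ) {
    my ( $name, $ref ) = @{$case};
    printf "%-10s ref=%s reftype=%s is_regexp=%s\n",
        $name, ref($ref), reftype($ref), is_regexp($ref) ? 'yes' : 'no';
}
```

The point of the example is only the last column: ref is fooled in both directions, while is_regexp looks at what the thing actually is.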

I suspect that most of the time, this is not what one really wants to know anyway. Knowing the type tells you the syntax for how to access the type (i.e., how to dereference it), but it may be possible to access it in ways different from what its type would suggest. A better question is the next one...

How can this be dereferenced?

Since an object can be overloaded to provide reference-like behavior different from its own reference type, it's possible that you can @$obj even though reftype $obj eq 'HASH'. You may want to know, regardless of the underlying type, how a reference can be used. Most of the time, I don't care how an object is implemented as long as I can use the interface I'm expecting.

One method I've seen is just to try it and see if it dies.

    package NotHash;
    use overload '%{}' => sub { {} };
    sub new { bless [], shift }

    package main;

    my $not_hash = NotHash->new();

    ok( ref $not_hash, '$not_hash is a reference' );
    is( reftype $not_hash, reftype [], '$not_hash is an array reference' );

    # Useless use of a variable in void context
    ok( eval { %{$not_hash}; 1 }, '$not_hash can be dereferenced as a hash' );

Note that this technique can't detect a code reference because it would call the referenced sub. For that, you're back to UNIVERSAL::isa( $obj, ref sub {} ). See Re^5: Is this DBM::Deep behavior, or something with tie/bless? (ref) for a further discussion of this and other ramifications of using eval for this. Also, this implicitly calls methods inside the overloaded object, which may have unintended side effects. Another way to say this is that this technique doesn't just examine the reference; it prods it to see what it does.
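The try-it-and-see test can be wrapped in a helper that also silences the compile-time warning. This is a rough sketch rather than a recommendation: the name acts_like_hash and the class Demo::NotHash are hypothetical, and it still runs any '%{}' overload, with whatever side effects that has.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $thing can be dereferenced with %{},
# found out by actually trying it inside an eval.
sub acts_like_hash {
    my ($thing) = @_;
    return 0 if ref($thing) eq '';    # not a reference at all

    no warnings 'void';               # "Useless use of a variable in void context"
    my $ok = eval { %{$thing}; 1 };
    return $ok ? 1 : 0;
}

{
    package Demo::NotHash;            # array-based object that overloads %{}
    use overload '%{}' => sub { return +{ demo => 1 } };
    sub new { return bless [], shift }
}

foreach my $case (
    [ 'hash ref'          => {} ],
    [ 'array ref'         => [] ],
    [ 'plain string'      => 'HASH' ],
    [ 'overloaded object' => Demo::NotHash->new ],
) {
    printf "%-18s %s\n", $case->[0], acts_like_hash( $case->[1] ) ? 'yes' : 'no';
}
```

The explicit ref check up front keeps a plain string from being treated as a symbolic reference; everything after that is left to perl itself.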

A reference that knows it might be tested this way could foil the test by checking whether it's in an eval.

    package EvilHash;
    use overload '%{}' => \&conditional_reference;

    sub conditional_reference {
        my @call = caller(1);
        if ( @call && $call[3] eq '(eval)' ) {
            die 'no testing';
        }
        return { a => 1 };
    }

    sub new { bless [], shift }

    package main;

    my $evil_hash = EvilHash->new();

    ok( ref $evil_hash, '$evil_hash is a reference' );
    is( reftype $evil_hash, reftype [], '$evil_hash is an array reference' );

    # Useless use of a variable in void context
    ok( ! eval { %{$evil_hash}; 1 }, '$evil_hash will not be dereferenced inside eval' );
    ok( scalar %{$evil_hash}, '$evil_hash allows dereference outside eval' );

Another way is examined in "Is it a hashref" vs "Can I use it like a hashref?" This uses a combination of methods from Scalar::Util and overload to determine what the object can do. The vulnerability of that test is that it contains a catalog of reference types. If more types are added later (with support for overloading), the test could go stale.
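A generalized version of that kind of test might look like this. It is my own sketch, not the code from the linked node: the name derefs_as, the %NATIVE_TYPE catalog, and the class Demo::HashLike are all hypothetical, and the catalog is exactly the part that could go stale.

```perl
use strict;
use warnings;
use Scalar::Util qw( reftype blessed );
use overload ();

# Catalog of dereference operators and the reftype that supports each one
# natively. Deliberately incomplete: a ref-to-ref, for instance, has
# reftype 'REF' and is not covered by the '${}' entry.
my %NATIVE_TYPE = (
    '${}' => 'SCALAR',
    '@{}' => 'ARRAY',
    '%{}' => 'HASH',
    '&{}' => 'CODE',
    '*{}' => 'GLOB',
);

# Hypothetical helper: true if $thing supports the dereference operator $op,
# either natively or through overloading. Nothing is dereferenced or called.
sub derefs_as {
    my ( $thing, $op ) = @_;
    my $native = $NATIVE_TYPE{$op}
        or die "unknown dereference operator '$op'";

    return 0 if ref($thing) eq '';
    return 1 if reftype($thing) eq $native;
    return 1 if defined( blessed($thing) ) && overload::Method( $thing, $op );
    return 0;
}

{
    package Demo::HashLike;    # array-based object that overloads %{}
    use overload '%{}' => sub { return +{ demo => 1 } };
    sub new { return bless [], shift }
}

my $obj = Demo::HashLike->new;
foreach my $op ( sort keys %NATIVE_TYPE ) {
    printf "%s %s\n", $op, derefs_as( $obj, $op ) ? 'yes' : 'no';
}
```

Unlike the eval approach, this only inspects the reference, so an object that misbehaves when prodded never gets the chance.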

Summary and recommendations

Here are my brief recommendations based on all the discussion above. Note that I don't think these are the most maintainable or elegant but merely the most reliable.

Say there is a reference $r.

  • Is this a reference? ref $r ne ''
  • Is this a blessed reference? defined blessed $r
  • What class is this reference in? Use blessed $r to see what class it's in. Use $r->isa('Some::Class') to see if it is some specific class, or examine its @ISA directly (and recursively) to see what its parents are when you don't know.
  • What underlying type is it? reftype $r, but don't expect to find a regex.
  • How can it be dereferenced? See "Is it a hashref" vs "Can I use it like a hashref?" for a collection of tests using Scalar::Util and overload.

Thanks to blokhead, planetscape, ikegami, and tye for valuable feedback before I posted this.

Replies are listed 'Best First'.
Re: Reference assessment techniques and how they fail
by hipowls (Curate) on Feb 17, 2008 at 00:25 UTC

    Enjoyable article but you underestimate the freedom perl gives you to shoot yourself (or innocent bystanders) in the foot;-). You can, with some effort, declare methods in Package 0.

    use strict;
    use warnings;
    use 5.010_000;

    my $package_0 = bless {}, '0';

    {
        no strict qw(refs);
        *{'0::hello'} = sub { say 'Hello World!'; };
    }

    say ref $package_0;
    $package_0->hello();

    __END__
    0
    Hello World!

Re: Reference assessment techniques and how they fail
by dragonchild (Archbishop) on Feb 17, 2008 at 04:35 UTC
    So, what is your recommendation for the following:
    # I want to know if it will behave as a hash.
    sub isHash {
        # What goes here?
    }
    sub isScalar {}
    sub isArray {}
    sub isFunc {}
    sub isObject {}
    Remember to also take tying into account. :-)

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?

      I don't see how a tied hash (ref) is a special case.

      use strict;
      use warnings;

      package Nothing;
      sub isa { 0 }

      package NotHash;
      use overload '%{}' => sub { {} };
      sub new { bless [], shift }

      package OverHash;
      use overload '%{}' => sub { {} };
      sub new { bless {}, shift }

      package main;

      use Tie::Memoize;
      use Test::More;

      my @test_cases = (
          { name => 'unblessed hash', test => {}, is_hash => 1, },
          { name => 'unblessed array', test => [], is_hash => 0, },
          { name => q{hash blessed as 'HASH'}, test => bless( {}, 'HASH' ), is_hash => 1, },
          { name => q{array blessed as 'HASH'}, test => bless( [], 'HASH' ), is_hash => 0, },
          { name => q{hash blessed as 'ARRAY'}, test => bless( {}, 'ARRAY' ), is_hash => 1, },
          { name => 'hash in package 0', test => bless( {}, '0' ), is_hash => 1, },
          { name => 'array in package 0', test => bless( [], '0' ), is_hash => 0, },
          { name => 'hash with ->isa overridden', test => bless( {}, 'Nothing' ), is_hash => 1, },
          { name => 'blessed array with %{} overloaded', test => NotHash->new(), is_hash => 1, },
          { name => 'blessed hash with %{} overloaded', test => OverHash->new(), is_hash => 1, },
          { name => 'tied hash', test => get_tied_hash(), is_hash => 1, },
          { name => 'not a reference', test => 'HASH', is_hash => 0, },
      );

      sub get_tied_hash {
          tie my %h, 'Tie::Memoize', sub {};
          return \%h;
      }

      plan 'tests' => scalar @test_cases;

      foreach my $test ( @test_cases ) {
          is( !!isHash( $test->{test} ), !!$test->{is_hash}, $test->{name});
      }

      use Scalar::Util qw( reftype blessed );

      sub isHash {
          my $suspected_hash = shift;

          return 0 if '' eq ref $suspected_hash;
          return 1 if 'HASH' eq reftype $suspected_hash;

          if ( blessed $suspected_hash && overload::Method( $suspected_hash, '%{}' ) ) {
              return 1;
          }

          return 0;
      }

      The code for isHash is based heavily on blokhead's from "Is it a hashref" vs "Can I use it like a hashref?"

      Have I missed an important test case here?

        %{} is set, but shouldn't respond because nothing has been set.
        my $scalar = 'abcd';
        my $obj = Object::MultiType->new( scalar => $scalar );
        The whole tying thing is, for example, DBM::Deep. dbm-deep provides both TIEARRAY and TIEHASH and all appropriate methods in subclasses. Those methods didn't have to be in subclasses. It might be useful to test.

        Also, reftype(), when implemented in pureperl (such as when installed without a compiler), reblesses to a string that shouldn't be a class, but might. That should be tested.

        Frankly, I would be more interested in figuring out how to make sub isHash { eval { %{ $_[0] }; 1 }; } work without warnings. That seems to be a saner solution because it has perl figuring out how Perl works.


Re: Reference assessment techniques and how they fail
by chromatic (Archbishop) on Feb 17, 2008 at 06:06 UTC
    Note that this technique can't detect a code reference because it would call the referenced sub.

    It calls the overloaded sub, which has to return a code reference to execute. If you write defined &$some_object, perl will not call the returned code reference, but the expression will evaluate to true.

      Good point, but there's actually another problem I forgot to mention. In a &{}, undef is code.

      use strict;
      use warnings;
      use Test::More 'tests' => 3;

      sub is_code {
          no warnings qw( void uninitialized );
          return eval { defined &{$_[0]}; 1 };
      }

      sub real_sub { die 'real sub called' }
      my $sub_ref = sub { die 'sub ref called' };
      my $undef;

      ok( is_code( \&real_sub ), 'real sub ref is code' );
      ok( is_code( $sub_ref ), 'lexical sub ref is code' );
      ok( is_code( $undef ), 'undef is code' );

      This isn't a big deal, I guess. We just have to change is_code to return defined $_[0] && eval { defined &{$_[0]}; 1 };. Still, I'd rather call something that figures out what it is than call it to figure out what it is.

        Good point, but there's actually another problem I forgot to mention. In a &{}, undef is code.

        That's not true. Why is your eval always returning 1 unless it dies? With

        sub is_code {
            no warnings qw( void uninitialized );
            return eval { defined &{$_[0]} };
        }

        things look more reasonable. If undef were code, it'd be a serious bug.

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Reference assessment techniques and how they fail
by Jenda (Abbot) on Feb 18, 2008 at 13:36 UTC

    If anyone blesses a reference to package '0', he deserves what he gets. It's if (ref $r) { and if (blessed $r) { for me. Few people bother with writing for(my $i = $[; $i <= $#arr; $i++) and changing the index of the first element in an array is a more sane thing than blessing into a crazily named package into which you'd have problems adding methods.

    Thanks for the node though, the rest was very interesting!

      Few people bother with writing for(my $i = $[; $i <= $#arr; $i++)

      In Perl 5 $[ is a lexical compiler directive, so you don't have to worry about other users messing with your code through $[.

      lodin

        Note that, unlike other compile-time directives (such as strict), assignment to $[ can be seen from outer lexical scopes in the same file. However, you can use local() on it to strictly bound its value to a lexical block.
               from perldoc perlvar

        So I don't have to worry that someone will change my $[ from within a different file that I used, but I still might run into problems if someone fiddles with $[ somewhere above my code in the same file.

        @a = (0,1,2,3,4,5);

        sub pr {
            print "sub \$a[2]=$a[2]\n";
        }

        print "\$a[2]=$a[2]\n";
        pr;
        {
            $[ = 1;
            print "\$a[2]=$a[2]\n";
            pr;
        }
        print "\$a[2]=$a[2]\n";
        pr;

        I refuse to worry anyway.

Node Type: perlmeditation [id://668351]
Approved by planetscape
Front-paged by planetscape