Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl Monk, Perl Meditation
 
PerlMonks  

Re: Re: Re: optimization - exists should return a reference

by MarkM (Curate)
on Jan 15, 2003 at 08:01 UTC ( [id://227078]=note: print w/replies, xml ) Need Help??


in reply to Re: Re: optimization - exists should return a reference
in thread optimization - exists should return a reference

With regard to the suggestion that exists() return a reference, and not a boolean, BrowserUk writes:

I'm really unfamiliar with the perl source, and I still get lost in all the macro's, but looking at the code for Perl_avhv_exists_ent() in av.c, it looks like it might be less work to return the reference than to generate the boolean?

The correct starting place to look for the implementation of Perl op-codes is the pp*.c files. In this case, look in pp.c for a line that reads:

PP(pp_exists)

The function that ends up being called to actually perform the hash lookup is hv.c:hv_exists_ent() (the avhv code still exists in Perl 5.8.0, although the use of avhv's has been deprecated).

The hash itself contains a C "SV *" reference to the value. Perl references, however, are one level above this. A new RV would need to be created using newRV_inc(SV *), and the RV would be returned.

The return value would then need to be dereferenced to fetch data from it, or store data to it. Unless the value is to be accessed frequently, it is questionable whether the allocation of an RV, and the following dereferencing of the RV, from Perl code, would be any cheaper than just looking up the value in the hash again.

Replies are listed 'Best First'.
Re: Re: Re: Re: optimization - exists should return a reference
by BrowserUk (Patriarch) on Jan 15, 2003 at 12:18 UTC

    Thanks MarkM for pointing me in the right direction here. Now I'm gonna bug you a little further with some off-the-cuff source diving... I hope you don't mind.

    The relevant part of pp.c for a hash seems to be the following

    if (SvTYPE(hv) == SVt_PVHV) { if (hv_exists_ent(hv, tmpsv, 0)) RETPUSHYES; }

    Which leads us to hv.c:hv_exists_ent() as you pointed out. The relevant part (ignoring for the moment magical and %ENV variations) seems to be:

    entry = ((HE**)xhv->xhv_array)[hash & (I32) xhv->xhv_max]; for (; entry; entry = HeNEXT(entry)) { if (HeHASH(entry) != hash) /* strings can't be equal */ + continue; if (HeKLEN(entry) != klen) continue; * if (memNE(HeKEY(entry),key,klen)) /* is this it? */ * continue; * return TRUE; }

    With the salient bit being the three lines I've *'d. If I'm reading that correctly it saying

    If the memory for the KEY for this entry is NotEqual to the key we've been passed, continue looking.

    Otherwise we found it so return TRUE.

    Now, if we return TRUE the calling code (pp.c:PP(pp_exists)) does a RETPUSHYES.

    Unwinding that macro we find

    pp.h:#define RETPUSHYES RETURNX(PUSHs(&PL_sv_yes)) pp.h:#define RETURNX(x) return x, PUTBACK, NORMAL pp.h:#define PUSHs(s) (*++sp = (s)) perlapi.h:#define PL_sv_yes (*Perl_Isv_yes_ptr(aTHXo)) pp.h:#define NORMAL PL_op->op_next pp.h:#define PUTBACK PL_stack_sp = sp

    Which preprocesses to

    return ( *++sp = ( &( *Perl_Isv_yes_ptr( aTHXo ) ) ) ) , PL_stack_sp = sp , PL_op->op_next;

    I think that we can ignore the last two lines which are just setting up the stack and returning the next op-code.

    The relevant part here is the address of something located as an offset in the aTHXo struct is being pushed onto the stack.

    Now given that at the point in hv_exists_ent() where TRUE is returned, we have access to the entry for the hash entity being tested, and that given this we can use the HeVAL macro to obtain the address of the associated value.

    #define HeVAL(he)          (he)->hent_val

    If instead of returning TRUE from hv_exists_ent(), if we returned HeVAL(entry), and the code in pp.c:PP(pp_exists) became:

    if (SvTYPE(hv) == SVt_PVHV) { SV* sv; if (sv = hv_exists_ent(hv, tmpsv, 0)) RETURNX(sv); }

    Then the exists function would return a reference to the value as JMD wanted without any great overhead.

    I realised I've only described the exists( hash_element ) case here, but without going into it to far, it looks as though similar changes would take care of the array element case as well. I git lost trying to look at the other cases.

    I also realise that I am quite likely to have missed something trying to unwind this. If I have, I really like someone to point out what it is:^)

    On a somewhat related note: Can anyone tell me why the indentation on the source modules is so... erm... ecclectic?


    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

      HeVAL(entry) returns the "SV *" pointer that refers to the scalar value in the hash entry.

      In order to create a Perl-style "scalar reference" to the scalar value, hv_exists_ent() would need to return:

      newRV_inc(HeVAL(he))

      The result of this invocation would be a new RV (Perl-style "scalar reference"). The reference count for the hash entry value would be incremented.

      From the Perl side of things, a scalar needs to be allocated to hold the reference, the reference then needs to be assigned to a scalar to hold it, before checking it for "truthfulness". Then, later, in order to fetch the value, or store a new value into the hash value, the reference needs to be dereferenced. For example:

      my $reference_to_value; if ($reference_to_value = exists $hash{$key}) { print "value is $$reference_to_value\n"; }

      The question is, would all this additional overhead actually be any benefit over the common case:

      if (exists $hash{$key}) { print "value is $hash{$key}\n"; }

      A further consideration is the overhead in the case where the return value is only interpretted as a boolean value, and not later used to dereference to access the hash entry value: Since a reference was created, but not used except as a 'truth' value, it then needs to be destroyed.

      So, as you can see, returning more information, while useful under circumstances where the information is effectively used, can reduce the performance of the case where the information is not actually used. Intuitively, this should be expected, although sometimes it needs to be presented in this light to make the situation visible. :-)

      Hash lookups are very fast. I believe that the 'exists() returns a reference' solution is not optimal overall.

      Far better avenues to consider would include caching the string hash value, or enhancing the optimizer to detect the situation where the exists is followed by a lookup, and the key has not changed, and to instead generate new op-codes for the exist() that would store the value away in the pad somewhere, and the lookup that would fetch the value from the pad. The benefit of the latter model would be that code such as the following could be optimized:

      $hash{$key}{'a'} = 'A'; $hash{$key}{'b'} = 'B'; $hash{$key}{'c'} = 'C';

      The optimizer could detect that the value ($key) was not altered between use, and the same hash entry value could be reused each time.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://227078]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others studying the Monastery: (5)
As of 2024-04-24 08:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found