http://qs321.pair.com?node_id=593055

sfink has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to make a nicer API for a work application without slowing it down much at all. For this, it would be really convenient if I could detect whether a variable has been modified. For example, consider the following code:
sub onTick { our $x; # Initialize and associate $x with some external value foreach (@kabluther) { $x += $_->getSkookiness(); } return $x; }
Now say that @kabluther is almost always empty, but when it's not, it has a bazillion entries. (Similarly, I could have used an 'if' statement where $x is modified in only one branch.) Is there any tricky way to detect whether $x was modified, preferably without slowing things down much at all? (Or maybe it only slows down that first modification, but then runs at full speed for the remaining.)

In my application, if $x is updated, then I need to notify watchers, and that may trigger all kinds of cascading computation. So I'd like to only do that notification when it's actually needed, without adding a burden to the programmer to explicitly inform the system.

It's fine if it isn't 100% correct -- it's ok to occasionally claim that $x has been modified when it actually hasn't. It is not fine to claim that $x has not been modified when it actually has.

The mechanism shouldn't require the code to be written any differently, although I can automatically insert code before or after the call to the function.

One option would be to tie $x to something that overloads STORE to mark $x as modified, then unties $x. But (assuming it's even possible) I don't want to slow down reads of $x either.

What I was originally thinking of when I asked this question was, for a numeric variable, upgrading it to a string if necessary and then turning off SvNOK. When the variable is modified, it would need to convert the string to a float and turn SvNOK back on. Which is a dumb idea, because it would do the same thing on a read.

It's fine to use XS. Is there separate get and set magic, so that I could do what I wanted with the tie? As in: don't have any get magic, and have the set magic record the variable in a separate table and then clear the magic.

I don't actually know how speed-critical this is -- not even enough to know whether it would be better to always assume the variable was updated, or tie'ing the variable to something that records the update (and doesn't untie it). This is infrastructure that a bunch of other people will use, so I can't just benchmark one use of it and decide from that. The cost of incorrectly assuming that the variable has been updated varies from practically zero to infinite, but is usually small.

I'm hoping someone else has a simple but effective idea that is cheap enough that it will barely slow down the common case while significantly speeding up the occasional case when incorrectly assuming the variable changed is very expensive.

Replies are listed 'Best First'.
Re: Quickly detecting variable writes
by BrowserUk (Patriarch) on Jan 05, 2007 at 02:41 UTC

    I'm probably missing something, but why not take a copy of $x before the loop and check if it has changed afterwards?

    sub onTick { our $x; my $copyX = $x; # Initialize and associate $x with some external value foreach (@kabluther) { $x += $_->getSkookiness(); } if( $x != $copyX ) { notifyWatchers( $x ); } return $x; }

    That wouldn't detect serial changes that result in no change--$x += 10; $x += -4; $x += -6;--but whether that is a problem depends upon what the watchers are watching?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Quickly detecting variable writes
by Tanktalus (Canon) on Jan 05, 2007 at 04:24 UTC

    Your example may be a bit more abstract than you intended, but, in case it is not, why not just test the array?

    foreach (@kabluther) { #... } if (@kabluther) { # the "..." above was executed # i.e., your $x was modified $_->change_in_x() for @listeners; # sounds like this is something li +ke what you want to do. }
    Note that this test is executed once, and only once, each time through this function. The test should be O(1) - that is, independent of the actual size of the list. Even if perl didn't keep track of the actual size of the list separate from the list itself (i.e., not a calculation), which it does, perl would be able to optimise this by noting we're in boolean context and knowing to just check if the list is empty or not, and thereby only count the first element in the list.

    Note that this does not (quite) help if you're doing the $x modification inside an "if", e.g.:

    foreach (@kabluther) { $x += $_->getSkookiness() if $_; # ignore undef arguments # or if ref $_; # only count ref's # or if ref $_ and $_->can('getSkookiness'); # only count objects t +hat have this method }
    But, even then, based on your original question ('usually empty'), and your fuzzy operator ('can report true when there is no actual change, but cannot report false when there is a change'), I think this may be close enough.

    For more accuracy, I would just create a lexical variable to track changes.

    sub onTick { our $x; my $changed; # Initialize and associate $x with some external value foreach (@kabluther) { do { # these two lines must be a block - do together, or not at +all. $x += $_->getSkookiness(); $changed++; } # if clause can go here I think, if needed } if ($changed) { $_->ticked() for @listeners; } return $x; }
    A bit more overhead when doing something, but about the same overhead when doing nothing, and will get it right all the time, and no funky XS. That said, it requires a change to the loop. Probably not a huge change, but a change nonetheless.

      The purpose of this is to provide a nice interface to programmers so that they don't have to track whether the watchers need to be notified or not. Here's some actual code in the current version of my system:
      onTime { our $active; our $pos : Attr('position'); if ($active) { $pos += [ 1, 1 ]; } }
      (I know there's a missing 'sub' before 'onTime', but that's intentional to match our actual setup. Just pretend there is a 'sub' there.)

      I would like the developers to be able to write exactly the above code, but only have the 'position' watchers notified if $pos was written to. I can do whatever setup I want in either the attribute handler or code that is inserted before the entire body, and whatever cleanup or change notification is needed at the end of the body or after the subroutine returns.

      In the current system, it just unconditionally assumes that any variable that could change, did. That means there is a cost for using the Attr(...) syntax, and I want it to at least feel "free" to the developers.

      Sorry if the question was unclear. It's hard to figure out how much to explain so that people can understand the question (and maybe detect if I'm asking the wrong question), and how much to leave out so people don't have to wade through irrelevant detail.

        it looks like the += operator is already overloaded. can you hook into that, and an overloaded = operator, and etc?
Re: Quickly detecting variable writes
by GrandFather (Saint) on Jan 05, 2007 at 02:05 UTC

    Does it make sense to turn $x into an object that stores a copy of the entry value that can be compared with the exit value? The inserted code then calls a mark member before and a test member afterwards.


    DWIM is Perl's answer to Gödel
Re: Quickly detecting variable writes
by sfink (Deacon) on Jan 05, 2007 at 04:34 UTC
    Sorry, I can't use value comparisons to detect the updates. It doesn't matter whether the value changes; it only matters whether the variable was written to.

    Good idea though. In most cases, it would probably still work ok. But it won't work in the general case for my application.

      It doesn't matter whether the value changes; it only matters whether the variable was written to.

      I'm gonna regret persisting, but why?

      I'm not trying to be obnoxious, the answer to that question would really determine the appropriate solution. I'm not fond of pat answers, but this really does seem to be a case of the 'XY problem' tag that was all rage around here a month or so ago.

      You say the value of $x doesn't matter, only whether it is written to but

      • In your example, unless += is being overloaded, $x will always be "written to" if @kabluther contains anything--even if the value added is 0.

        In which case, knowing whether $x 'was written to', is equivalent to knowing 'if @kabluther was non-empty'.

      • If += is overloaded, then you're already paying the penalty of overloading and adding a flag internally to the overload method and a second method to query the flag will add negligable extra overhead.

      Ultimately, you already mentioned the 'obvious' solution when you mentioned tie. You mention efficiency fears, which can be well-founded, but if you really need the functionality, then whether the test is hidden behind a tie, or whether you assigned the return from the methods to a local value, tested it, set a flag and then conditionally added it to $x, the overhead is pretty much going to be the same.

      One possibility is that you fear tying $x to achieve this particular piece of functionality because of the performance affect it will have upon the use of $x throughout the rest of the program?

      In which case, use tied lexical variable local to the sub and assign/add is final value to $x when the loop is complete--if it has 'been written to'. That would avoid any penalty of tying $x in the wider scope.

      However, if you continue to use += on the tied variable, the STORE method will always be called as mentioned above, so you still won't know whether the there is any contribution from the return value of the methods called or not.

      Definitely an XY problem.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        In your example, unless += is being overloaded, $x will always be "written to" if @kabluther contains anything--even if the value added is 0. In which case, knowing whether $x 'was written to', is equivalent to knowing 'if @kabluther was non-empty'.
        Yes. And in that particular case, that is exactly what I want.
        If += is overloaded, then you're already paying the penalty of overloading and adding a flag internally to the overload method and a second method to query the flag will add negligable extra overhead.
        Oops, sorry, bad example. Yes, you are correct. But that depends on the type of value being used. Values could be overloaded blessed array refs, as in the example above. Those are used for 2d, 3d, and 4d vectors. But other common types are booleans, integers, floats, doubles, and strings, as well as a bunch of more heavyweight types (images, vectors, sets, ...) None of those other lightweight types are overloaded or otherwise special, and they are probably the more important ones to suppress the false updates for. Especially booleans, which tend to be used as triggers.

        To answer your question: my system is comprised of a tree of nodes, where each node contains a collection of attributes. Attributes can be either inputs or outputs. All processing is done by connecting the inputs and outputs together. An external trigger (clock ticks, camera frames, keyboard events, ...) causes a cascade of updates throughout the graph.

        Attributes that are bound together share a value, so the value is communicated from one node to the next. But in addition, there is an explicit call saying "I updated this attribute" that invokes the callbacks on all input attributes that are bound to that output. So you can think of it as a graph where the edges propagate both values and control (in the form of update notifications).

        One type of node is a Perl script node. It can have whatever attributes it likes, and can bind any of them to Perl variables using the ": Attr(...)" syntax in the example in another post, or by a lower-level notation to retrieve the value which doesn't try to do any magical auto-updating stuff.

        So the decision of what value to assign to some attribute is different than whether to mark it as updated. Usually, the two go hand in hand. But there are many cases where triggering unnecessary update notifications will result in a huge amount of wasted work. So, for example, if a Perl script node has six boolean output attributes, you really don't want to mark every one of them as updated every time anything changes. And you can't tell from the value; it is perfectly reasonable to repeatedly set an output attribute to 'false' and mark it as updated over and over again.

        You always have the option of dropping back to a lower level, and marking every update explicitly. I am trying to come up with an easier-to-use interface that removes the need for the marking in most cases, preferably without adding too much overhead. The basic idea is that 95% of the time, if you write to a variable then you want to mark it updated. Most of those writes will end up changing the variable's value, but the cost of missing an update (the rest of the tree does not get triggered) is much worse than the cost of doing an unneeded updated (usually just redundant processing), so value comparisons are conservative in the wrong direction.

        Looking at the length of this reply, I think you may have been correct to regret persisting!