http://qs321.pair.com?node_id=11112496

Eily has asked for the wisdom of the Perl Monks concerning the following question:

Again, some of you might have seen the question on the CB already. (And again, I came up with a solution just because asking made me see things clearly enough.) I am wondering about iterating over a hash while removing some of its keys (the current one, and several connected ones). To explain my problem on the CB, I said I'd like to be able to do this:

    while (my ($k,$v) = first %hash) { delete $hash{$_} for $k, @$v; }
where first does the same as each, but with a fresh iterator (i.e. starting again from the beginning). Long story short, this led to this:
    use strict;
    use warnings;
    use feature 'say';
    use Data::Dump qw( pp );

    sub first (\%) {
        my $hash = shift;
        scalar(keys %$hash); # Force reset
        each %$hash;
    }

    my %hash = (
        1 => [2..7],
        8 => [ 9 ],
        9 => [ 8 ],
        (map { $_ => [ 1 ] } 2..7),
    );

    say "Input hash:", pp \%hash;
    say "";
    say "Try first then each twice in a row to check that the iteration is restarted";
    say pp [first %hash], [each %hash] for 1..2;
    say "";

    # Delete a key and all the keys contained in the value array
    sub rec_delete(\%$);
    sub rec_delete(\%$) {
        my ($hash, $key) = @_;
        my $array = delete $hash->{$key};
        return unless $array;
        say " Removing $key -> [ @$array ]";
        rec_delete %$hash => $_ for @$array;
    }

    while (my ($k,$v) = first %hash) {
        say "Starting from $k -> [ @$v ]";
        rec_delete %hash => $k;
    }
    Input hash:{ 1 => [2 .. 7], 2 => [1], 3 => [1], 4 => [1], 5 => [1], 6 => [1], 7 => [1], 8 => [9], 9 => [8] }

    Try first then each twice in a row to check that the iteration is restarted
    ([8, [9]], [5, [1]])
    ([8, [9]], [5, [1]])

    Starting from 8 -> [ 9 ]
     Removing 8 -> [ 9 ]
     Removing 9 -> [ 8 ]
    Starting from 5 -> [ 1 ]
     Removing 5 -> [ 1 ]
     Removing 1 -> [ 2 3 4 5 6 7 ]
     Removing 2 -> [ 1 ]
     Removing 3 -> [ 1 ]
     Removing 4 -> [ 1 ]
     Removing 6 -> [ 1 ]
     Removing 7 -> [ 1 ]
This works just fine. But I'm curious about other ways to do the same thing (because this seems like it could be a fairly common use of hashes?). So the exercise is:
    While the hash isn't empty
        Take one key (at random) from the hash
        Delete that key and all connected keys
Without expanding the whole list of keys (let's pretend there's an awful lot of them, or the hash is tied, and it's harder to find an existing key than to fetch a value). So

    @keys = keys %hash;
    for my $key (@keys) { recursive_delete(\%hash, $key); }

isn't valid.

NB: the output of my example is actually random, so you might get a different sequence of deletions when you try it.

Re: Iterating over an hash while removing keys
by choroba (Cardinal) on Feb 06, 2020 at 16:55 UTC
    You don't have to call scalar. Calling keys in void context resets the iterator without even building the list of keys at all.
        - scalar(keys %$hash); # Force reset
        + keys %$hash;         # Force reset

    Update: Documented in keys:

    > In particular, calling "keys" in void context resets the iterator with no other overhead.
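
    As an illustration only (the tiny hash %h below is made up), both of these idioms restart the iterator; the void-context call just skips counting the keys:

        my %h = (a => 1, b => 2, c => 3);
        each %h;                 # advance the iterator by one pair
        scalar(keys %h);         # resets the iterator, but also counts the keys
        keys %h;                 # void context: resets the iterator, nothing else
        my ($k, $v) = each %h;   # iteration starts over from the first pair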


      That was actually my first version, and I forgot to ask about the behaviour of keys in void context (it was really likely that it didn't expand the list of keys needlessly, but better safe than sorry :) ).

      Thanks!

Iterating over hash while deleting elements
by LanX (Saint) on Feb 06, 2020 at 16:32 UTC
    Eily asked in the CB; here is the answer: *

    Basically:

    • each has no problem if you delete the current element
    • if you delete other elements, you need to reset the iterator, e.g. with keys
    • then the iterator starts anew with the remaining elements
    • NB: if you don't eventually empty the whole hash, you risk an endless loop.

    The following demo resets the iterator on every pass, even though it doesn't always need to.

    DB<41> @h{a..c,A..C} = (1..3,11..13)

    DB<42> x \%h
    0  HASH(0x33bf940)
       'A' => 11
       'B' => 12
       'C' => 13
       'a' => 1
       'b' => 2
       'c' => 3

    DB<43> while (my ($k,$v) = each %h ) { delete $h{$k}; print "$k,$v\n"; delete $h{uc($k)} } continue {keys %h }
    c,3
    a,1
    B,12
    b,2

    Question to follow ;-P

    Cheers Rolf
    (addicted to the Perl Programming Language :)

    UPDATE

    *) This was a root post which was re-parented to the later question. Partly because it took Eily more than 15 min to ask, but mainly because I liked the idea of defying causality! ;-)

      each has no problem if you delete the current element
      Nice to know, but I wouldn't rely on it. I think "don't use each if the hash changes" is an easier rule to follow and understand. And it's also safer against possible future changes (you might start by deleting only the current key, and then find out you can delete a few more at the same time).
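
      As a made-up sketch of the pattern being warned against (assuming, as in my data, that the values are array refs of connected keys): deleting the current element is fine, but deleting any other element mid-each leaves the iterator in an unspecified state unless it is reset:

          while ( my ($k, $v) = each %hash ) {
              delete $hash{$k};          # deleting the current element: documented as safe
              delete $hash{ $v->[0] };   # deleting any other element is not,
                                         # unless the iterator is reset (e.g. keys %hash)
          }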

      Question to follow ;-P
      Spooky...

        > Nice to know but I wouldn't rely on it

        It's documented.

        > Exception: It is always safe to delete the item most recently returned by each, so the following code works properly:


            while (my ($key, $value) = each %hash) {
                print $key, "\n";
                delete $hash{$key}; # This is safe
            }

        Cheers Rolf
        (addicted to the Perl Programming Language :)

Re: Iterating over an hash while removing keys
by haukex (Archbishop) on Feb 06, 2020 at 18:35 UTC
    let's pretend there's an awful lot of them, or the hash is tied, and it's harder to find an existing key than fetch a value

    Well, a pretty foolproof method is to build an array of keys to delete, and delete them later. But from what you said I'm guessing this list might grow very long, so that might not be feasible?
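
    A minimal sketch of that deferred-deletion idea (should_go is a hypothetical predicate standing in for whatever decides that a key has to go): collect the doomed keys while iterating, and only delete them once the each loop is over:

        my @to_delete;
        while ( my ($k, $v) = each %hash ) {
            next unless should_go($k, $v);   # hypothetical test: should this key go?
            push @to_delete, $k, @$v;        # remember the key and its connected keys
        }
        delete @hash{@to_delete};            # hash-slice delete, only after each() is done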

    LanX has a point that resetting the iterator on each loop might result in an endless loop if one is not careful. But I think that resetting the iterator only when keys are actually deleted might be an option, e.g.:

    use warnings;
    use strict;
    use Data::Dump qw/ dd pp /;

    my %h = (
        1 => [2..7],
        8 => [ 9 ],
        9 => [ 8 ],
        (map { $_ => [ 1 ] } 2..7),
    );
    dd \%h;

    while ( my ($k,$v) = each %h ) {
        print "Loop from ",pp($k)," -> ",pp($v),"\n";
        rec_delete( \%h, $k );
    }
    dd \%h;

    sub rec_delete {
        my ($hash, $key) = @_;
        return unless exists $hash->{$key};
        my $value = delete $hash->{$key};
        keys %$hash;   # reset the iterator only after an actual delete
        return unless $value;
        print "Recursing ",pp($key)," -> ",pp($value),"\n";
        rec_delete( $hash, $_ ) for @$value;
    }

      Well, a pretty foolproof method is to build an array of keys to delete, and delete them later.
      An array of the initial keys would work. But with the cyclic nature of my data, trying to recurse and push new keys without deleting the ones that have already been processed would be an infinite loop. I could keep a hash of the keys that I want to delete, I suppose...
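
      Something like this rough sketch (the %doom name is only illustrative): the hash doubles as a "seen" set, so the cycles can't loop forever, and the real deletes happen in one sweep once each is done:

          my %doom;                        # keys scheduled for deletion
          while ( my ($k, $v) = each %hash ) {
              my @work = ($k);
              while (@work) {
                  my $key = shift @work;
                  next if $doom{$key}++;   # already marked: this is what breaks the cycle
                  push @work, @{ $hash{$key} || [] };
              }
          }
          delete @hash{ keys %doom };      # nothing is deleted while each() is running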

      For the same reason, the fact that first() resets the iterator wouldn't even get the chance to be an issue, because if I didn't delete a key as soon as I got it, the loop wouldn't even finish its first iteration. But LanX's answer and input from choroba did lead me to write a saner version of the function: it removes the key it returns, so the next call won't return it again.
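
      Roughly along these lines (a sketch of the idea rather than the exact code; the name first_and_remove is only illustrative):

          sub first_and_remove (\%) {
              my $hash = shift;
              keys %$hash;                 # void context: reset the iterator
              my ($k, $v) = each %$hash;
              return unless defined $k;
              delete $hash->{$k};          # the caller can never get this key back
              return ($k, $v);
          }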

      That is certainly what I would do in this case. And, if the size of the list becomes a problem, just have it break out of the first loop, get rid of the keys it's found so far, and start over.
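
      For instance, a hedged sketch of that batched variant (MAX_BATCH and the one-level marking are only illustrative): stop collecting once the to-delete set reaches some limit, flush it, and let the outer loop start over on whatever is left:

          use constant MAX_BATCH => 10_000;          # made-up threshold
          while (%hash) {
              my %doom;
              while ( my ($k, $v) = each %hash ) {
                  $doom{$_}++ for $k, @$v;           # mark this key and its direct neighbours
                  last if keys %doom >= MAX_BATCH;   # batch is full: stop collecting
              }
              delete @hash{ keys %doom };            # flush this batch
              keys %hash;                            # reset the iterator before the next pass
          }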