Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

The results are in!

Thank you all very much for your suggestions.

I tried my hand at writing an iterative version, but it didn't perform as well as I'd hoped. Can anyone suggest a better implementation?

Not surprisingly, the version which just deletes keys from the source hash is by far the fastest.

More surprising to me was that the recursive version is quite a lot faster than both the map/grep and slice based solutions.

I didn't spend much time trying to get mr_ron's version working. I may take a closer look at that one later this weekend.

Let me know your thoughts!

#! /usr/bin/env perl use strict; use warnings; use Carp; use Test::More; use Benchmark qw(timethese cmpthese); use Deep::Hash::Utils qw(reach nest deepvalue); my $t = { source => { f1 => 'garbage', f2 => 'more garbage', f3 => 'important data', f4 => { this => 'sub hash', is => 'garbage' }, f5 => { f6 => 'more important data', f7 => { more => 'garbage', f8 => 'important data', }, f9 => 'garbage', }, f10 => [ 'important', 'data' ], f11 => [ 'more', 'garbage' ] }, filter => { f3 => 1, f5 => { f6 => 1, f7 => { f8 => 1 } }, f10 => 1 }, output => { f3 => 'important data', f5 => { f6 => 'more important data', f7 => { f8 => 'important data', } }, f10 => [ 'important', 'data' ], }, }; sub hash_filter_recursive { my $source = shift; my $filter = shift; my %output; foreach ( keys %$filter ) { if ( exists $source->{$_} ) { if ( ref $filter->{$_} eq 'HASH' ) { croak "bad filter: on '$_', expected HASH\n" unless ( ref $source->{$_} eq 'HASH' ); $output{$_} = hash_filter_recursive( $source->{$_}, $filter->{ +$_} ); } else { $output{$_} = $source->{$_}; } } } return \%output; } sub hash_filter_iterative { # I expected this one to be faster... # Can anyone suggest a better implementation? my $source = shift; my $filter = shift; my $output = {}; my @queue = ( [ $source, $filter, $output ] ); while ( my $a = shift @queue ) { my ( $s, $f, $o ) = @{$a}; foreach ( keys %$f ) { if ( exists $s->{$_} ) { if ( ref $f->{$_} eq 'HASH' ) { croak "bad filter: on '$_', expected HASH\n" unless ( ref $s->{$_} eq 'HASH' ); $o->{$_} = {}; push @queue, [ $s->{$_}, $f->{$_}, $o->{$_} ]; } else { $o->{$_} = $s->{$_}; } } } } return $output; } sub hash_filter_delete { my $source = shift; my $filter = shift; foreach ( keys %$source ) { if ( exists $filter->{$_} ) { if ( ref $filter->{$_} eq 'HASH' ) { croak "bad filter: on '$_', expected HASH\n" unless ( ref $source->{$_} eq 'HASH' ); hash_filter_delete( $source->{$_}, $filter->{$_} ); } } else { delete $source->{$_}; } } return $source; } sub hash_slice { # contributed by Veltro my $s = $_[0]; my $f = $_[1]; my $n = {}; my @keysToGet = keys %{$f}; @{$n}{@keysToGet} = @{$s}{@keysToGet}; foreach ( keys %{$n} ) { if ( ref $n->{$_} eq 'HASH' ) { $n->{$_} = hash_slice( $s->{$_}, $f->{$_} ); } } return $n; } sub map_grep { # contributed by shmem my $source = shift; my $filter = shift; return { map { ref $filter->{$_} eq 'HASH' ? ref $source->{$_} eq 'HASH' ? ( $_, map_grep( $source->{$_}, $filter->{$_} ) ) : croak "bad filter: on '$_', expected HASH\n" : ( $_, $source->{$_} ) } grep { exists $source->{$_} } keys %$filter }; } sub map_grep_2 { # contributed by shmem return { map { ref $_[1]->{$_} eq 'HASH' ? ref $_[0]->{$_} eq 'HASH' ? ( $_, map_grep_2( $_[0]->{$_}, $_[1]->{$_} ) ) : croak "bad filter: on '$_', expected HASH\n" : ( $_, $_[0]->{$_} ) } grep { exists $_[0]->{$_} } keys %{ $_[1] } }; } sub hash_deep_utils { # contributed by mr_ron my ( $source, $filter ) = @_; my %rc; while ( my @l = reach($filter) ) { pop @l; if ( defined( my $source_val = deepvalue( $source, @l ) ) ) { # hint: work around nest behavior on even vs odd key count nest( \%rc, @l )->{ $l[$#l] } = $source_val; } } \%rc; } is_deeply( $_->( $t->{source}, $t->{filter} ), $t->{output} ) foreach ( \&hash_filter_recursive, \&hash_filter_iterative, \&hash_filter_delete, \&hash_slice, \&map_grep, \&map_grep_2, \&hash_deep_utils ); done_testing(); cmpthese( 1000000, { recursive => sub { hash_filter_recursive( $t->{source}, $t->{filt +er} ) }, iterative => sub { hash_filter_iterative( $t->{source}, $t->{filt +er} ) }, delete => sub { hash_filter_delete( $t->{source}, $t->{filt +er} ) }, map_grep => sub { map_grep( $t->{source}, $t->{filt +er} ) }, map_grep_2 => sub { map_grep_2( $t->{source}, $t->{filt +er} ) }, slice => sub { hash_slice( $t->{source}, $t->{filt +er} ) }, } );

ok 1 ok 2 ok 3 ok 4 ok 5 ok 6 not ok 7 # Failed test at ./benchmark.pl line 209. # Structures begin differing at: # $got->{f5}{f7} = 'f8' # $expected->{f5}{f7} = HASH(0x7fd932364b88) 1..7 Rate iterative slice map_grep map_grep_2 recursive + delete iterative 191571/s -- -3% -20% -23% -30% + -57% slice 198413/s 4% -- -17% -20% -27% + -56% map_grep 240385/s 25% 21% -- -3% -12% + -46% map_grep_2 248139/s 30% 25% 3% -- -9% + -45% recursive 271739/s 42% 37% 13% 10% -- + -39% delete 448430/s 134% 126% 87% 81% 65% + -- # Looks like you failed 1 test of 7.

Best,

Jim

πάντων χρημάτων μέτρον έστιν άνθρωπος.


In reply to Re: A more elegant way to filter a nested hash? Benchmarks! by jimpudar
in thread A more elegant way to filter a nested hash? by jimpudar

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (7)
As of 2024-03-28 10:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found