PerlMonks  

Re: Too Many IDs

by kcott (Archbishop) on Jan 09, 2020 at 08:21 UTC [id://11111231]


in reply to Too Many IDs

G'day The_Dj,

Although not stated, I'm assuming all id, sn, etc. values are unique. If that's not the case, neither your current solution nor my alternative suggestion will work properly.
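Because everything below relies on that uniqueness, it may be worth checking it up front. This is a minimal sketch, assuming the %dat_by_id layout used in the example below; the duplicate_values() helper and the sample data (with a deliberately repeated sn) are hypothetical:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical sample data in which sn 'a' appears twice.
my %dat_by_id = (
    1 => { id => 1, sn => 'a' },
    2 => { id => 2, sn => 'b' },
    3 => { id => 3, sn => 'a' },
);

# Hypothetical helper: return (sorted) the values of $field that
# occur in more than one record.
sub duplicate_values {
    my ($data, $field) = @_;
    my %seen;
    $seen{ $_->{$field} }++ for values %$data;
    return sort { $a cmp $b } grep { $seen{$_} > 1 } keys %seen;
}

my @dups = duplicate_values(\%dat_by_id, 'sn');
print "Duplicate sn values: @dups\n" if @dups;
```

Hash keys are unique by definition, so only the fields you intend to map from (sn, more, etc.) need this check.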

Instead of recreating the entire hash multiple times, consider having a single hash with all the data, plus simple mappings of sn to id (extended with further mappings as future requirements arise).

Here's a quick example:

#!/usr/bin/env perl

use strict;
use warnings;

my %dat_by_id = (
    1 => {id=> 1, sn => 'a', more => 'foo'},
    2 => {id=> 2, sn => 'b', more => 'bar'},
);

my %map_sn_to_id = map +($_->{sn} => $_->{id}), values %dat_by_id;

print "SN for ID[1]: $dat_by_id{1}{sn}\n";
print "ID for SN[b]: $dat_by_id{$map_sn_to_id{b}}{id}\n";
print "MORE for SN[a]: $dat_by_id{$map_sn_to_id{a}}{more}\n";

# Subsequent requirements, e.g.
my %map_more_to_id = map +($_->{more} => $_->{id}), values %dat_by_id;

print "ID for MORE[foo]: $dat_by_id{$map_more_to_id{foo}}{id}\n";
print "SN for MORE[bar]: $dat_by_id{$map_more_to_id{bar}}{sn}\n";

Output:

SN for ID[1]: a
ID for SN[b]: 2
MORE for SN[a]: foo
ID for MORE[foo]: 1
SN for MORE[bar]: b

Having a single data source will reduce the chances of errors and should make maintenance and debugging (if necessary) easier.

I see you've used "map BLOCK LIST", and I'm aware that's considered a Best Practice; however, "map EXPR, LIST" is generally faster and may make a noticeable difference when you're dealing with millions of data elements. Use the core Benchmark module to test this against your own data. See the map entry in perlfunc for more on these two forms, as well as an explanation of the unary plus in "map +(..." (if you're unfamiliar with that syntax).
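A minimal Benchmark sketch comparing the two forms, assuming 10,000 hypothetical records shaped like %dat_by_id above; build_block() and build_expr() are illustrative names, and the relative timings will depend on your perl version and your data:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 10_000 records keyed by id, each with a unique sn.
my %dat_by_id;
$dat_by_id{$_} = { id => $_, sn => "sn$_" } for 1 .. 10_000;

# "map BLOCK LIST" form (the parens make the block unambiguous).
sub build_block {
    my ($data) = @_;
    my %map = map { ($_->{sn} => $_->{id}) } values %$data;
    return \%map;
}

# "map EXPR, LIST" form, using unary plus so the parens form the expression.
sub build_expr {
    my ($data) = @_;
    my %map = map +($_->{sn} => $_->{id}), values %$data;
    return \%map;
}

# A negative count means "run each sub for at least that many CPU seconds".
cmpthese(-1, {
    map_block => sub { build_block(\%dat_by_id) },
    map_expr  => sub { build_expr(\%dat_by_id) },
});
```

Both subs build the same mapping, so the printed comparison table reflects only the difference between the two map forms.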

I've only shown a barebones technique. For production usage, I'd suggest setting up a series of functions, e.g. get_id_for_sn($sn), instead of repeatedly hard-coding the equivalent $dat_by_id{$map_sn_to_id{$sn}}{id}.
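A minimal sketch of that idea, reusing the data from the example above. get_id_for_sn() is the name suggested here; _record_for_sn() and get_more_for_sn() are hypothetical additions, and croaking on an unknown sn is just one possible policy (returning undef is another):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Carp qw(croak);

my %dat_by_id = (
    1 => { id => 1, sn => 'a', more => 'foo' },
    2 => { id => 2, sn => 'b', more => 'bar' },
);
my %map_sn_to_id = map +($_->{sn} => $_->{id}), values %dat_by_id;

# Hypothetical private helper: the whole record for an sn, or croak if unknown.
sub _record_for_sn {
    my ($sn) = @_;
    croak "Unknown SN '$sn'" unless exists $map_sn_to_id{$sn};
    return $dat_by_id{ $map_sn_to_id{$sn} };
}

sub get_id_for_sn   { return _record_for_sn($_[0])->{id} }
sub get_more_for_sn { return _record_for_sn($_[0])->{more} }    # hypothetical

print 'ID for SN[b]: ', get_id_for_sn('b'), "\n";
print 'MORE for SN[a]: ', get_more_for_sn('a'), "\n";
```

Routing every lookup through one helper keeps the nested hash access, and the handling of unknown keys, in a single place.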

— Ken

Re^2: Too Many IDs
by The_Dj (Beadle) on Jan 09, 2020 at 13:49 UTC

    Hi, Ken.

    Yeah, I should have mentioned that both sn and id are* unique keys.

    And thanks for the benchmark pointers. I'll keep this in mind when I get to optimizing.

    Sadly, I tend to lose most of my runtime to DB calls and some 3rd-party .exe files, but I will give map close scrutiny.

    * Except one isn't really, but is treated as such for other reasons. My job can be 'fun'.

      If parts of your application (DB, EXEs, etc.) are taking seconds or milliseconds to run, spending time optimising map to save a few micro- or nanoseconds is unlikely to be worth the effort.

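      Another consideration is whether this is a short-lived application that's run multiple times or a long-lived application that's run with multiple iterations: a short-lived one pays the cost of building every mapping on each run, even mappings that are never used during that run.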

      There are some JIT (just-in-time) possibilities that may be worth considering. If you end up with a lot of get_X_for_Y() subs, and some are only called infrequently, you can create each mapping only when it's first needed; something like this, using state (which requires v5.10 and must be enabled with use feature 'state' or use v5.10):

      {
          my %dat_by_id = ...;

          sub get_X_for_Y {
              my ($Y) = @_;
              state $map_Y_to_id = { map +($_->{Y} => $_->{id}), values %dat_by_id };
              return $dat_by_id{$map_Y_to_id->{$Y}}{X};
          }

          sub get_X_for_Z { ... }

          sub get_V_for_W { ... }

          ...
      }

      Note that the anonymous block gives %dat_by_id lexical scope, so that it is only visible to the subs, while the subs themselves are visible to, and accessible from, the entire script. This prevents inadvertent changes to %dat_by_id, which could introduce hard-to-track-down bugs.
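A runnable instance of the template above, with concrete names in place of X, Y, etc. This is a sketch under assumptions: the data is the earlier two-record sample, get_sn_for_more() is hypothetical, and the %builds counter exists only to show that each mapping is built once, on first use:

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'state';

# Demonstration only: counts how many times each reverse mapping is built.
my %builds;

{
    # Lexically scoped: visible only to the subs in this block.
    my %dat_by_id = (
        1 => { id => 1, sn => 'a', more => 'foo' },
        2 => { id => 2, sn => 'b', more => 'bar' },
    );

    sub get_id_for_sn {
        my ($sn) = @_;
        # Initialised on the first call only; later calls reuse the same hashref.
        state $map_sn_to_id = do {
            $builds{sn}++;
            +{ map +($_->{sn} => $_->{id}), values %dat_by_id };
        };
        return $dat_by_id{ $map_sn_to_id->{$sn} }{id};
    }

    # Hypothetical second accessor; its mapping is built only if it is ever called.
    sub get_sn_for_more {
        my ($more) = @_;
        state $map_more_to_id = do {
            $builds{more}++;
            +{ map +($_->{more} => $_->{id}), values %dat_by_id };
        };
        return $dat_by_id{ $map_more_to_id->{$more} }{sn};
    }
}

print 'ID for SN[a]: ', get_id_for_sn('a'), "\n";
print 'ID for SN[b]: ', get_id_for_sn('b'), "\n";
print 'SN for MORE[bar]: ', get_sn_for_more('bar'), "\n";
```

The leading + in +{ ... } makes the last statement of each do block an anonymous hash rather than a bare block.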

      Also note that the placement of the above code within your script is important: the assignment to %dat_by_id should occur before any calls to get_?_for_?() are made. An alternative would be to use some combination of BEGIN, INIT, etc. blocks; see perlmod: BEGIN, UNITCHECK, CHECK, INIT and END for more about these. There are, no doubt, other ways to handle this but, as I don't know enough about your code, I can't offer further advice.
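One way the BEGIN option could look, as a sketch using the same two-record sample data: wrapping the block in BEGIN runs the assignment at compile time, so the main program can call get_id_for_sn() from a point earlier in the file than the data block (perlsub shows the same BEGIN-around-a-closure idiom for private persistent variables):

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'state';

# Main program: this call appears in the file *before* the data block.
print 'ID for SN[b]: ', get_id_for_sn('b'), "\n";

# BEGIN runs this block at compile time, so %dat_by_id is already
# populated when the main-program statement above executes.
BEGIN {
    my %dat_by_id = (
        1 => { id => 1, sn => 'a', more => 'foo' },
        2 => { id => 2, sn => 'b', more => 'bar' },
    );

    sub get_id_for_sn {
        my ($sn) = @_;
        state $map_sn_to_id = { map +($_->{sn} => $_->{id}), values %dat_by_id };
        return $dat_by_id{ $map_sn_to_id->{$sn} }{id};
    }
}
```

With a plain block in that position instead of BEGIN, the first call would build the state map from a still-empty %dat_by_id, and because state initialises only once, that empty map would persist for the rest of the run.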

      — Ken

        Thanks for the reply.

        Yes, I know that shaving a few milliseconds off map won't help when I lose 4 seconds per external .exe.
        When my code is 'complete', I'll be letting NYTProf do its thing and then fixing the red bits first ;-)

        And thanks for the code samples; I must admit it's nicer than using a global and then //=.
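For comparison, here is one guess at what the "global and then //=" approach might look like; it is a hypothetical reconstruction, not The_Dj's actual code:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Package ("global") data and cache: visible to, and changeable by, the whole script.
our %dat_by_id = (
    1 => { id => 1, sn => 'a', more => 'foo' },
    2 => { id => 2, sn => 'b', more => 'bar' },
);
our $map_sn_to_id;    # undef until first needed

sub get_id_for_sn {
    my ($sn) = @_;
    # //= (v5.10+) assigns only while $map_sn_to_id is undefined, i.e. on the first call.
    $map_sn_to_id //= { map +($_->{sn} => $_->{id}), values %dat_by_id };
    return $dat_by_id{ $map_sn_to_id->{$sn} }{id};
}

print 'ID for SN[b]: ', get_id_for_sn('b'), "\n";
```

Both versions build the map lazily on the first call; the difference is that here %dat_by_id and $map_sn_to_id can be read or clobbered from anywhere in the script, whereas the block-plus-state version keeps them private to the subs.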
