Re: Too Many IDs

G'day The_Dj,

Although not stated, I'm assuming all id, sn, etc. values are unique. If that's not the case, neither your current solution nor my alternative suggestion will work properly.

Instead of recreating the entire hash multiple times, consider just having a single hash with all the data and then simple mappings of sn to id (extending for future requirements).

Here's a quick example:

#!/usr/bin/env perl

use strict;
use warnings;

my %dat_by_id = (
    1 => {id => 1, sn => 'a', more => 'foo'},
    2 => {id => 2, sn => 'b', more => 'bar'},
);

my %map_sn_to_id = map +($_->{sn} => $_->{id}), values %dat_by_id;

print "SN for ID[1]:     $dat_by_id{1}{sn}\n";
print "ID for SN[b]:     $dat_by_id{$map_sn_to_id{b}}{id}\n";
print "MORE for SN[a]:   $dat_by_id{$map_sn_to_id{a}}{more}\n";

# Subsequent requirements, e.g.

my %map_more_to_id = map +($_->{more} => $_->{id}), values %dat_by_id;

print "ID for MORE[foo]: $dat_by_id{$map_more_to_id{foo}}{id}\n";
print "SN for MORE[bar]: $dat_by_id{$map_more_to_id{bar}}{sn}\n";

Output:

SN for ID[1]:     a
ID for SN[b]:     2
MORE for SN[a]:   foo
ID for MORE[foo]: 1
SN for MORE[bar]: b

Having a single data source will reduce the chances of errors and should make maintenance and debugging (if necessary) easier.

I see you've used "map BLOCK LIST" and I'm aware that's considered a Best Practice; however, "map EXPR, LIST" is faster and may make a difference, especially when you're dealing with millions of data elements. Use Benchmark to test. See the map documentation for more on these two forms, as well as an explanation of the unary plus I used in "map +(..." (in case you're unfamiliar with that syntax).
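As a rough sketch of how the two forms of map could be compared with the core Benchmark module: the 10,000 generated records and the labels map_block and map_expr are assumptions made up for illustration, and the rates reported will vary with Perl version, hardware and data, so treat any result as indicative only.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use Benchmark qw(cmpthese);

# Hypothetical test data: 10_000 records shaped like those in %dat_by_id.
my %dat_by_id = map +($_ => {id => $_, sn => "sn$_", more => "more$_"}), 1 .. 10_000;

# Sanity check: both forms should build identical mappings.
my %by_block = map { $_->{sn} => $_->{id} } values %dat_by_id;
my %by_expr  = map +($_->{sn} => $_->{id}), values %dat_by_id;

my $same = keys(%by_block) == keys(%by_expr)
    && !grep { $by_block{$_} != $by_expr{$_} } keys %by_block;
print "Mappings identical: ", ($same ? 'yes' : 'no'), "\n";

# Run each form for at least 1 CPU second and compare the rates.
cmpthese(-1, {
    map_block => sub { my %m = map { $_->{sn} => $_->{id} } values %dat_by_id; return },
    map_expr  => sub { my %m = map +($_->{sn} => $_->{id}), values %dat_by_id; return },
});
```

cmpthese prints a table of rates and relative percentages; running it against data that resembles your real records is what matters.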

I've only shown a barebones technique. For production usage, I'd suggest setting up a series of functions, e.g. get_id_for_sn($sn), instead of having to continually hard-code an equivalent $dat_by_id{$map_sn_to_id{$sn}}{id}.
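A minimal sketch of what such functions might look like, reusing the sample data from above. Only get_id_for_sn() is named in this post; get_more_for_sn() is a hypothetical companion added for illustration, and returning undef for an unknown sn is an assumption about how you'd want misses handled.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

my %dat_by_id = (
    1 => {id => 1, sn => 'a', more => 'foo'},
    2 => {id => 2, sn => 'b', more => 'bar'},
);

my %map_sn_to_id = map +($_->{sn} => $_->{id}), values %dat_by_id;

# Return the id for a given sn, or undef if the sn is unknown.
sub get_id_for_sn {
    my ($sn) = @_;
    return $map_sn_to_id{$sn};
}

# Hypothetical companion: return the 'more' field for a given sn.
sub get_more_for_sn {
    my ($sn) = @_;
    my $id = get_id_for_sn($sn);
    return defined $id ? $dat_by_id{$id}{more} : undef;
}

print "ID for SN[b]:   ", get_id_for_sn('b'), "\n";
print "MORE for SN[a]: ", get_more_for_sn('a'), "\n";
```

The double lookup then lives in one place, so a change to the data layout only needs a change to these subs and not to every caller.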

— Ken

Re^2: Too Many IDs by The_Dj (Beadle) on Jan 09, 2020 at 13:49 UTC
Hi, Ken. Yeah, I should have mentioned that both `sn` and `id` are* unique keys. And thanks for the benchmark pointers. I'll keep this in mind when I get to optimizing. Sadly I tend to lose most of my runtime to DB calls and some 3rd party .exe's, but I will give `map` a close scrutiny.

* Except one isn't really, but is treated as such for other reasons. My job can be 'fun'.
Re^3: Too Many IDs by kcott (Archbishop) on Jan 10, 2020 at 01:58 UTC
If parts of your application (DB, EXEs, etc.) are taking seconds or milliseconds to run, spending time optimising `map` to save a few micro- or nanoseconds is unlikely to be worth the effort.

Another consideration is whether this is a short-lived application that's run multiple times or a long-lived application that's run with multiple iterations. There are some JIT (just-in-time) possibilities that may be worth consideration. If you end up with a lot of `get_X_for_Y()` subs, and some are only called infrequently, you can create mappings just when they're needed; something like this using state (which requires `v5.10`):

{
    my %dat_by_id = ...;

    sub get_X_for_Y {
        my ($Y) = @_;
        state $map_Y_to_id = { map +($_->{Y} => $_->{id}), values %dat_by_id };
        return $dat_by_id{$map_Y_to_id->{$Y}}{X};
    }

    sub get_X_for_Z { ... }
    sub get_V_for_W { ... }
    ...
}

Note that the anonymous block gives `%dat_by_id` lexical scope such that it is only visible to the `sub`s; while the `sub`s themselves are visible to, and accessible from, the entire script. This prevents inadvertent changes to `%dat_by_id` which could introduce bugs which are hard to track down.

Also note that the placement of the above code within your script would be important. The assignment to `%dat_by_id` should occur before any calls to `get_?_for_?()` are made. An alternative to this would be to use some combination of `BEGIN`, `INIT`, etc. blocks; I don't know enough about your code to comment further on this; see perlmod: BEGIN, UNITCHECK, CHECK, INIT and END for more about these. There would, no doubt, be other ways to handle this but, as previously stated, I don't know enough about your code to offer further advice.

— Ken
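For anyone wanting to try the lazy-mapping idea from Re^3 directly, here is one runnable instance of it, filled in with the sample data from the root post. The sub names get_sn_for_more() and map_build_count(), and the %built counter, are hypothetical; the counter is there only to show that the state variable is initialised on the first call and reused afterwards.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use feature 'state';    # 'state' needs v5.10 or later

{
    my %dat_by_id = (
        1 => {id => 1, sn => 'a', more => 'foo'},
        2 => {id => 2, sn => 'b', more => 'bar'},
    );

    # Hypothetical counter, only here to show the map is built once.
    my %built;

    sub get_sn_for_more {
        my ($more) = @_;
        # Initialised on the first call only; later calls reuse it.
        state $map_more_to_id = do {
            $built{more}++;
            +{ map +($_->{more} => $_->{id}), values %dat_by_id };
        };
        my $id = $map_more_to_id->{$more};
        return defined $id ? $dat_by_id{$id}{sn} : undef;
    }

    sub map_build_count { return $built{more} // 0 }
}

print "Builds before any call: ", map_build_count(), "\n";
print "SN for MORE[bar]: ", get_sn_for_more('bar'), "\n";
print "SN for MORE[foo]: ", get_sn_for_more('foo'), "\n";
print "Builds after two calls: ", map_build_count(), "\n";
```

This should report 0 builds before the first call and 1 after both calls. If get_sn_for_more() is never called, the mapping is never built at all.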
Re^4: Too Many IDs by Anonymous Monk on Jan 13, 2020 at 01:06 UTC
Thanks for the reply. Yes, I know that shaving a few milliseconds off `map` won't help when I lose 4 seconds per external .exe. When my code is 'complete', I'll be letting NYTProf do its thing and then fix the red bits first ;-) And thanks for the code samples; I must admit it's nicer than using a global and then `//=`.

