http://qs321.pair.com?node_id=1063084

AlexTape has asked for the wisdom of the Perl Monks concerning the following question:

Omniscient Monks,

i meditate about minimizing keys and values in an hash tree.

data looks like this
[a, b, c] => [1, 2, 3] [b, c] => [2, 3, 4, 5] [a, c, d] => [4, 5, 6, 7] [d] => [1, 2, 6]
basically i got an object containing two lists (e.g. key list and attribute list). now i need an algorithm which is able to reduce needed memory without loosing relations

the result should look like this:
[a, b, c, d] => [1, 2] [a, b, c] => [3] [a, b, c, d] => [4, 5] [a, c, d] => [6, 7]
the problem is that the keys are really short (e.g. A1, A2,...) but the values (1,2,3,..) are really big data strings..
what would be your approach to do it efficient? i know that the simple way is to call the values by reference.. but thats not what i want!

kindly perlig

$perlig =~ s/pec/cep/g if 'errors expected';

Replies are listed 'Best First'.
Re: Minimize Hash Key Value Combinations
by hdb (Monsignor) on Nov 18, 2013 at 13:34 UTC

    I do not understand how you derived the result from the data, but why are you not enumerating your strings and use the data as displayed above? (Or is this what you mean by calling the values by reference?)

      if you take a closer look to data and result you will see that the relations are the same..
      e.g.
      A->[1, 2, 3, 4, 5, 6, 7]
      is still given. enumerating CAN be the first step of the approach to get an easier comparison..
      [1] => StringA [2] => StringB ... [A] => [1, 2] [B] => [2]
      But as i want to print that to an file, StringB will be there twice. e.g. StringB is equal to a data volume of 3gb. the file will be 6gb. for that i want to print this:
      [A] => [1] [A, B] => [2]
      $perlig =~ s/pec/cep/g if 'errors expected';
        Where do StringA, StringB etc. come from? If they are in files, couldn't you store the file paths rather than their values?
        [1] => '/path/to/StringA/data' [2] => '/path/to/StringB/data'

        Here is my attempt:

        use strict; use warnings; use Data::Dumper; my @data = map { [ [split /,\s*/, $_->[0]], [split /,\s*/, $_->[1]] ] } map { /\[(.*)\]\s*=>\s*\[(.*)\]/ ? [ $1, $2 ] : () } <DATA>; my %inverted; for my $d (@data) { undef @{$inverted{$_}}{@{$d->[0]}} for @{$d->[1]}; } my %invertagain; push @{ $invertagain{join ",", sort keys %{$inverted{$_}}} }, $_ for k +eys %inverted; print Dumper \%invertagain; __DATA__ [a, b, c] => [1, 2, 3] [b, c] => [2, 3, 4, 5] [a, c, d] => [4, 5, 6, 7] [d] => [1, 2, 6]
        "if you take a closer look" to what you said:

        > the result should look like this:¹

        [a, b, c, d] => [1, 2] [b, c] => [2]

        which doesn't make sense, don't you agree?

        You are talking about a "hashtree" but the data shown has nothing to do with HoH.

        The data shown isn't even valid Perl, keys are strings never arrays.

        have a look at How (Not) To Ask A Question

        Asking a good question is often the best answer! :)

        Cheers Rolf

        ( addicted to the Perl Programming Language)

        ¹) OP updated now...

Re: Minimize Hash Key Value Combinations
by RichardK (Parson) on Nov 18, 2013 at 14:03 UTC

    you could store your attributes in there own hash (or even an array), and then the relations can refer to the shorter keys

    my %attrib = (key1 => 'xxxx', key2 => 'yyyy', ...); [a,b,c] => [key1,key2]; [d] => [keyn]; #etc

    the keys can then be as short as you like, a simple count or even a hash of the data.

      thats basically call by reference :) but thank you..
      $perlig =~ s/pec/cep/g if 'errors expected';
Re: Minimize Hash Key Value Combinations
by Random_Walk (Prior) on Nov 18, 2013 at 14:34 UTC

    If the Keys are very small, yet the values are large strings, why not put the string values in an array, then store the array indices under your keys?

    my @big_strings = ( 'War and peace', 'The Lord of the rings', 'PI, to plenty of decimal places', ); my %keys = ( # give index into big_strings [a, b, c] => [1, 2], [b, d] => [0, 2], );

    your use of something looking like lists, for 'keys' troubles me somewhat. What are you actually trying to achieve here? Is this perhaps an X-Y problem? When you walk up to your hash with a 'key' will that key be 'a' or a list a,c,g ?

    Cheers,
    R.

    Pereant, qui ante nos nostra dixerunt!
      thank you! that was my firt approach.. just use the strings as keys e.g.:
      1 => A, B, C 2 => A, B, D 3 => C
      but it leads to the same problem.. the result should be this:
      1, 2 => A, B 2 => D 1, 3 => C
      $perlig =~ s/pec/cep/g if 'errors expected';

        I don't get it... what happened to A2 and B2?

Re: Minimize Hash Key Value Combinations
by LanX (Saint) on Nov 18, 2013 at 19:14 UTC
    I think you are looking for Formal Concept Analysis

    the relations you are trying to identify are called concepts.

    Cheers Rolf

    ( addicted to the Perl Programming Language)

      Well, you are amazing! thank you! :) thats it.. the ganter algorithm :)
      $perlig =~ s/pec/cep/g if 'errors expected';
Re: Minimize Hash Key Value Combinations
by sundialsvc4 (Abbot) on Nov 18, 2013 at 14:27 UTC

    I’m not sure what your intended algorithm is.   That is to say, it is not readily apparent what set of bright-line rules lead from the input to the output shown.

    Nevertheless ... surely there must be some misconception about “references,” that you would say (at the same time) “but that’s not what I want” and “I want to do it efficient.”   References are exactly the solution to this sort of problem, where you want (large ...) values to be in several places at one time.   (Note:   this presupposes that you do not need to modify the values independently.)

    We need more detailed information about exactly what you are trying to do here, with more detail than what you have given us here, because there are a great many competing factors to consider in the choice of an appropriate data-structure ... a choice that will be a deal-breaker for this project.

      well, thank you for your answer!
      i fully agree with what you writing.. but the problem is not the internal storage ir the call by value|name|referenace issue.. it´s about detecting the relations and reduce the branches..
      [A] => [1, 2] [A] => [1]
      is the same like
      [A] => [1, 2]
      i have to mention that A is already related with 1 for that the second A => 1 relation is redundant but it has to be nonredundant :)

      $perlig =~ s/pec/cep/g if 'errors expected';