http://qs321.pair.com?node_id=580422

Maze has asked for the wisdom of the Perl Monks concerning the following question:

a reccuring problem for me seems to be a way to acquire a reference to an arbritrary point in a data structure comprised of nested hashes.

suppose for example you have:

my %hash = ( 'company' => { 'microsoft' => {'applications' => {'windows' => 'operating system', 'office' => 'o +ffice suite'} ,'people' => {'bill gates' => 'chair man'} }, 'sun microsystems' => {'applications' => {'solaris' => 'operating system', 'star office' => 'office suite'} }, 'cross over' => {} } );
or something similar, now with this hash you could create a subroutine for each level of the data structure, say "sub add_company" or "sub add_application" but ideally what id like to be able to do is to seek to any level of the data structure and just throw in a value

to do that I suppose what you'd need to do is create a sub which takes a set of names to "crawl" along in the hash data structure, this would then find a reference to the part of the data structure you want, and then perform a set of operations to the 'location' referred to, independent of where that is.

with this you could then say something like:

add_to_hash(\%hash,'scalar','companies-microsoft-people-steve ballmer' +,'bills bulldog') #and you suddenly decide to have another category add_to_hash(\%hash,'list','companies-cross over-supports','iTunes','ph +oto shop') #or something more elegant
perhaps the above example could be perfomred using hash::merge, but still having an ability to find an arbitrary point in a data structure that can be then passed to a function would be immensely helpful, and if it seems like i'm trying to use hash structures like file hierachies, well yes, that's exactly right

the question i ask is, how to do it, I've tried using recursive hash references like

 \$hashref = $%hashref{foobar} 
and perl seems to really not like that, same with trying to construct the name of the hash from the root of the structure via non strict references

what am I missing?, any syntatic sugar, or a modules that I really should get to grips with

Replies are listed 'Best First'.
Re: seeking in hash data structure
by graff (Chancellor) on Oct 25, 2006 at 03:24 UTC
    If your hash structure is always HoHoHo... -- i.e. no arrays, just hashes -- then the first trick you need to nail down is a way to distinguish between terminal nodes and non-terminal nodes in your tree.

    Based on your examples, I'd guess that the easiest way to do this is by using your actual strings only as hash keys, never as values assigned to keys. The value of a hash element would always be a hash ref; non-terminal nodes would contain hash refs with one or more keys, whereas a terminal node would contain an empty hashref. The point here is that hash value is either a simple scalar or else it's a reference -- it can't be both.

    Here's a way you could take hyphen-delimited strings to create an abitrarily deep / arbitrarily wide tree structure:

    use strict; use warnings; use Data::Dumper; sub load_hash { my ( $href, $path ) = @_; my @path = split /-/, $path; my $key = shift @path; while ( @path and exists( $$href{$key} )) { $href = $$href{$key}; $key = shift @path; } if ( ! exists( $$href{$key} )) { # there's a new key in this path $$href{$key} = {}; # so create a new hashref for it load_hash( $$href{$key}, join( "-", @path )) if ( @path ); # recurse if this key/node is non-terminal } # otherwise, do nothing -- the path already exists in the structure } sub trace_paths # look for paths to a chosen node (hash key) { my ( $href, $aref, $node, $path ) = @_; $path ||= ''; my @keys = keys %$href; if ( grep /^$node$/, @keys ) { push @$aref, ( $path ) ? "$path-$node" : $node; } for my $key ( @keys ) { my $nxt_path = ( $path ) ? "$path-$key" : $key; trace_paths( $$href{$key}, $aref, $node, $nxt_path ) if ( scalar keys %{$$href{$key}} ); } } my @strings = qw/one-foo-bar-baz one-foo-zuz two-fump-zill two-fizz-foo/; my %hash; load_hash( \%hash, $_ ) for ( @strings ); print Dumper( \%hash ); my @found; trace_paths( \%hash, \@found, 'foo' ); print Dumper( \@found );
    (updated to fix an annoying initial dash in @found values)

    Now, if you really want the structure to hold scalar values as well as hash keys, you'll need to reserve one or more "special keys" that you can use for storing strings or numbers instead of lower-level hash refs. These special keys need to be strings that will never occur as data (nodes in the structure), and the load_hash and/or trace_paths subs would need an extra line or two of code to use them as needed.

Re: seeking in hash data structure
by ikegami (Patriarch) on Oct 24, 2006 at 22:55 UTC
Re: seeking in hash data structure
by driver8 (Scribe) on Oct 25, 2006 at 01:33 UTC
    It's not clear from your post what exactly you are trying to do, but I think I have some idea. Is there any reason why something like the following isn't good enough?
    $hash{'companies'}{'microsoft'}{'people'}{'steve ballmer'} = 'bills bu +lldog'; $hash{'companies'}{'cross over'}{'supports'} = ['iTunes', 'photo shop' +];
    That is, as opposed to:
    add_to_hash(\%hash,'scalar','companies-microsoft-people-steve ballmer' +,'bills bulldog'); add_to_hash(\%hash,'list','companies-cross over-supports','iTunes','ph +oto shop');
    I'm also not sure what you wanted to accomplish with \$hashref = $%hashref{foobar}, but I think it's wrong in too many ways to even list.

    -driver8

      the reason just plonking in a new value using perls syntax instead of rolling our own funtion is the same reason that you would create any function, so that it becomes adaptable. My original idea was to create something that would record a file hierachy in a hash, probably a bad idea, but in any case it's not something you can do using value declarations, not even symreferencesm, because as someone has said they're anonymous

      \$hashref = $%hashref{foobar}
      is very wrong yes, because I was getting confused with the making and dereferencing of references at the same time, I know this now after having every perl program that included the above method crashing spectacularly

Re: seeking in hash data structure
by jbert (Priest) on Oct 25, 2006 at 08:50 UTC
    You want a query language. Two come to mind:
    1. XPATH. It's used on tree structured data, pretty much as you have here. You'll need to wrap your HoHoHoH as an XML doc to use it.
    2. SQL. Used on table data, but that might suit you pretty well, you just need to model your data properly (i.e. define a 'Company' as a something which has a 'name' and possibly many 'Application's, possibly many 'People').

      Check out the DBD::RAM module if you don't want to mess about with any persistence issues, DBD::SQLite if you don't mind/actually want simple disk storage or MySQL/Postgres if you're that kind of person.

    Either way, you'll be defining your underlying types ('Company', 'Application') a little more, which I suspect is the underlying reason you keep running into this issue.

    You're make perhaps too much use of the fact that perl doesn't force types down your throat and are hence thinking of everything as a hash rather than an abstract type.

    If so, maybe just defining a few struct-type classes will give you places to put your query methods without the need to go to a generalised query language like the above.

      There is Class::XPath, which allows you to graft XPath queries onto almost any class structure. Also, there is Data::Diver, which has its own query "language" to access elements without autovivification. Using a database incurs a lot of front-up work which may or may not pay off in the long run, as for every addition to the data format, the SQL schema needs to be modified. LDAP seems more suited for storing unstructured or semistructured hierarchical data.

        Ah, I didn't know about Class::XPath. Looks very nice, and almost exactly what the doctor ordered.

        As far as I can see, Data::Diver doesn't allow the more general kind of querying which the OP requested.

        And whilst LDAP has a special place in my heart, as far as I know, it has the same schema issues which SQL does. LDAP object classes need to be defined in terms of mandatory and optional attributes, the attributes need to have an appropriate syntax. Basically LDAP object class == SQL table, LDAP attribute == SQL column. The main differences IMHO are that LDAP gives you a heirarchically-defined name for each entry and SQL gives you a richer query language.

        I would agree that DBs aren't the answer to everything, but they are an answer to "how do I do fairly arbitrary queries to my structuted data".

Re: seeking in hash data structure
by ailivac (Sexton) on Oct 25, 2006 at 05:10 UTC
    There's a fairly simple algorithm to do this that you can write if you understand how recursion and references work. Your function starts with a reference to the root of the hash, the value to add once you find the right node, and a list of keys to find that node. Shift the first keys from the list and look up that entry in the hash, which gives you a hashref. Call the function again recursively with this new hashref, the same value to add, and the remaining keys. Once there are no more keys, you can add the value to the hashref you have, which will point to the right position in the tree. Use a syntax like $hashref = $hashref->{key} to go down to the next level in the hash. By "non strict references" I assume you meant symbolic references, the ones you can't use with use strict. And no, those won't work, because the hashes are anonymous.
      That is a skillfull method my esteemed monk, but for laziness sake the perl module Data::Diver and the -numerous- SQL interfaces that i've now looked up seem to be the ticket, I think that for practices sake I may write up that algorithm my self, if I get round to it.
Re: seeking in hash data structure
by gloryhack (Deacon) on Oct 25, 2006 at 03:27 UTC
    It sounds like you might do well to look into LDAP.
Re: seeking in hash data structure
by bsb (Priest) on Jun 13, 2007 at 04:43 UTC
    use List::Util qw(reduce); my $item = reduce { $a->{$b} } $hash, @path_segments;