Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

order of hash

by vili (Monk)
on Nov 04, 2003 at 00:09 UTC ( [id://304286]=perlquestion: print w/replies, xml ) Need Help??

vili has asked for the wisdom of the Perl Monks concerning the following question:

Greetings,
While we know, that order in hash variables is obscure, unless sorted.
I've noticed that the same hash will always be ordered the same way, each and every time, that it's used
This leads me to thinking, that perhaps, hash order is not at all random to Perl, as it is to us.


vili
     wonders how Perl allocates memory...

Thanks for everyones reply.
Let me clear something up: I am certainly not relying on the way that Perl oreders the hash.
It struck me as curious, and I wanted to find out more about what exactly the happens
when a hash is initialized. Thank you Roger. On unpredictable vs. random, I'm sure there could be a debate.
But let me ask this: Does Perl know how a particular hash will be ordered ahead of time, given it's characteristics?
I suspect yes. Correct me if I'm wrong.

vili
     has admitted to having an 802.11 sniffing addiction

Replies are listed 'Best First'.
Re: order of hash
by Abigail-II (Bishop) on Nov 04, 2003 at 00:40 UTC
    I've noticed that the same hash will always be ordered the same way, each and every time, that it's used
    Not anymore. From the perldelta that comes with 5.8.1:
    Hash Randomisation Mainly due to security reasons, the "random ordering" of hashes has been made even more random. Previously while the order of hash elements from keys(), values(), and each() was essentially random, it was still repeatable. Now, however, the order varies between different runs of Perl. Perl has never guaranteed any ordering of the hash keys, and the ordering has already changed several times during the lifetime of Perl 5. Also, the ordering of hash keys has always been, and continues to be, affected by the insertion order. The added randomness may affect applications. One possible scenario is when output of an application has included hash data. For example, if you have used the Data::Dumper module to dump data into different files, and then compared the files to see whether the data has changed, now you will have false positives since the order in which hashes are dumped will vary. In general the cure is to sort the keys (or the values); in particular for Data::Dumper to use the "Sortkeys" option. If some par­ ticular order is really important, use tied hashes: for example the Tie::IxHash module which by default preserves the order in which the hash elements were added. More subtle problem is reliance on the order of "global destruction". That is what happens at the end of execu­ tion: Perl destroys all data structures, including user data. If your destructors (the DESTROY subroutines) have assumed any particular ordering to the global destruction, there might be problems ahead. For example, in a destruc­ tor of one object you cannot assume that objects of any other class are still available, unless you hold a refer­ ence to them. If the environment variable PERL_DESTRUCT_LEVEL is set to a non-zero value, or if Perl is exiting a spawned thread, it will also destruct the ordinary references and the symbol tables that are no longer in use. You can't call a class method or an ordi­ nary function on a class that has been collected that way. The hash randomisation is certain to reveal hidden assump­ tions about some particular ordering of hash elements, and outright bugs: it revealed a few bugs in the Perl core and core modules. To disable the hash randomisation in runtime, set the environment variable PERL_HASH_SEED to 0 (zero) before running Perl (for more information see "PERL_HASH_SEED" in perlrun), or to disable the feature completely in compile time, compile with "-DNO_HASH_SEED" (see INSTALL). See "Algorithmic Complexity Attacks" in perlsec for the original rationale behind this change.

    Abigail

•Re: order of hash
by merlyn (Sage) on Nov 04, 2003 at 00:12 UTC
Re: order of hash
by tilly (Archbishop) on Nov 04, 2003 at 01:44 UTC
    I see that nobody has explained for you yet what the reason is behind the semi-random order. See Re (tilly) 4: Flip Flop III - Musical Buckets for a brief and hopefully understandable explanation.

    After you have read that, a key detail that I glossed over there. A good hashing algorithm takes a lot of normal strings and should scatter them well among buckets, hence it tries to make things look pretty random. (While really being deterministic.)

    What everyone else is referring to is that the hashing function that Perl uses is, as of 5.8.1, going to be different every time you start Perl. Previously it only varied from one version of Perl to the next.

Re: order of hash
by Roger (Parson) on Nov 04, 2003 at 05:24 UTC
    Perhaps before answering this question, you should first understand how does a hashing algorithm work.

    The simplest form of a hash has a hashing function and an array of linked-list of predetermined size. The hashing function is not random at all. It takes in the parameters such as storage array size, search key, etc, and generates an index pointing to a fixed location in the array. The output of the hashing function is repeated for the same input parameters. In a single level hash, the input data is then inserted into the linked-list at the given location in the storage array.

    Why is the hash-function not random? The reason is simple: if the computed storage array index returned by the hash-function is random for a given search key, you will never get back the data you stored in the first place.

    Given the nature of the hash key, the hash function is carefully selected so that it minimises collisions in hash keys, ie., for different input parameters, minimise the number of times the same storage array index is selected. Thus giving a more balanced hash.

    The benefit of using a hash is to maximize the speed of storage and retrieval of data based on a search key.

    There are lots of research papers on hash and hashing functions. Google on hashing algorithm, and you will get hundreds of hits.

    Knowing that a hash has a non-random hashing function, I think it is clear that Perl hash is not random, and the hash built for the same input data is always in the same order.

Re: order of hash
by jdtoronto (Prior) on Nov 04, 2003 at 01:53 UTC
    I can confirm the improved randomness of the hashes in 5.8.1, I have an application I am debugging where I use Data::Dumper to show some hashes so I can check them. In 5.6.1 the same hash always had the same order. (I load it from a config file). But now in 5.8.1 I rarely see the same ordering. So as things improve so it becomes even more foolish to rely on any sort of ordering.

    Which says to me, if you want an order, establish it for yourself.

    As for how Perl allocates memory, VERY WELL! And it doesn't have too many leaks either.

    jdtoronto

Re: order of hash
by batkins (Chaplain) on Nov 04, 2003 at 00:24 UTC
    Ummmm......

    What merlyn said is right. Perl doesn't order hash keys in order of creation, but that doesn't mean they're ordered randomly....

    Are you sure it was a book? Are you sure it wasn't.....nothing?
Re: order of hash
by Elgon (Curate) on Nov 04, 2003 at 10:25 UTC

    vili,

    See the above replies for more understanding of the technical issues than I can give, however I would point out that the practical upshot of all of this is the guarantee provided: Writing code which relies on the assumption that this order stays constant may work in your particular implementation of Perl, however this is a bad idea on the basis that it may break with a future revision of the language: There is no guarantee that this behaviour will not change.

    Elgon

    Please, if this node offends you, re-read it. Think for a bit. I am almost certainly not trying to offend you. Remember - Please never take anything I do or say seriously.

Re: order of hash -- entropy requirements?
by mattr (Curate) on Nov 04, 2003 at 12:22 UTC
    If security was a reason for increasing the randomness of hashes then the question begs to be asked, what are the entropy requirements of hash operations?

    I have found a lack of available entropy in the system to be a major bottleneck in output of running programs at times. Obviously if randomness is being supplied by /dev/random then might not a race condition be possible in the case that you are storing random numbers in a hash while the insertion or bucket redistribution operations themselves require random numbers to operate? Is /dev/random actually used in hash mechanics now?

    I am not trying to be facetious. Some operating systems really will have a problem with operations which require a lot of random numbers. I saw a visible problem with console display of random numbers. Try this (just tried on RedHat 9 in an xterm):

    cat /dev/random | hexdump

    After a little bit (1000 lines maybe) the output will stop and only restart when you move the mouse or hit a key. Kill it with ctrl-c and subsequent runs will only show one or two lines before waiting for input again. After a little bit it won't even output a single line until you let it sit by itself for a minute (maybe due to kernel's tcp randomization code?? anyway.) If /dev/random is not being used then the bucket order is just being hashed again, which is more obfuscation than real security I'd think. I tried to mix hash operations with reading from the above line (followed by a pipe) however it looks like it will continue to hang until I let it rest for a minute or so.

    perl -e 'open (IN,"cat /dev/random|hexdump|"); while (<IN>) {$h{$_}=$_; print $c++ . " ";} close (<IN>);'

    Considering I was still able to do hash operations after horribly exhausting my system's entropy I would say the security gained is fragile as it is based on a hash function and not an entropy source, therefore the order is not random at all, but calculable based on the hash to be reordered.

    So is the "increased randomness" of hash order in newer perls mainly just meant as a reminder not to depend on the order of the hash, or is it something real which could be made to draw from an authentic entropy source if you really wanted to? Another use for /dev/cdrom ..

      Exhausting entropy like that is a serious problem for some applications. However, I doubt it's a big deal for Perl.

      The problem the increased hash randomness was trying to solve was that certain well-crafted inputs would expand the internal data structures, thus consuming a lot more resources and becoming a potential DoS attack (the attack can be generalized to be used on many languages and algorithms, not just Perl's hashes). By putting a little randomization in the hash, it becomes much harder for an attacker to predict how the datastructure will expand.

      The random number generator used need not be from an extremely high quality source. Just enough that an attacker won't be able to predict the hash seeds. It would be pretty easy to foil a remote attacker this way, but a bit harder for a local attacker, depending on their system priviliges.

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

        Thank you very much for your illumination! I had no entropy problems just now creating 100,000 small hashes so I guess the seed is being chosen without using /dev/random at all. Thanks again. --Matt
Re: order of hash
by melora (Scribe) on Nov 04, 2003 at 12:53 UTC
    Some want to delve and understand (for many good reasons) the internals of how Perl implements the hash. As users (for me, too many interests besides the above), some want to know whether the ordering of the hash is consistent. As a user, I would say that I can imagine that the hash order would be the same for the same data and all other things being equal, but I would never try to depend on that ordering. After all, I'm looking up by key, not in order.. a nice foreach works well (I can always sort the keys if I need to).

      As many other posters pointed out, as of perl 5.8.1, the hash order will no longer be the same between runs. Before that, you could assume that the order would be the same on the same version of perl and get away with it.

      So if you're depending on keys %hash to have consistant output, you better have a sort in front of it.

      ----
      I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
      -- Schemer

      : () { :|:& };:

      Note: All code is untested, unless otherwise stated

Re: order of hash
by TomDLux (Vicar) on Nov 04, 2003 at 15:57 UTC

    In all programming languages, characteristics such as hash key ordering is described as non-deterministic, sometimes loosely referred to as random. This means you cannot rely on any ordering, should not expect an ordering to repeat if you go to a different platform or different usage environment, but does not guarantee randomness. Now Perl has begun to guarantee ( at least some degree of ) randomness in hash ordering ( but I still wouldn't base a RNG on it ).

    --
    TTTATCGGTCGTTATATAGATGTTTGCA

Re: order of hash
by Roger (Parson) on Nov 05, 2003 at 00:04 UTC
    Does Perl know how a particular hash will be ordered ahead of time, given it's characteristics?

    I think the answer is yes. I even suspect that Perl uses the same hashing function/algorithm regardless the type/nature of the search key.

    I have constructed the following code to test my hypothesis -
    my %hashA; $hashA{123456} = 1; $hashA{'Roger and Albert'} = 1; $hashA{'Roger'} = 1; $hashA{'456789'} = 1; print "HASH A\n------\n"; print "$_\n" for keys %hashA; my %hashB; $hashB{'Roger'} = 1; $hashB{123456} = 1; $hashB{'Roger and Albert'} = 1; $hashB{456789} = 1; print "\nHASH B\n------\n"; print "$_\n" for keys %hashB;
    And the order of both hashA and hashB are identical -
    HASH A ------ 456789 123456 Roger and Albert Roger HASH B ------ 456789 123456 Roger and Albert Roger
    In the first case, a number is used as a search key to start the hash, while in the second case, a string is used as a search key to start the hash. If the perl hash engine differentiate numeric search keys and string search keys, then I would get two different hash orders for the hashes. But that's not the case.

    The conclusion is that Perl treats all the search keys as strings, and it has a single hashing function optimized for string search keys.

    However Perl might implement different hashing functions between different versions. Perl 5.8.1 might have introduced a different random seed/number every session to initialize its hashing engine/function, perhaps to reduce the bias of the hashing function and improve the overall performance of the hash.

      The random seed for the hash function is to avoid DoS attacks based on "well-crafted" data. Reducing the bias of the hash function (if there was any) or improving the performance would require a different hash algorithm.

      Carelessly chosen alogorithms might be less secure against DoS attacks, be more biased, or degrade performance.

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of

Re: order of hash
by Freddo (Acolyte) on Nov 06, 2003 at 17:52 UTC

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://304286]
Approved by graff
Front-paged by broquaint
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others meditating upon the Monastery: (4)
As of 2024-04-25 22:48 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found