split to hash, problem with random keys

december has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: split to hash, problem with random keys by sauoq (Abbot) on Jul 03, 2003 at 16:23 UTC
When you define the keys, simply create an array with the keys ordered in the way you like. `my @iRecord_keys = qw( foo bar baz ); # . . . @iRecord{ @iRecord_keys } = split /\\|/, $line;` [download] -sauoq "My two cents aren't worth a dime.";	[reply] [d/l]
Re: Re: split to hash, problem with random keys by december (Pilgrim) on Jul 03, 2003 at 17:09 UTC
I've been thinking about this too - it's what I do now. Seems a bit stupid to duplicate the keys once again, but if it's the only way to ensure the correct order... Thanks!	[reply]
Re: split to hash, problem with random keys by shemp (Deacon) on Jul 03, 2003 at 16:28 UTC
Use the CPAN module Tie::IxHash. It maintains the key order for you. Keys will return in the order that they are defined. As for why does keys return in a pseudo-random order? Its faster!	[reply]
Re: Re: split to hash, problem with random keys by sauoq (Abbot) on Jul 03, 2003 at 17:30 UTC
If you must use a tied hash (which is slower and only allows you to order the keys one hash at a time) you should be aware of the fact that tied hashes don't act like normal hashes when used in boolean contexts. `my %hash tie my %tied, 'Tie::IxHash'; $hash{foo} = 1; $tied{foo} = 1; print "hash is true\n" if %hash; print "tied is true\n" if %tied; # Won't print.` [download] Until this is fixed, I think it's a good reason to avoid tied hashes when possible. As for why does keys return in a pseudo-random order? Its faster! Or, more likely, it takes less space. Such a feature would probably be implemented with a separate array that holds the keys in order (which is basically how Tie::IxHash does it.)¹ With that strategy, returning them in order wouldn't be any faster, but storing them takes more space. 1. Tie::IxHash actually keeps both the data and the keys in arrays. It then keeps a hash of indexes. But, its the storage that grows, not the time to return the keys. -sauoq "My two cents aren't worth a dime.";	[reply] [d/l]
Re: split to hash, problem with random keys by hardburn (Abbot) on Jul 03, 2003 at 17:32 UTC
It's pseudo-random because that's the way hash tables get stored. Basically, it converts the key string into an integer, which can then be used to point directly to a position in memory. The result is that the speed of a hash lookup stays the same no matter how big it gets (except in cases where different keys hash to the same value, but there are ways around that). Compare this to searching a regular array, where the best algorithms have a worst-case running time of `O(log n)`. Not bad, but not nearly as good as a hash table (which is `O(1)`, excepting collisions). When you call `keys %hash`, it is returning the data sorted by the value of the hashed key. Note that the algorithm used to do the hashing could change whenever somebody feels like it (it recently changed in Perl 5.8.0), so don't rely on the order staying the same everywhere. The NIST web site has a decent entry (external link) for a definition of a hash table, with some decent links on how they're implemented. ---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident. -- Schemer Note: All code is untested, unless otherwise stated	[reply] [d/l] [select]
Re: split to hash, problem with random keys by MrCromeDome (Deacon) on Jul 03, 2003 at 16:43 UTC
The pseudo-random order is because your hash information is stored in (of all things) a hash table. It's not truly pseudo-random: it's the order in which Perl's hashing routines have decided is the most optimal way to store your data. Why? Because as shemp says, "it's faster!" If you want things done in a particular order, the excellent suggestions above will be of great use to you. Cheers! MrCromeDome	[reply]
Re: Re: split to hash, problem with random keys by december (Pilgrim) on Jul 03, 2003 at 17:27 UTC
It's not by size or alphabetical. Just out of curiosity, do you know more about the logic in the order those hashing routines sort the data? Thanks for your insight.	[reply]
Re: Re: Re: split to hash, problem with random keys by demerphq (Chancellor) on Jul 03, 2003 at 19:11 UTC
do you know more about the logic in the order those hashing routines sort the data There is no ordering principle in Perls hash implementation. In fact the hashes are statically initialized with a pseudorandom sequence at launch. The order that follows from there is strictly incidental to anything else. The only thing that you are currently promised is that you will get the same sequence from the same set of keys that were inserted into the structure. (This is order dependent. Change the order in which you insert keys into the hash, and the output sequence will differ as well.) And get this: Its going to get worse. There is a DOS attack that is based around targetting vulnerabilities in the hash algorithm causing perl to slow to a crawl while populating the hash. In order to resolve this the hash will no longer be initialized with a static pseudorandom sequence, instead it will be initialized with dynamic pseudo random sequence. Thus there will be no guaranty that the same set of inserts will produce the same order of keys on two different runs. The moral of this story is that if you want your code to be forward compatible with later versions of perl then you had better not depend on native key order in any way at all. If you need to do this then sort the keys. It should come to no suprise to people that there is lots of code out there that will be subtly broken by this change. People depend on the repeatability of hash keys more often than they think they do. --- demerphq _{<Elian> And I do take a kind of perverse pleasure in having an OO assembly language...}	[reply] [d/l]
Re^4: split to hash, problem with random keys by Aristotle (Chancellor) on Jul 21, 2003 at 22:13 UTC
Re: Re^4: split to hash, problem with random keys by BrowserUk (Patriarch) on Jul 21, 2003 at 22:32 UTC
Some notes below your chosen depth have not been shown here
Re: Re: Re: split to hash, problem with random keys by MrCromeDome (Deacon) on Jul 03, 2003 at 17:32 UTC
My knowledge of data structures is spotty at best. I imagine that consulting perldoc perlguts would shed some light on this matter though. MrCromeDome	[reply]
Re: split to hash, problem with random keys by sgifford (Prior) on Jul 03, 2003 at 19:08 UTC
Maybe you don't want a hash at all: `#!/usr/bin/perl use constant FIRSTNAME => 0; use constant LASTNAME => 1; use constant ADDRESS => 2; while (<>) { chomp; my @iRecord=split(/\\|/); print "Address for $iRecord[FIRSTNAME] $iRecord[LASTNAME] is $iRecor +d[ADDRESS]\n"; }` [download] This works with an input file like First\|User\|123 Some Street. Second\|Person\|123 Some Street. Third\|Guy\|456 Other Street Fourth\|Lady\|456 Other Street You just have to set the constants to correspond to the order of the fields in the input file, and that's all that has to change if the file format changes. Perl should evaluate the constants at compile-time and arrays are faster to access than hashes, so this should be somewhat faster than your current solution too (although you'd need to benchmark to know for sure). Edit: Changed names in sample data.	[reply] [d/l]
Re: Re: split to hash, problem with random keys by december (Pilgrim) on Jul 04, 2003 at 13:37 UTC
That's a nice solution. I use the hash everywhere already, so I am a bit reluctant to switch now, but I will keep this in mind for other applications... Thanks!	[reply]
Re: split to hash, problem with random keys by Mitch (Sexton) on Jul 03, 2003 at 16:41 UTC
You can use the Tie function to have the hash return in order: `use Tie::IxHash; tie %hashname, "Tie::IxHash";` [download] Mitch	[reply] [d/l]
Re: Re: split to hash, problem with random keys by december (Pilgrim) on Jul 03, 2003 at 17:21 UTC
Thanks, I'll check if it's installed on the machines I need to run the code on.	[reply]
Re: split to hash, problem with random keys by jdklueber (Beadle) on Jul 03, 2003 at 21:06 UTC
This might work for you... It does break on a badly formatted file, though. test.txt Test1\|Test2\|Test3 Foo1\|Foo2\|Foo3 Code use strict; my @records; my @keys = qw(FirstKey SecondKey ThirdKey); sub getkey() { my $ret = shift @keys; push @keys, $ret; return $ret; } open IN, "<test.txt"; while(<IN>) { my %record; map{$record{getkey()}=$_} split(/\\|/) ; push(@records,\%record); } foreach my $rec (@records) { my %thing = %{$rec}; foreach my $key (@keys) { print "$key: $thing{$key}\n"; } } -- Jason Klueber ookami@insightbb.com /(bb)\|^b{2}/ --Shakespeare	[reply]
Re: Re: split to hash, problem with random keys by december (Pilgrim) on Jul 04, 2003 at 13:47 UTC
I've been trying something like this too, but I didn't get the map working (I tried to use a 'values %hash' loop instead of your 'getkey' function, but there I had the same problem of pseudo-random order). I have to remember the shift rotating trick, though... Breakage is not an issue, since I make (lock/edit/...) the files myself and can therefore guarantee their integrity. Or so I hope. ;)	[reply]
Re: split to hash, problem with random keys by yosefm (Friar) on Jul 03, 2003 at 16:35 UTC
What about this: `@iRecord{sort keys %iRecord} = split(/\\|/, $line);`	[reply] [d/l]
Re: Re: split to hash, problem with random keys by Mr. Muskrat (Canon) on Jul 03, 2003 at 16:47 UTC
This method assumes that the keys are to be sorted alphabetically which is probably not how they were originally ordered.	[reply]
Re: Re: Re: split to hash, problem with random keys by december (Pilgrim) on Jul 03, 2003 at 17:16 UTC
No, they're not, indeed. It's just the order I defined them in, and that is the order which is most easy to read and understand, code-wise. I could sort them, but I really want to have a certain order in my flat file, with the key in front.	[reply]


Clear questions and runnable code get the best and fastest answer
	PerlMonks