http://qs321.pair.com?node_id=684205


in reply to Re^2: Data Structures
in thread Data Structures

The only problem with using arrays rather than hashes is that if, for example, all your line identifiers start with '00030nnnn', then an array would have space allocated for 300,000 elements (000000000 .. 000299999) that would never be used, but would still take up space. (This is what I meant above by "if your numbers are low and mostly sequential".)

In this case, you would be much better off using hashes as a "sparse array". The same is true for your station numbers. With just three stations, numbered 1250..1252, on line 000301038, using hashes will definitely save you a lot of memory.
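A minimal sketch of the difference, assuming a single record for line 000301038. The names @line_array and %line_hash are made up for illustration. Assigning to one high index makes Perl extend the array up to that index, whereas the hash holds only the keys actually stored:

```perl
use strict;
use warnings;

# Hypothetical containers, for illustration only.
my @line_array;
my %line_hash;

# Array: the numeric index 301038 forces the array to grow to 301039 slots.
$line_array[301038] = 'data';

# Hash: the identifier is kept as a string key, leading zeros included.
$line_hash{'000301038'} = 'data';

my $array_slots = scalar @line_array;
my $hash_keys   = scalar keys %line_hash;

print "array slots: $array_slots\n";
print "hash keys:   $hash_keys\n";
```

Note that the hash key is quoted here. A bare 000301038 in Perl source is parsed as an octal literal, and 8 is not a valid octal digit, so identifiers with leading zeros are safest handled as strings.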

Note also that I made an error (pointed out by alexm in the post following mine) when I typed:

my( $x, $y, $z ) = $line[000301038][1261];

It should be

my( $x, $y, $z ) = @{ $line[000301038][1261] };

Or, if you go with hashes as I think you probably should having seen the real data:

my( $x, $y, $z ) = @{ $line{ '000301038' }{ 1261 } };  ## key quoted so the leading zeros are kept

## and

## Assumes use constant { X=>0, Y=>1, Z=>2 }
my $thisX = $line{ $line }{ $stn }[ X ];
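To make the two access styles concrete, here is a small self-contained sketch. The coordinates and the variable names $lineNo and $stn are made up for illustration; only the shape of the structure (hash of hashes of arrays) comes from the discussion above:

```perl
use strict;
use warnings;
use constant { X => 0, Y => 1, Z => 2 };

# Made-up coordinates for one station on one line (illustration only).
my %line;
$line{'000301038'}{1261} = [ 10.5, 20.25, 30 ];

# Fetch all three coordinates at once by dereferencing the leaf array.
my( $x, $y, $z ) = @{ $line{'000301038'}{1261} };

# Or fetch a single coordinate by its named index.
my( $lineNo, $stn ) = ( '000301038', 1261 );
my $thisX = $line{ $lineNo }{ $stn }[ X ];
my $thisZ = $line{ $lineNo }{ $stn }[ Z ];

print "x=$x y=$y z=$z\n";
print "thisX=$thisX thisZ=$thisZ\n";
```

The constants work as array subscripts because they are ordinary subroutine calls there; inside a hash subscript a bare X would be treated as the string 'X' instead.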

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^4: Data Structures
by YYCseismic (Beadle) on May 02, 2008 at 16:44 UTC

    Okay, so that's one good argument in support of using hashes. What about the argument for using object oriented code, as suggested by leocharre? This would likely still use hashes, of course, but they would be hidden by the OO code.

      What about the argument for using object oriented code...

      Don't. Three reasons.

      1. Unless you are already familiar with a technique, it is a steep learning curve.
      2. As specified, you would have an object interface that allowed you to instantiate an instance and then get or set its attributes.

        But you can already do that with a hash, without the extra effort.

        The reasoning is that it will allow you to change your structure later, if necessary, and so save time.

        But expending effort now, on code that will take more memory and run more slowly, to save time for an eventuality that may never happen is total folly.

        Expend the effort when you know you need to and when you know in what way things have to change.

      3. Let's say you did create a class (or probably 3, Seismic::Line, Seismic::Station and Seismic::Point3D ), then you read the file and create instances to hold the data:
        while( <FILE> ) {
            my @attributes = unpack '...', $_;
            my $object = Seismic::Line->new( @attributes );
            ## Now what?
            ## ???

        Where are you going to put your objects so that you can retrieve them when you need them?

            ### IN A HASH OF COURSE!!!!
            $lines{ $object->getLineNo() }{ $object->getStnNo() } = $object;
        }

        So now, you've still got the HoHs in order to find the one you want, but instead of each leaf being a small anonymous array, it's an object (or nest of objects) that requires more memory, runs more slowly, uses clumsy unfamiliar syntax, and requires at least 5 times more effort to develop.

        For what? Just in case?

      My advice is to stick with the HoHoAs. If the needs change, you can make objects later, but you'll still need the container to hold them and find the one you want.
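For illustration, a minimal sketch of building and walking such a HoHoA. The record layout (whitespace-separated line id, station number, x, y, z) and the sample values are assumptions made up for this example; the real file format and its unpack template are not shown in this thread:

```perl
use strict;
use warnings;

# Hypothetical records: line id, station number, x, y, z (whitespace separated).
my @records = (
    '000301038 1250 100.0 200.0 10.0',
    '000301038 1251 101.0 201.0 11.0',
    '000301038 1252 102.0 202.0 12.0',
);

my %lines;
for my $record ( @records ) {
    my( $lineNo, $stnNo, @xyz ) = split ' ', $record;
    $lines{ $lineNo }{ $stnNo } = [ @xyz ];   # leaf is a small anonymous array
}

# Walk the structure: every line, then every station on that line.
for my $lineNo ( sort keys %lines ) {
    for my $stnNo ( sort { $a <=> $b } keys %{ $lines{ $lineNo } } ) {
        my( $x, $y, $z ) = @{ $lines{ $lineNo }{ $stnNo } };
        print "$lineNo $stnNo: $x $y $z\n";
    }
}

# In scalar context, keys returns the number of stations stored for the line.
my $station_count = keys %{ $lines{'000301038'} };
```

If objects ever do become necessary, the leaf array could be swapped for an object while the two outer hash levels stay exactly as they are.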



        Okay, well I think that about settles it, then. I'll stick with HoHoAs and forgo the OO structure. I don't want to needlessly take up space that can be better used elsewhere.

        Thanks for your help!

Re^4: Data Structures
by YYCseismic (Beadle) on May 02, 2008 at 19:42 UTC
    I'm not sure I understand the difference with your correction. Why is it supposed to be
    my( $x, $y, $z ) = @{ $line[000301038][1261] };
    rather than without the @{...} around the $line[...][...]?

      Without @{ .. } to dereference the anonymous array reference, $x would be set to that array reference and $y & $z would be undef.

      $line[ 1 ][ 2 ] = [ 3,4,5 ];;
      my( $x, $y, $z ) = $line[ 1 ][ 2 ];;
      print for $x, $y, $z;;
      ARRAY(0x194a9f8)
      Use of uninitialized value in print at ...
      Use of uninitialized value in print at ...

      With the dereference:

      my( $x, $y, $z ) = @{ $line[ 1 ][ 2 ] };;
      print for $x, $y, $z;;
      3 4 5
