
Re^2: Data Structures

by YYCseismic (Beadle)
on May 02, 2008 at 15:55 UTC


in reply to Re: Data Structures
in thread Data Structures

As I'm rather new to this kind of thing, I didn't realize that parallel data structures might not be a good idea. What you say makes sense, though.

I think I'm more inclined to go with your second option:

    my @line = (
        [                                     ## $line[ 0 ]
            [ xxx.xxx, yyy.yyy, zzz.zzz ],    ## $line[0][0] (station 0)
            [ xxx.xxx, yyy.yyy, zzz.zzz ],    ## (station 1)
            [ xxx.xxx, yyy.yyy, zzz.zzz ],
            ...
        ],
        [                                     ## Line[ 1 ]
            [ xxx.xxx, yyy.yyy, zzz.zzz ],    ## Line[ 1 ][ 0 ]
            [ xxx.xxx, yyy.yyy, zzz.zzz ],
            [ xxx.xxx, yyy.yyy, zzz.zzz ],
            ...
        ],
        ...
    );
    my( $x, $y, $z ) = $line[ $lineNo ][ $stationNo ];

My plan, if you can call it that, was to hold the line names (identifiers) in an array, since they may or may not start with a digit. An annotated example of a portion of a SEG-P1 file is given below.

    lineName  stn.                     east...north...elev.
    000301038 1260 52205121N109153806W 618485158009020 6626
    000301038 1261 52205121N109153674W 618510158009027 6623
    000301038 1262 52205120N109153542W 618535158009029 6621
    000301016  400 52153542N109482654W 581401057903909 6738
    000301016  401 52153542N109482522W 581426057903913 6738
    000301016  402 52153542N109482390W 581451057903918 6738

The stn, east, north, and elev indicate the columns in which those values (station number, easting, northing, and elevation) are found throughout the file. (Note the change in line number/identifier part way through.) Station identifiers are always numeric, and, while they are arbitrarily assigned, I would like for any access to be according to these numbers. For example,

    my( $x, $y, $z ) = $line[000301038][1261];
    ...
    print "$x, $y, $z"; # Result: 6185101, 58009027, 6623

One reason I thought about using hashes here is that they are essentially associative arrays, so I can have a hash with the line name as the key, instead of some arbitrary number. So instead of accessing $line[0][0] for the first station of the first line, I would prefer to say something like $line{'32A-5'}[101] for the first station of line 32A-5. This way there is no "first" or "last" line, only first and last stations (which makes sense, seeing as the survey is generally linear).
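
A minimal sketch of that idea, using a hash of hashes so that stations are also looked up by number. It assumes the sample records split cleanly on whitespace and that the combined 15-character field is a 7-character easting followed by an 8-character northing. That layout is inferred from the expected result quoted above, not taken from the SEG-P1 specification, and all variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample records copied from the excerpt above.
my @records = (
    '000301038 1260 52205121N109153806W 618485158009020 6626',
    '000301038 1261 52205121N109153674W 618510158009027 6623',
    '000301016  400 52153542N109482654W 581401057903909 6738',
);

my %line;    # $line{ lineName }{ stationNo } = [ easting, northing, elevation ]

for my $record ( @records ) {
    # Assumption: fields are whitespace-separated; a real SEG-P1 reader
    # would more likely use fixed columns (e.g. unpack).
    my( $name, $stn, $latlong, $en, $elev ) = split ' ', $record;

    # Assumption: 7 characters of easting, then 8 of northing.
    my( $east, $north ) = unpack 'A7 A8', $en;

    $line{ $name }{ $stn } = [ $east, $north, $elev ];
}

# The line name is quoted so that it is treated as a string.
my( $x, $y, $z ) = @{ $line{ '000301038' }{ 1261 } };
print "$x, $y, $z\n";    # 6185101, 58009027, 6623

# Station numbers are numeric, so a numeric sort gives first and last.
my @stations = sort { $a <=> $b } keys %{ $line{ '000301038' } };
print "first: $stations[0], last: $stations[-1]\n";
```

Because the line name is only ever a hash key, it makes no difference whether it starts with a digit or looks like 32A-5.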

Replies are listed 'Best First'.
Re^3: Data Structures
by BrowserUk (Patriarch) on May 02, 2008 at 16:26 UTC

    The only problem with using arrays rather than hashes is that if, for example, all your line identifiers start with '0030nnnn', then with an array you would have space allocated for 300,000 elements, 00000000 .. 00299999, which would never be used but would still take up space. (This is what I meant above by "if your numbers are low and mostly sequential".)

    In this case, you would be much better off using hashes as a "sparse array". The same is true for your station numbers. With just three stations, numbered 1260..1262, on the line 000301038, using hashes will definitely save you a lot of memory.
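
    A small sketch of the difference, with made-up variable names: storing one entry under index 301038 makes the array span every index below it, while the hash holds just the one key.

```perl
use strict;
use warnings;

# Array indexed by the numeric value of line identifier 000301038.
my @lineArray;
$lineArray[ 301038 ] = 'data';

# Hash keyed by the identifier as a string.
my %lineHash;
$lineHash{ '000301038' } = 'data';

# The array now spans indices 0 .. 301038, although only one is used.
print scalar( @lineArray ), "\n";        # 301039
print scalar( keys %lineHash ), "\n";    # 1
```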

    Note also that I made an error (pointed out by alexm in the post following mine) when I typed:

    my( $x, $y, $z ) = $line[000301038][1261];

    It should be

    my( $x, $y, $z ) = @{ $line[000301038][1261] };

    Or, if you go with hashes as I think you probably should having seen the real data:

    ## Line name quoted: a bare 000301038 is parsed as an (invalid) octal literal
    my( $x, $y, $z ) = @{ $line{ '000301038' }{ 1261 } };

    ## and

    ## Assumes use constant { X=>0, Y=>1, Z=>2 }
    my $thisX = $line{ $line }{ $stn }[ X ];
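
    Put together as something that runs, this might look like the sketch below. The data is one station from the sample records, and $lineName is a hypothetical name used here in place of $line to keep the scalar and the hash apart.

```perl
use strict;
use warnings;
use constant { X => 0, Y => 1, Z => 2 };

# Hypothetical data: one station taken from the sample records.
my %line = (
    '000301038' => { 1261 => [ 6185101, 58009027, 6623 ] },
);

my( $lineName, $stn ) = ( '000301038', 1261 );

# Copy all three coordinates out of the leaf array.
my( $x, $y, $z ) = @{ $line{ $lineName }{ $stn } };

# Or pick a single coordinate by named index.
my $thisX = $line{ $lineName }{ $stn }[ X ];
my $thisZ = $line{ $lineName }{ $stn }[ Z ];

print "$x $y $z\n";         # 6185101 58009027 6623
print "$thisX $thisZ\n";    # 6185101 6623
```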

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      Okay, so that's one good argument in support of using hashes. What about the argument for using object oriented code, as suggested by leocharre? This would likely still use hashes, of course, but they would be hidden by the OO code.

        What about the argument for using object oriented code...

        Don't. Three reasons.

        1. Unless you are already familiar with a technique, it is a steep learning curve.
        2. As specified, you would have an object interface that allowed you to instantiate an instance and then get or set its attributes.

          But you can already do that with a hash, without the extra effort.

          The reasoning is that it will allow you to change your structure later, if necessary, and so save time.

          But expending effort now to save time for an eventuality that may never happen, that will take more memory and run more slowly, is total folly.

          Expend the effort when you know you need to and when you know in what way things have to change.

        3. Let's say you did create a class (or probably 3, Seismic::Line, Seismic::Station and Seismic::Point3D ), then you read the file and create instances to hold the data:
          while( <FILE> ) {
              my @attributes = unpack '...', $_;
              my $object = Seismic::Line->new( @attributes );
              ## Now what?
              ## ???

          Where are you going to put your objects so that you can retrieve them when you need them??

              ### IN A HASH OF COURSE!!!!
              $lines{ $object->getLineNo() }{ $object->getStnNo() } = $object;
          }

          So now, you've still got the HoHs in order to find the one you want, but instead of each leaf being a small anonymous array, it's an object (or nest of objects) that requires more memory, runs more slowly, uses clumsy unfamiliar syntax, and requires at least 5 times more effort to develop.

          For what? Just in case?

        My advice is to stick with the HoHoAs. If the needs change, you can make objects later, but you'll still need the container to hold them and find the one you want.


      I'm not sure I understand the difference with your correction. Why is it supposed to be
      my( $x, $y, $z ) = @{ $line[000301038][1261] };
      rather than without the @{...} around the $line[...][...]?

        Without @{ .. } to dereference the anonymous array reference, $x would be set to that array reference and $y & $z would be undef.

            $line[ 1 ][ 2 ] = [ 3,4,5 ];;
            my( $x, $y, $z ) = $line[ 1 ][ 2 ];;
            print for $x, $y, $z;;
            ARRAY(0x194a9f8)
            Use of uninitialized value in print at ...
            Use of uninitialized value in print at ...

        With the dereference:

            my( $x, $y, $z ) = @{ $line[ 1 ][ 2 ] };;
            print for $x, $y, $z;;
            3
            4
            5

Re^3: Data Structures
by Anonymous Monk on May 04, 2008 at 17:41 UTC
    Given your example, what you want is a hash of arrays of hashes:
        my %data;
        push @{$data{$linename}}, {
            station   => $station,
            coords    => $coords,
            easting   => $easting,
            northing  => $northing,
            elevation => $elevation,
        };

        for my $linename ( keys %data ){
            for my $entry ( @{$data{$linename}} ){
                print "$linename: @{$entry}{qw(station easting northing elevation)}\n";
            }
        }
    At least, that's what it looks like from your example, but you really haven't given enough information about the structure of those fields.
