PerlMonks  

Re^3: Data Structures

by CountZero (Bishop)
on May 02, 2008 at 20:39 UTC (#684252)


in reply to Re^2: Data Structures
in thread Data Structures

Where are you going to store your 1000 objects?
In an object?

And how are you going to access the individual instances you need?
Isn't that what accessors are made for?

Of course, in Perl objects are just eye-candy around a data structure (hashes, arrays, or any combination of them), but if I follow your reasoning, I should program in assembler, since in the end it is all machine code.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^4: Data Structures
by BrowserUk (Pope) on May 02, 2008 at 21:26 UTC

    Care to share some code to demonstrate? Home brew hash or Moose-based doesn't matter.

    Say, 1000 lines with an average of, say, 10 stations per line.

    This will generate some test data sufficient for the exercise:

    #! perl -slw
    use strict;

    =comment

    lineName stn. east...north...elev.
    000301038 1260 52205121N109153806W 618485158009020 6626
    000301038 1261 52205121N109153674W 618510158009027 6623

    =cut

    for my $line ( 1 .. 1000 ) {
        for my $stn ( 1 .. 10 ) {
            printf "%08d %8d %08dN%08dW %015s %4d\n",
                $line, $stn,
                int( 1e6 * rand( 90 ) ), int( 1e6 * rand( 90 ) ),
                int( 1e13 * rand( 90 ) ), int( rand 9999 );
        }
    }

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      OK

      I use Moose

      First I made the class holding the whole structure. I called it Seismic.

      And this is the program:

      Everything is stored in one $grid object which consists of $line objects which hold the $station objects.

      Direct access to a station and its attributes is easy:

      $grid->get_line(257)->get_station('257_5')->Elevation;
      Of course, behind the scenes, it is all hashes and arrays and hashes of hashes and ... . And of course it is much slower than programming these data structures directly, but then you don't get these nice accessors, mutators, type-checking, default values, ...
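
      The Moose classes themselves are not reproduced in this post. As a rough, hand-rolled illustration of the grid/line/station nesting described above, the access chain could be backed by something like the following. This is plain blessed hashes, not Moose, and the Sketch::* package names and the add_line/add_station methods are made up for the example:

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the grid/line/station classes (not the Moose code).
package Sketch::Station {
    sub new       { my( $class, %args ) = @_; return bless { %args }, $class }
    sub Elevation { return $_[0]{ Elevation } }
}

package Sketch::Line {
    sub new         { my( $class ) = @_; return bless { stations => {} }, $class }
    sub add_station { my( $self, $id, $stn ) = @_; $self->{ stations }{ $id } = $stn; return $self }
    sub get_station { my( $self, $id ) = @_; return $self->{ stations }{ $id } }
}

package Sketch::Grid {
    sub new      { my( $class ) = @_; return bless { lines => {} }, $class }
    sub add_line { my( $self, $id, $line ) = @_; $self->{ lines }{ $id } = $line; return $self }
    sub get_line { my( $self, $id ) = @_; return $self->{ lines }{ $id } }
}

# Build one line holding one station, then walk the accessor chain.
my $grid = Sketch::Grid->new;
$grid->add_line( 257, Sketch::Line->new );
$grid->get_line( 257 )->add_station( '257_5', Sketch::Station->new( Elevation => 6626 ) );

my $elevation = $grid->get_line( 257 )->get_station( '257_5' )->Elevation;
print "$elevation\n";    # prints 6626
```

      Each accessor here is only a hash lookup wrapped in a method call, which is the "hashes of hashes behind the scenes" point made above.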

      PS: I did not implement all the fields in all the objects, just enough --I hope-- to show it works.

      CountZero


        Of course, behind the scenes, it is all hashes and arrays and hashes of hashes and ... .

        And in one sentence, you've summed up the crux of my object(sic)ion to what you are proposing.

        You're (simply and unnecessarily) wrapping a data structure in a nest of objects.

        There are no methods, beyond getters and setters, which hashes and arrays already know how to do. In the process, you've hidden all the other things they know how to do. Like:

        Some of these can be exposed through the addition of generated methods, but all of them? And at what cost? Different names to the ones every half-experienced Perl programmer is already familiar with. get_size() and put_size() instead of $#array as an lvalue.

        Or will that be get_length() and put_length() for this module? Or maybe get_count() and put_count(). Or ...
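
        For reference, a minimal sketch of the kind of built-in behaviour meant here, including $#array used as an lvalue. The station numbers and hash contents are invented for illustration:

```perl
use strict;
use warnings;

my @stations = ( 1260 .. 1269 );      # ten made-up station numbers

# $#array is the index of the last element ...
my $last_index = $#stations;          # 9

# ... and assigning to it resizes the array in place.
$#stations = 4;                       # truncate to five elements
my $count = @stations;                # array in scalar context: 5

# A slice reads several elements at once.
my @first_two = @stations[ 0, 1 ];    # ( 1260, 1261 )

# Hashes offer exists, delete and keys without any wrapper methods.
my %line = ( 1260 => 'a', 1261 => 'b' );
my $had_1261 = exists $line{ 1261 } ? 1 : 0;
delete $line{ 1261 };
my @remaining = sort keys %line;      # ( '1260' )

print "last=$last_index count=$count first=@first_two keys=@remaining\n";
```

        None of these needs a lookup in a module's documentation; they are spelled the same way for every array and hash.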

        Throwing away the familiar in order to substitute unfamiliar (and variable) naming conventions for exactly the same operations means that the maintenance programmer, instead of being able to use the knowledge he has, has to run away to the documentation--and even the source code--in order to understand the code he is reading.

        I have no idea which of the 20 or so Moose modules I would need to look at in order to find out what the naming convention of the array length attribute of array based collections is. If it is exposed as standard and documented at all. Besides which, if I'm reading things correctly, the Moose user (Moose-based class writer) has the option of changing those names anyway. So that means I have to look it up for every class I use. And every method for every class. The cost of just the documentation lookup time in maintenance is amazing.

        And of course it is much slower than programming these data structures directly,

        And more expensive in memory also. In my tests, it was an order of magnitude slower and required 50 times more memory. But for the OP's current needs of 1000 lines, that is quite possibly insignificant, so let's not dwell on that. Let's look at what you consider this wrapping of native functionality buys you:

        but then you don't get these nice accessors, mutators, type-checking, default values, ...

        Hm. Let's compare syntax. You suggest that this is "nice":

        $grid->get_line(257)->get_station('257_5')->Elevation;

        The equivalent using native structures:

        $grid->{ 257 }{ '257_5' }[ Elevation ];

        And that's better because? Anticipating your answer to be along the lines of: "Because 'line' and 'station' are explicitly mentioned.", I'll counter with the fact that most accesses will not be in terms of constants, but rather variables for the indexing. So then you have to compare these two versions:

        $grid->get_line($lineID)->get_station($stationID)->Elevation;

        And

        $grid->{ $lineID }{ $stationID }[ Elevation ];

        Don't you find the verbosity and repetition of 'get_line'/$lineID & 'get_station'/$stationID distracting? Pointless? It doesn't look too bad with a single access expression as above, but what about when you come to do some real work with these things?
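
        As a self-contained sketch of the native form with variable indexes, assuming the field constants used elsewhere in this post and a tiny hand-built structure (the data values are invented):

```perl
use strict;
use warnings;

# Field positions within a station record.
use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

# Hash of lines, each a hash of stations, each an array of fields.
# Note the quoted key: unquoted, 257_5 is the numeric literal 2575.
my $grid = {
    257 => {
        '257_5' => [ 52205121, 109153806, 0, 6626 ],
        '257_6' => [ 52205121, 109153674, 0, 6623 ],
    },
};

my( $lineID, $stationID ) = ( 257, '257_5' );

# Reading and writing go through the same expression.
my $elev = $grid->{ $lineID }{ $stationID }[ Elevation ];
$grid->{ $lineID }{ $stationID }[ Elevation ] += 1;

print "$elev -> $grid->{ $lineID }{ $stationID }[ Elevation ]\n";    # 6626 -> 6627
```

        The same subscript expression serves as both getter and setter, so no separate accessor name has to be remembered.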

        Syntax in use

        In the OP's data structure there is another field between Northing and Elevation (which I called 'Other' because I have no idea what it is. See later), but just for the sake of example let's assume that it is wholly or partially derived from the 3D point (Easting, Northing and Elevation). And due to, say, continental drift or more accurate GPS or some such, it is necessary to recalculate these values.

        The Eastings have to be updated by 0.001% West. The Northings by 0.0002% South. The Elevations by +1 unit. And the Other field recalculated according to some formula.

        The code using native data structures:

        my $seismic = Seismic->new( 'seismic.dat' );

        for my $lineID ( keys %{ $seismic } ) {
            for my $stn ( values %{ $seismic->{ $lineID } } ) {
                $stn->[ Easting ]   -= $stn->[ Easting ]  * 0.00001;
                $stn->[ Northing ]  -= $stn->[ Northing ] * 0.000002;
                $stn->[ Elevation ] += 1;
                $stn->[ Other ]      = int(
                    ( $stn->[ Easting ] * $stn->[ Northing ] * $stn->[ Elevation ] ) / 3.0
                );
            }
        }
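
        To check the arithmetic without a data file, here is the same loop run over a one-station structure built in memory. The round input values are invented, and the Other field is left out because its formula is only a placeholder:

```perl
use strict;
use warnings;
use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

# One line with one station; the numbers are invented round values.
my %seismic = ( '00000001' => { '1' => [ 1_000_000, 2_000_000, 0, 100 ] } );

for my $lineID ( keys %seismic ) {
    for my $stn ( values %{ $seismic{ $lineID } } ) {
        $stn->[ Easting ]   -= $stn->[ Easting ]  * 0.00001;     # 0.001%
        $stn->[ Northing ]  -= $stn->[ Northing ] * 0.000002;    # 0.0002%
        $stn->[ Elevation ] += 1;
    }
}

my $station = $seismic{ '00000001' }{ 1 };
printf "%.0f %.0f %d\n", @{ $station }[ Easting, Northing, Elevation ];
# prints: 999990 1999996 101
```

        The percentages translate to the factors 0.00001 and 0.000002, so an Easting of 1,000,000 moves by 10 and a Northing of 2,000,000 by 4.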

        Using your objects

        ## You didn't provide a constructor from a file
        ## But you could have.
        my $grid = Seismic::Grid->new( 'seismic.dat' );

        for my $line_id ( $grid->line_ids ) {
            my $line = $grid->get_line( $line_id );
            for my $stn_id ( $line->station_ids ) {
                my $station = $line->get_station( $stn_id );
                $station->set_Easting(
                    $station->get_Easting - ( $station->get_Easting * 0.00001 )
                );
                $station->set_Northing(
                    $station->get_Northing - ( $station->get_Northing * 0.000002 )
                );
                $station->set_Elevation( $station->get_Elevation() + 1 );

                ## You didn't provide this attribute, presumably
                ## cos like me you didn't know what it was
                ## but you could have
                $station->set_Other( int(
                    ( $station->get_Easting() * $station->get_Northing() * $station->get_Elevation() ) / 3.0
                ) );
            }
        }

        So, what did OO buy you other than verbosity and complexity?

        And before you answer, every time you go to start your reply with 'if', remember that any code written to cater for possibilities or eventualities that aren't in evidence from the OP's stated requirements, as well as being a potential solution to a potential problem, is also effort expended (money) that may never be used.

        But it will still have to be tested and maintained. And when the future requirements of the application are in evidence, it may complicate the actual code needed to satisfy those actual requirements. Or worse, have to be thrown away completely because it is totally incompatible with them.

        That's wasted development time, and testing time, and documentation effort, and interim maintenance effort on the basis of guesses about "what the future may hold".

        For comparison

        Here's my equivalent of your Seismic::Grid package posted above, i.e. the package named Seismic in my examples in this post.

        package Seismic;

        use Exporter;
        use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

        our @ISA    = qw[ Exporter ];
        our @EXPORT = qw[ Easting Northing Other Elevation ];

        sub new {
            my( $class, $filename ) = @_;
            my %lines;
            open my $in, '<', $filename or die "$filename : $!";
            while( <$in> ) {
                my( $line, $stn, $x, $y, $other, $z )
                    = unpack 'A8xA8xA8xA8xxA15xa4', $_;
                $lines{ $line }{ $stn } = [ $x, $y, $other, $z ];
            }
            close $in;
            return bless \%lines, $class;
        }

        1;
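
        To see what the unpack template in the constructor yields, here is a small sketch that builds one record with the printf format from the test-data generator earlier in the thread and splits it. The field values are invented, and the 15-digit field is passed as a string to sidestep integer-width differences between perls:

```perl
use strict;
use warnings;

# One record in the generator's layout: line, station, northing, westing, other, elevation.
my $record = sprintf "%08d %8d %08dN%08dW %015s %4d",
    257, 5, 52205121, 9153806, '618485158009020', 6626;

# A8 takes 8 characters (trailing blanks stripped), x skips one character.
my( $line, $stn, $x, $y, $other, $z )
    = unpack 'A8xA8xA8xA8xxA15xa4', $record;

print "line=[$line] stn=[$stn] x=[$x] y=[$y] other=[$other] z=[$z]\n";
```

        Note that the keys come out exactly as they appear in the file: the line is '00000257' with its leading zeros, and the station keeps the leading blanks produced by %8d, since 'A' strips only trailing whitespace. Lookups therefore need the same padded strings, or the fields would have to be normalised (for example with $stn + 0) before being used as keys.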

        Note that this implements the only 'method' actually required by the OP's description--a from-file constructor. Notice how much less code there is than in yours above, whilst remembering that since the dawn of time (okay, of the software industry) there has been a direct (extra-linear) relationship between lines of code written and bugs found/maintenance required.

        And note also that this is just the number of lines you wrote. It doesn't include the whole of Moose::* and its dependencies: Class::MOP ( and its dependencies: Sub::Name, MRO::Compat (and its: Class::C3 (and its: Algorithm::C3 ) ), Sub::Exporter (and its: Params::Util, Sub::Install, Data::OptList ) ).

        And I've left out all those that can generally be expected to be a part of the standard distribution (even though it requires the latest CPAN versions of most of them). Including Filter::Simple, which means source filters! (Though I had no luck in working out where in the pile of modules this is actually used.)

        Now the typical reaction to this is "So what? It's all code I didn't have to write and don't have to maintain myself." But when something goes wrong you'll wish you could maintain it yourself.

        Because when it turns out that the ref-count manipulations in lines 458 & 459 of Class::C3::XS are causing a memory leak, which manifests itself as coming from an anonymous sub generated by the string eval in Class::MOP::Method::Accessor, and you're urgently trying to get your application back on-line, and the authors of the dozen or so packages involved are arguing about who needs to change what.

        At that point, you'll wonder about the wisdom of replacing Perl's reliable built-ins with such complexity, all for the sake of a little syntactic sugar. Especially as it means you have to write more code to start with, the complexity of the code you write is increased, and the resultant code is, if anything, less readable than standard Perl.


