http://qs321.pair.com?node_id=684264


in reply to Re^3: Data Structures
in thread Data Structures

Care to share some code to demonstrate? Home brew hash or Moose-based doesn't matter.

Say 1000 lines, with an average of 10 stations per line.

This will generate some test data sufficient for the exercise:

#! perl -slw
use strict;

=comment

lineName stn. east...north...elev.
000301038 1260 52205121N109153806W 618485158009020 6626
000301038 1261 52205121N109153674W 618510158009027 6623

=cut

## 1000 lines of 10 stations each, with random coordinates
for my $line ( 1 .. 1000 ) {
    for my $stn ( 1 .. 10 ) {
        printf "%08d %8d %08dN%08dW %015s %4d\n",
            $line, $stn,
            int( 1e6 * rand( 90 ) ), int( 1e6 * rand( 90 ) ),
            int( 1e13 * rand( 90 ) ), int( rand 9999 );
    }
}

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^5: Data Structures
by CountZero (Bishop) on May 03, 2008 at 19:34 UTC
    OK

    I use Moose

    First I made the class holding the whole structure. I called it Seismic.

    And this is the program:

    Everything is stored in one $grid object which consists of $line objects which hold the $station objects.

    Direct access to a station and its attributes is easy:

    $grid->get_line(257)->get_station('257_5')->Elevation;
    Of course, behind the scenes, it is all hashes and arrays and hashes of hashes and ... . And of course it is much slower than programming these data structures directly, but then you don't get these nice accessors, mutators, type-checking, default values, ...

    PS: I did not implement all the fields in all the objects, just enough --I hope-- to show it works.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Of course, behind the scenes, it is all hashes and arrays and hashes of hashes and ... .

      And in one sentence, you've summed up the crux of my object(sic)ion to what you are proposing.

      You're (simply and unnecessarily) wrapping a data structure in (a nest of) objects.

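      There are no methods, beyond getters and setters, which hashes and arrays already know how to do. In the process, you've hidden all the other things they know how to do. Like:

      • For the hashes: each, exists, defined, keys (in scalar, array and lvalue contexts), values (in scalar, array and lvalue contexts), delete, undef.
      • For the arrays: pop, push, shift, splice, unshift, $#array, delete, undef; plus their various contextual variations.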

      Some of these can be exposed through the addition of generated methods, but all of them? And at what cost? Different names to the ones every half-experienced Perl programmer is already familiar with. get_size() and put_size() instead of $#array as an lvalue.

      Or will that be get_length() and put_length() for this module? Or maybe get_count() and put_count(). Or ...

      Throwing away the familiar, in order to substitute unfamiliar (and variable) naming conventions to do exactly similar things means that instead of the maintenance programmer being able to use the knowledge he has, he has to run away to the documentation--and even the source code--in order to understand the code he is reading.

      I have no idea which of the 20 or so Moose modules I would need to look at in order to find out what the naming convention of the array length attribute of array based collections is. If it is exposed as standard and documented at all. Besides which, if I'm reading things correctly, the Moose user (Moose-based class writer) has the option of changing those names anyway. So that means I have to look it up for every class I use. And every method for every class. The costs of just the documentation lookup time in maintenance is amazing.
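A minimal sketch of the kind of built-in operations at stake here (keys, exists, delete, push, and $#array as an lvalue), applied directly to a nested line/station structure. The miniature %grid and its values are hypothetical and exist only for illustration; the field constants follow the ones used elsewhere in this thread.

```perl
use strict;
use warnings;

# Field indices for the per-station array (same names as used in the thread).
use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

# Hypothetical miniature grid: line => station => [ fields ].
my %grid = (
    257 => {
        1 => [ 100, 200, 0, 10 ],
        2 => [ 110, 210, 0, 12 ],
        3 => [ 120, 220, 0, 14 ],
    },
);

# Hash built-ins work directly on any level of the structure.
my $station_count = keys %{ $grid{257} };            # scalar context: a count
my $has_stn_2     = exists $grid{257}{2} ? 1 : 0;    # membership test
my $removed       = delete $grid{257}{3};            # returns the removed record

# Array built-ins work on the per-station records.
my $stn = $grid{257}{1};
push @$stn, 'extra';             # append a fifth field
my $last_index = $#{ $stn };     # index of the last element
$#{ $stn } = Elevation;          # lvalue use: truncate back to four fields

print "stations: $station_count, has stn 2: $has_stn_2, ",
      "removed elevation: ", $removed->[ Elevation ],
      ", fields now: ", scalar @$stn, "\n";
```

None of these operations need a wrapper method or a lookup in module documentation, which is the point being argued; whether that outweighs named accessors is the subject of the rest of the thread.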

      And of course it is much slower than programming these data structures directly,

      And more memory-expensive also. In my tests, it was an order of magnitude slower and required 50 times more memory. But for the OP's current needs of 1000 lines, that is quite possibly insignificant, so let's not dwell on that. Let's look at what you consider this wrapping of native functionality buys you:

      but then you don't get these nice accessors, mutators, type-checking, default values, ...

      Hm. Let's compare syntax. You suggest that this is "nice":

      $grid->get_line(257)->get_station('257_5')->Elevation;

      The equivalent using native structures:

      $grid->{ 257 }{ '257_5' }[ Elevation ];

      And that's better because? Anticipating your answer to be along the lines of: "Because 'line' and 'station' are explicitly mentioned.", I'll counter with the fact that most accesses will not be in terms of constants, but rather variables for the indexing. So then you have to compare these two versions:

      $grid->get_line($lineID)->get_station($stationID)->Elevation;

      And

      $grid->{ $lineID }{ $stationID }[ Elevation ];

      Don't you find the verbosity and repetition of 'get_line'/$lineID & 'get_station'/$stationID distracting? Pointless? It doesn't look too bad with a single access expression as above, but what about when you come to do some real work with these things?

      Syntax in use

      In the OP's data structure there is another field between Northing and Elevation (which I called 'Other' because I have no idea what it is; see later), but just for the sake of example let's assume that it is wholly or partially derived from the 3D point (Easting, Northing and Elevation), and that, due to, say, continental drift or more accurate GPS or some such, it is necessary to recalculate these values.

      The Eastings have to be shifted 0.001% West, the Northings 0.0002% South, and the Elevations raised by 1 unit. The Other field is then recalculated according to some formula.

      The code using native data structures:

      my $seismic = Seismic->new( 'seismic.dat' );

      for my $lineID ( keys %{ $seismic } ) {
          for my $stn ( values %{ $seismic->{ $lineID } } ) {
              $stn->[ Easting ]   -= $stn->[ Easting ]  * 0.00001;
              $stn->[ Northing ]  -= $stn->[ Northing ] * 0.000002;
              $stn->[ Elevation ] += 1;
              $stn->[ Other ] = int(
                  ( $stn->[ Easting ] * $stn->[ Northing ] * $stn->[ Elevation ] ) / 3.0
              );
          }
      }

      Using your objects

      ## You didn't provide a constructor from a file
      ## But you could have.
      my $grid = Seismic::Grid->new( 'seismic.dat' );

      for my $line_id ( $grid->line_ids ) {
          my $line = $grid->get_line( $line_id );
          for my $stn_id ( $line->station_ids ) {
              my $station = $line->get_station( $stn_id );
              $station->set_Easting(
                  $station->get_Easting - ( $station->get_Easting * 0.00001 )
              );
              $station->set_Northing(
                  $station->get_Northing - ( $station->get_Northing * 0.000002 )
              );
              $station->set_Elevation( $station->get_Elevation() + 1 );
              ## You didn't provide this attribute, presumably
              ## cos like me you didn't know what it was
              ## but you could have
              $station->set_Other( int(
                  ( $station->get_Easting() * $station->get_Northing() * $station->get_Elevation() ) / 3.0
              ) );
          }
      }

      So, what did OO buy you other than verbosity and complexity?

      And before you answer, every time you go to start your reply with 'if', remember that any code written to cater for possibilities or eventualities that aren't in evidence from the OP's stated requirements, as well as being a potential solution to a potential problem, is also effort expended (money) that may never be used.

      But it will still have to be tested and maintained. And when the future requirements of the application are in evidence, it may complicate the actual code needed to satisfy those actual requirements. Or worse, have to be thrown away completely because it is totally incompatible with them.

      That's wasted development time, and testing time, and documentation effort, and interim maintenance effort on the basis of guesses about "what the future may hold".

      For comparison

      Here's my equivalent of your Seismic::Grid package posted above, i.e. the package named Seismic in my examples in this post.

      package Seismic;

      use Exporter;
      use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

      our @ISA    = qw[ Exporter ];
      our @EXPORT = qw[ Easting Northing Other Elevation ];

      sub new {
          my( $class, $filename ) = @_;
          my %lines;
          open my $in, '<', $filename or die "$filename : $!";
          while( <$in> ) {
              ## fixed-width fields: line, station, northing, westing, other, elevation
              my( $line, $stn, $x, $y, $other, $z ) = unpack 'A8xA8xA8xA8xxA15xa4', $_;
              $lines{ $line }{ $stn } = [ $x, $y, $other, $z ];
          }
          close $in;
          return bless \%lines, $class;
      }

      1;

      Note that this implements the only 'method' actually required by the OP's description: a from-file constructor. Notice how much less code there is than yours above, whilst remembering that since the dawn of time (okay, the software industry), there has been a direct (extra-linear) relationship between lines of code written and bugs found/maintenance required.
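A minimal usage sketch of the from-file approach, assuming the record layout produced by the generator script at the top of the thread. To stay self-contained it uses a plain function, load_seismic (a hypothetical name), instead of the Seismic package, and writes two made-up records to a temporary file. One detail worth noting: with that layout the unpack template yields fixed-width string keys such as '00000257' and '       5', so this sketch normalises keys and numeric fields with + 0; that normalisation is an assumption of the sketch, not part of the package above.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Field indices, as in the Seismic package.
use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

# Hypothetical helper: same parsing as the constructor, but returning a
# plain hash ref and normalising keys/fields to numbers (an assumption).
sub load_seismic {
    my( $filename ) = @_;
    my %lines;
    open my $in, '<', $filename or die "$filename : $!";
    while( <$in> ) {
        my( $line, $stn, $x, $y, $other, $z )
            = unpack 'A8xA8xA8xA8xxA15xa4', $_;
        $lines{ $line + 0 }{ $stn + 0 } = [ $x + 0, $y + 0, $other, $z + 0 ];
    }
    close $in;
    return \%lines;
}

# Write two made-up records in the generator's format to a temporary file.
my( $fh, $filename ) = tempfile( UNLINK => 1 );
printf {$fh} "%08d %8d %08dN%08dW %015s %4d\n", 257, 5, 12345678, 87654321, 42, 618;
printf {$fh} "%08d %8d %08dN%08dW %015s %4d\n", 257, 6, 12345999, 87654000, 43, 620;
close $fh;

my $seismic = load_seismic( $filename );

print "stations on line 257: ", scalar( keys %{ $seismic->{257} } ), "\n";
print "elevation of 257/5: ", $seismic->{257}{5}[ Elevation ], "\n";
```

Without the + 0 normalisation the same lookups would need the exact fixed-width strings as keys, which is easy to overlook when moving between the generator's output and hand-written test lookups.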

      And note also, that this is just the number of lines you wrote. It doesn't include the whole of Moose::* and its dependencies: Class::MOP, ( and its dependencies: Sub::Name, MRO::Compat (and its: Class::C3 (and its: Algorithm::C3 ) ), Sub::Exporter (and its: Params::Util, Sub::Install, Data::OptList ) ).

      And I've left out all those that can generally be expected to be a part of the standard distribution (despite the fact that it requires the latest CPAN versions of most of them). Including Filter::Simple, which means source filters! (Though I had no luck in working out where in the pile of modules this is actually used.)

      Now the typical reaction to this is "So what... it's all code I didn't have to write and don't have to maintain myself". But when something goes wrong you'll wish you could maintain it yourself.

      Because when it turns out that the ref-count manipulations in lines 458 & 459 of Class::C3::XS are causing a memory leak, which manifests itself as coming from an anonymous sub generated by the string eval in Class::MOP::Method::Accessor, and you're urgently trying to get your application back on-line, and the authors of the dozen or so packages involved are arguing about who needs to change what.

      At that point, you'll wonder about the efficacy of replacing Perl's reliable built-ins with such complexity, all for the sake of a little syntactic sugar. Especially as it means you have to write more code to start with, the complexity of the code you write is increased, and the resultant code is if anything less readable than standard Perl.


        ++ for the elaborate argumentation of your point. Having been a lawyer for the most part of my life, I know good arguments when I see them.

        Still, it does not convince me.

        If I should follow your reasoning, I should forego the use of modules such as DBI, CGI, List::Util, ... for the simple reason that one loses control by using other people's modules, especially when they are fairly complex and might hide dragons behind their nice APIs, and because it is faster and more memory efficient to program directly to the bare metal.

        I daresay, I'm amazed you still use Perl at all! Who knows what bugs may lurk in the language and its implementation, or by the same token, in the OS or in the processor's microcode? I think you will still remember the infamous Pentium processor floating point bug. No, no, a slide rule and paper and pencil are the only things you can really trust (if you do all the work yourself of course).

        Sure, your approach uses less memory, but I don't get an out-of-memory error; and yes, it is faster, but I don't mind if it runs 20 seconds or even two minutes. I find the Moose approach easy and clear. Its declarative-like syntax is self-documenting. Yes, in six months' time I will have to look up the docs again to see what the accessors are called, but do you really think I will remember in six months' time that the line data is the first level of the hash and the station data is the second level of the hash and the Easting, ... is in an array rather than a hash, indexed by constants?

        I think we just have to agree to disagree.

        CountZero


        I have to say that I am very impressed with your writing, both for style and clarity. Based on the discussion between yourself and CountZero, it is really quite easy for me to decide between the two options of OO or native: I feel I must go with the native implementation. Your descriptions and solutions are far more simple and easy to understand than the complex OO alternative offered.

        Wow, you really do seem to be bothered by Moose, which is odd, because you don't seem to have ever used it.

        There are no methods, beyond getters and setters, which hashes and arrays already know how to do. In the process, you've hidden all the other things they know how to do. Like:

        • For the hashes: each, exists, defined, keys (in scalar, array and lvalue contexts), values (in scalar, array and lvalue contexts), delete, undef.
        • For the arrays: pop, push, shift, splice, unshift, $#array, delete, undef; plus their various contextual variations.

        Some of these can be exposed through the addition of generated methods, but all of them? And at what cost? Different names to the ones every half-experienced Perl programmer is already familiar with. get_size() and put_size() instead of $#array as an lvalue.

        All I have to say to that is Moose::Autobox. It will give you your standard names and avoid the "generated methods" overhead.

        And note also, that this is just the number of lines you wrote. It doesn't include the whole of Moose::* and its dependencies: Class::MOP, ( and its dependencies: Sub::Name, MRO::Compat (and its: Class::C3 (and its: Algorithm::C3 ) ), Sub::Exporter (and its: Params::Util, Sub::Install, Data::OptList ) ).

        Sure, it is a lot of code, but installing it is not really a big issue. We have a pretty decent (usually above 80%) chance of install success on 5.8, 5.8.8 and 5.10.

        And I've left out all those that can generally be expected to be a part of the standard distribution (despite that it requires the latest cpan versions of most of them).

        Well, that's mostly because some of the non-latest versions are broken (the standard install of Scalar::Util on several platforms is missing the XS version and therefore its weaken support; this is such a problem that Task::Weaken was created to try and fix it). This is the curse of core modules: they get frozen in time even if they are broken.

        Including Filter::Simple, which means source filters! (Though I had no luck in working out where in the pile of modules this is actually used?)

        Hmm, you didn't look too hard then, it is pretty obvious in the Makefile.PL.

        # only used by oose.pm, not Moose.pm :P
        requires 'Filter::Simple' => '0';

        Perhaps you need better glasses, or maybe to just clean all the FUD off the ones you're wearing.

        Now the typical reaction to this is "So what... it's all code I didn't have to write and don't have to maintain myself". But when something goes wrong you'll wish you could maintain it yourself.

        Why would you ever, ever, EVER take on maintenance of a module when you could simply submit a bug report? Abandoned modules are one thing, but none of the ones you listed are in any danger of being abandoned any time soon. That statement is just plain silly; by your logic using *any* modules is a bad idea, core or not.

        And really, if something goes wrong, be a responsible community member and submit a bug report. Speaking for Moose specifically, I can say that we usually turn around bugs pretty quickly. Currently we have 2 "wishlist" items from Schwern and a bug with threading, for which we are looking for someone who actually uses threads to help, cause most of the #moose crew do not (it seems to be related to a usage of some more advanced regexp features too, and not related to core Moose). Bugs submitted over IRC or the mailing list tend to be taken care of within a few days though.

        Because when it turns out that the ref-count manipulations in lines 458 & 459 of Class::C3::XS are causing a memory leak, which manifests itself as coming from an anonymous sub generated by the string eval in Class::MOP::Method::Accessor ...

        Funny, I am not seeing your bug report for Class::C3::XS or for Class::MOP. The C3::XS code is also in 5.10, and I am not seeing a bug report there either.

        Are you just making stuff up? Or is this a real issue? Seriously, the FUD coming off you is overwhelming at this point.

        ... and you're urgently trying to get your application back on-line, and the authors of the dozen or so packages involved are arguing about who needs to change what.

        One would think that proper testing would have weeded out this issue before it got to production. But assuming that didn't happen, it should be pretty simple: Brandon Black maintains Class::C3::XS and I maintain Class::MOP, and we have worked together on several things before. Sure, that kind of immature cat-fighting might happen sometimes in the CPAN community, but it is certainly not the norm. *cough* excuse me,.. sorry the FUD is making it hard to breathe *cough*

        At that point, you'll wonder about the efficacy of replacing Perl's reliable built-ins with such complexity, all for the sake of a little syntactic sugar.

        So don't use it, that simple. Oh, and btw, where do they keep Perl's reliable built-in Meta Object Protocol? Cause I must have missed that bit when I wrote Class::MOP. Moose may look like just syntactic sugar, but I assure you the rabbit hole goes much deeper than that.

        Especially as it means you have to write more code to start with, the complexity of the code you write is increased, and the resultant code is if anything less readable than standard Perl.

        In general, for larger applications than this one, Moose means less code.

        Also with Moose, the code complexity is most times actually reduced, since many things you would have to do manually over and over, like accessors, are done for you. This greatly reduces the likelihood of copy/paste-induced bugs, which tend to creep into tedious code like accessors.
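The idea of generating accessors rather than copy/pasting them can be sketched in plain Perl. This is not Moose and not how Moose works internally; the Station class and its field names are hypothetical, and the only claim is that one closure-producing loop replaces several near-identical hand-written subs.

```perl
use strict;
use warnings;

package Station;

sub new {
    my( $class, %args ) = @_;
    return bless { %args }, $class;
}

# Install one read/write accessor per field. Each closure captures its
# own $field, so there is a single body to get right instead of three.
for my $field ( qw( easting northing elevation ) ) {
    no strict 'refs';    # needed only for the symbol-table assignment
    *{ "Station::$field" } = sub {
        my $self = shift;
        $self->{ $field } = shift if @_;
        return $self->{ $field };
    };
}

package main;

my $stn = Station->new( easting => 100, northing => 200, elevation => 10 );
$stn->elevation( $stn->elevation + 1 );

print "elevation: ", $stn->elevation, "\n";
```

A framework adds type checks, defaults and introspection on top of this, at the cost in dependencies and speed discussed earlier in the thread.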

        As for the readability, once you know Moose, the code becomes very readable. If you don't know Moose, then that won't be the case. But then again, this is true of all languages, computer or otherwise. Unless, wait,.. do you have some secret programming language you created that can be instantly understood and comprehended by anyone anywhere without them having to learn it? If so please release it so the world can benefit from it ;)

        Love and Kisses,

        -stvn

        Back when I was maintaining a C++ app, there was a struct written as a class, and everything in the struct had accessors. They were all completely simple and transparent such that you could have done away with the lot of them and accessed everything through simple public members. Nevertheless, the accessors were used everywhere. This struct was the heart of the application, so that was many many method calls. It seemed like a waste.

        Then one day the requirements changed, and in some circumstances I'd have to recompute some things when others were changed. I tell you I was mighty grateful that all access went through accessors. I could go to where the class was implemented and make my changes there and nowhere else, and everything worked.

        I shudder to think of the search-and-replace nightmare that would have been my workday had the elements of this struct been accessed directly throughout the program I worked on. As I said, this was the one class that was used everywhere.
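The situation described above, where a later requirement forces a recomputation whenever a value changes, can be sketched in Perl. The Station class, its fields and the formula (borrowed from the placeholder 'Other' calculation earlier in the thread) are hypothetical; the point is only that the change lands in one accessor and callers stay untouched.

```perl
use strict;
use warnings;

package Station;

sub new {
    my( $class, %args ) = @_;
    my $self = bless { easting => 0, northing => 0, elevation => 0, %args }, $class;
    $self->_recompute_other;
    return $self;
}

# Hypothetical derived field, using the placeholder formula from the thread.
sub _recompute_other {
    my $self = shift;
    $self->{other}
        = int( $self->{easting} * $self->{northing} * $self->{elevation} / 3.0 );
    return;
}

# Read-only accessor for the derived value.
sub other { $_[0]{other} }

# Originally a dumb getter/setter; the recompute call is the later
# requirement, added here and nowhere else.
sub elevation {
    my $self = shift;
    if( @_ ) {
        $self->{elevation} = shift;
        $self->_recompute_other;
    }
    return $self->{elevation};
}

package main;

my $stn = Station->new( easting => 3, northing => 2, elevation => 1 );
$stn->elevation( 5 );    # callers are unchanged; 'other' is refreshed for them

print "other: ", $stn->other, "\n";
```

With direct access such as $stn->[ Elevation ] = 5, every assignment site would have had to be found and followed by its own recomputation.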

        I think most programs grow larger as they age. The "big program" techniques that one uses today might not be necessary today, but they often pay off in the long run. I tend to write software as if it is a little bigger than it actually is. I use strict and warnings in a two-line script. I use objects (sometimes) when a simpler data structure will do, just to collect the code related to that data structure in one place. I use set/get methods to access stuff in objects, even if they're not much more than dumb hashes (though I rarely, if ever, write those accessors myself).

        I think the time lost writing big for a small program that never grows is a reasonable price for the times that I write big for a program that (as usual) does grow.