PerlMonks  

Re^5: Data Structures

by CountZero (Bishop)
on May 03, 2008 at 19:34 UTC (#684352)


in reply to Re^4: Data Structures
in thread Data Structures

OK

I use Moose

First I made the class holding the whole structure. I called it Seismic.

    use strict;

    package Seismic;

    package Seismic::Station;
    use Moose;

    has 'Easting'   => (isa => 'Int', is => 'rw', required => 1);
    has 'Northing'  => (isa => 'Int', is => 'rw', required => 1);
    has 'Elevation' => (isa => 'Int', is => 'rw', required => 1);
    has 'Id'        => (isa => 'Str', is => 'rw', required => 1);

    package Seismic::Line;
    use Moose;
    use MooseX::AttributeHelpers;

    has 'Length'        => (isa => 'Int', is => 'rw');
    has 'GroupInterval' => (isa => 'Int', is => 'rw');
    has 'Id'            => (isa => 'Str', is => 'rw', required => 1);

    has 'Stations' => (
        metaclass => 'Collection::Hash',
        is        => 'rw',
        isa       => 'HashRef[Seismic::Station]',
        default   => sub { {} },
        provides  => {
            exists => 'station_exists',
            keys   => 'station_ids',
            get    => 'get_station',
            set    => 'add_station',
            count  => 'count_stations',
            delete => 'delete_station',
            clear  => 'delete_all_stations',
        },
    );

    package Seismic::Grid;
    use Moose;

    has 'Lines' => (
        metaclass => 'Collection::Hash',
        is        => 'rw',
        isa       => 'HashRef[Seismic::Line]',
        default   => sub { {} },
        provides  => {
            exists => 'line_exists',
            keys   => 'line_ids',
            get    => 'get_line',
            set    => 'add_line',
            count  => 'count_lines',
            delete => 'delete_line',
            clear  => 'delete_all_lines',
        },
    );

    1;

And this is the program:

    use strict;
    use Seismic;

    my $grid = Seismic::Grid->new();

    for my $line (1 .. 1000) {
        my $lineobject = Seismic::Line->new(
            {
                Id            => $line,
                Length        => int(1000 * rand(90)),
                GroupInterval => int( 100 * rand(90)),
            }
        );
        $grid->add_line($lineobject->Id, $lineobject);

        for my $station (1 .. 10) {
            my $stationobject = Seismic::Station->new(
                {
                    Id        => $line . '_' . $station,
                    Northing  => int(1e6 * rand(90)),
                    Easting   => int(1e6 * rand(90)),
                    Elevation => int( 10 * rand(90)),
                }
            );
            $lineobject->add_station($stationobject->Id, $stationobject);
        }
    }

    print "line station position elev.\n";
    for my $line_id ($grid->line_ids) {
        my $line = $grid->get_line($line_id);
        for my $station_id ($line->station_ids) {
            my $station = $line->get_station($station_id);
            printf "%08d %-8s %08dN%08dW %4d\n",
                $line->Id, $station->Id,
                $station->Northing, $station->Easting, $station->Elevation;
        }
    }

Everything is stored in one $grid object which consists of $line objects which hold the $station objects.

Direct access to a station and its attributes is easy:

$grid->get_line(257)->get_station('257_5')->Elevation;
Of course, behind the scenes, it is all hashes and arrays and hashes of hashes and ... . And of course it is much slower than programming these data structures directly, but then you don't get these nice accessors, mutators, type-checking, default values, ...

PS: I did not implement all the fields in all the objects, just enough --I hope-- to show it works.

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^6: Data Structures
by BrowserUk (Pope) on May 04, 2008 at 12:45 UTC
    Of course, behind the scenes, it is all hashes and arrays and hashes of hashes and ... .

    And in one sentence, you've summed up the crux of my object(sic)ion to what you are proposing.

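    You're (simply and unnecessarily) wrapping a data structure in a nest of objects.

    There are no methods, beyond getters and setters, which hashes and arrays already know how to do. In the process, you've hidden all the other things they know how to do. Like:

    • For the hashes: each, exists, defined, keys (in scalar, array and lvalue contexts), values (in scalar, array and lvalue contexts), delete, undef.
    • For the arrays: pop, push, shift, splice, unshift, $#array, delete, undef; plus their various contextual variations.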

    Some of these can be exposed through the addition of generated methods, but all of them? And at what cost? Different names to the ones every half-experienced Perl programmer is already familiar with. get_size() and put_size() instead of $#array as an lvalue.

    Or will that be get_length() and put_length() for this module? Or maybe get_count() and put_count(). Or ...
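A minimal sketch of the built-ins being referred to, using a throwaway array of placeholder ids (the data is invented purely for illustration). It shows `$#array` as both an rvalue and an lvalue, an array in scalar context, and `push`/`splice`:

```perl
use strict;
use warnings;

my @stations = ( 'a' .. 'j' );          # ten placeholder station ids
my $last     = $#stations;              # index of the last element

$#stations = 4;                         # as an lvalue: truncate to five elements
my $count  = @stations;                 # array in scalar context gives the size

push @stations, 'k';                    # append one element
my @removed = splice @stations, 1, 2;   # remove and return two elements

print "last=$last count=$count now=@stations removed=@removed\n";
# prints: last=9 count=5 now=a d e k removed=b c
```

None of this needs a lookup in module documentation, which is the point being argued.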

    Throwing away the familiar, in order to substitute unfamiliar (and variable) naming conventions to do exactly similar things means that instead of the maintenance programmer being able to use the knowledge he has, he has to run away to the documentation--and even the source code--in order to understand the code he is reading.

    I have no idea which of the 20 or so Moose modules I would need to look at in order to find out what the naming convention of the array length attribute of array based collections is. If it is exposed as standard and documented at all. Besides which, if I'm reading things correctly, the Moose user (Moose-based class writer) has the option of changing those names anyway. So that means I have to look it up for every class I use. And every method for every class. The costs of just the documentation lookup time in maintenance is amazing.

    And of course it is much slower than programming these data structures directly,

    And more memory expensive also. In my tests, an order of magnitude slower and requires 50 times more memory. But for the OPs current needs of 1000 lines, that is quite possibly insignificant, so let's not dwell on that. Let's look at what you consider this wrapping of native functionality buys you:

    but then you don't get these nice accessors, mutators, type-checking, default values, ...

    Hm. Let's compare syntax. You suggest that this is "nice":

    $grid->get_line(257)->get_station('257_5')->Elevation;

    The equivalent using native structures:

    $grid->{ 257 }{ '257_5' }[ Elevation ];

    And that's better because? Anticipating your answer to be along the lines of: "Because 'line' and 'station' are explicitly mentioned.", I'll counter with the fact that most accesses will not be in terms of constants, but rather variables for the indexing. So then you have to compare these two versions:

    $grid->get_line($lineID)->get_station($stationID)->Elevation;

    And

    $grid->{ $lineID }{ $stationID }[ Elevation ];
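For readers who want to try the native form, here is a small self-contained sketch. The field constants follow the ones used later in this post; the sample ids and coordinate values are made up for illustration only:

```perl
use strict;
use warnings;

# Field positions inside each station array (sample values below are invented).
use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

# line id => station id => [ easting, northing, other, elevation ]
my $grid = {
    257 => {
        '257_5' => [ 1_200_000, 4_500_000, 0, 312 ],
        '257_6' => [ 1_200_050, 4_500_000, 0, 315 ],
    },
};

my ( $lineID, $stationID ) = ( 257, '257_5' );

# Read one attribute through two hash levels and one array index.
my $elevation = $grid->{ $lineID }{ $stationID }[ Elevation ];

# The usual built-ins work directly on the nested data.
my @station_ids = sort keys %{ $grid->{ $lineID } };
my $has_station = exists $grid->{ $lineID }{ '257_7' } ? 1 : 0;
my $last_index  = $#{ $grid->{ $lineID }{ $stationID } };

print "elevation=$elevation stations=@station_ids last_index=$last_index\n";
# prints: elevation=312 stations=257_5 257_6 last_index=3
```

Note that the station id has to be quoted as a string: a bare 257_5 is a numeric literal (2575) in Perl.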

    Don't you find the verbosity and repetition of 'get_line'/$lineID & 'get_station'/$stationID distracting? Pointless? It doesn't look too bad with a single access expression as above, but what about when you come to do some real work with these things?

    Syntax in use

    In the OP's data structure there is another field between Northing and Elevation (which I called 'Other' because I have no idea what it is; see later), but just for the sake of example let's assume that it is wholly or partially derived from the 3D point (Easting, Northing and Elevation). And due to, say, continental drift or more accurate GPS or some such, it is necessary to recalculate these values.

    The Eastings have to be moved 0.001% West, the Northings 0.0002% South, and the Elevations raised by 1 unit. And the Other field has to be recalculated according to some formula.

    The code using native data structures:

        my $seismic = Seismic->new( 'seismic.dat' );

        for my $lineID ( keys %{ $seismic } ) {
            for my $stn ( values %{ $seismic->{ $lineID } } ) {
                $stn->[ Easting   ] -= $stn->[ Easting  ] * 0.00001;
                $stn->[ Northing  ] -= $stn->[ Northing ] * 0.000002;
                $stn->[ Elevation ] += 1;
                $stn->[ Other ] = int(
                    ( $stn->[ Easting ] * $stn->[ Northing ] * $stn->[ Elevation ] ) / 3.0
                );
            }
        }

    Using your objects

        ## You didn't provide a constructor from a file
        ## But you could have.
        my $grid = Seismic::Grid->new( 'seismic.dat' );

        ## The get_*/set_* names below assume separate reader and writer
        ## methods; the class as posted generates combined accessors
        ## such as $station->Easting( $value ).
        for my $line_id ( $grid->line_ids ) {
            my $line = $grid->get_line( $line_id );
            for my $station_id ( $line->station_ids ) {
                my $station = $line->get_station( $station_id );
                $station->set_Easting(
                    $station->get_Easting - ( $station->get_Easting * 0.00001 )
                );
                $station->set_Northing(
                    $station->get_Northing - ( $station->get_Northing * 0.000002 )
                );
                $station->set_Elevation( $station->get_Elevation() + 1 );
                ## You didn't provide this attribute, presumably
                ## cos like me you didn't know what it was
                ## but you could have
                $station->set_Other( int(
                    ( $station->get_Easting() * $station->get_Northing() * $station->get_Elevation() ) / 3.0
                ) );
            }
        }

    So, what did OO buy you other than verbosity and complexity?

    And before you answer, every time you go to start your reply with 'if', remember that any code written to cater for possibilities or eventualities that aren't in evidence from the OP's stated requirements, as well as being a potential solution to a potential problem, is also effort expended (money) that may never be used.

    But it will still have to be tested and maintained. And when the future requirements of the application are in evidence, it may complicate the actual code needed to satisfy those actual requirements. Or worse, have to be thrown away completely because it is totally incompatible with them.

    That's wasted development time, and testing time, and documentation effort, and interim maintenance effort on the basis of guesses about "what the future may hold".

    For comparison, here's my equivalent of your Seismic::Grid package posted above, i.e. the package named Seismic in my examples in this post.

        package Seismic;
        use Exporter;
        use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

        our @ISA    = qw[ Exporter ];
        our @EXPORT = qw[ Easting Northing Other Elevation ];

        sub new {
            my( $class, $filename ) = @_;
            my %lines;
            open my $in, '<', $filename or die "$filename : $!";
            while( <$in> ) {
                my( $line, $stn, $x, $y, $other, $z )
                    = unpack 'A8xA8xA8xA8xxA15xa4', $_;
                $lines{ $line }{ $stn } = [ $x, $y, $other, $z ];
            }
            close $in;
            return bless \%lines, $class;
        }

        1;

    Note that this implements the only 'method' actually required by the OPs description--a from-file constructor. Notice how much less code there is than yours above whilst remembering that since the dawn of time (Okay, the software industry), there has been a direct (extra-linear) relationship between lines of code written and bugs found/maintenance required.
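To see what the unpack template in that constructor does, here is a small sketch that parses one record. The OP's actual file layout is not shown in this thread, so the sample record is an assumption: it is simply built with sprintf to match the column widths the template implies (8, 8, 8, 8, 15 and 4 characters, separated by one or two skipped bytes), and the values are invented.

```perl
use strict;
use warnings;

my $template = 'A8xA8xA8xA8xxA15xa4';

# One made-up record laid out with the widths the template expects (57 bytes).
my $record = sprintf "%-8s %-8s %-8s %-8s  %-15s %4d",
    '257', '257_5', 1200000, 4500000, 'n/a', 312;

my ( $line, $stn, $x, $y, $other, $z ) = unpack $template, $record;

# 'A' strips trailing padding; 'a' returns the field exactly as stored,
# so $z still carries its leading space until it is used as a number.
printf "line=%s stn=%s x=%s y=%s other=%s z=%d\n",
    $line, $stn, $x, $y, $other, $z;
# prints: line=257 stn=257_5 x=1200000 y=4500000 other=n/a z=312
```

Each 'x' in the template skips one byte, which is how the separators between the columns are discarded.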

    And note also, that this is just the number of lines you wrote. It doesn't include the whole of Moose::* and its dependencies: Class::MOP (and its dependencies: Sub::Name, MRO::Compat (and its: Class::C3 (and its: Algorithm::C3)), Sub::Exporter (and its: Params::Util, Sub::Install, Data::OptList)).

    And I've left out all those that can generally be expected to be a part of the standard distribution (despite that it requires the latest cpan versions of most of them). Including Filter::Simple, which means source filters! (Though I had no luck in working out where in the pile of modules this is actually used?)

    Now the typical reaction to this is "So what ... it's all code I didn't have to write and don't have to maintain myself". But when something goes wrong you'll wish you could maintain it yourself.

    Because that is when it turns out that the ref-count manipulations in lines 458 & 459 of Class::C3::XS are causing a memory leak, which manifests itself as coming from an anonymous sub generated by the string eval in Class::MOP::Method::Accessor, and you're urgently trying to get your application back on-line, and the authors of the dozen or so packages involved are arguing about who needs to change what.

    At that point, you'll wonder about the efficacy of replacing Perl's reliable built-ins with such complexity, all for the sake of a little syntactic sugar. Especially as it means you have to write more code to start with, the complexity of the code you write is increased, and the resultant code is if anything less readable than standard Perl.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      ++ for the elaborate argumentation of your point. Having been a lawyer for the most part of my life, I know good arguments when I see them.

      Still, it does not convince me.

      If I should follow your reasoning, I should forego the use of modules such as DBI, CGI, List::Utils, ... for the simple reason that one loses control by using other people's modules, especially when they are fairly complex and might hide dragons behind their nice APIs, and because it is faster and more memory efficient to program directly to the bare metal.

      I daresay, I'm amazed you still use Perl at all! Who knows what bugs may lurk in the language and its implementation, or by the same token, in the OS or in the processor's microcode? I think you will still remember the infamous Pentium processor floating point bug. No, no, a slide rule and paper and pencil are the only things you can really trust (if you do all the work yourself of course).

      Sure, your approach uses less memory, but I don't get an out-of-memory error; and yes, it is faster, but I don't mind if it runs 20 seconds or even two minutes. I find the Moose approach easy and clear. Its declarative-like syntax is self-documenting. Yes, in six months' time I will have to look up the docs again to see what the accessors are called, but do you really think I will remember in six months' time that the line data is the first level of the hash, the station data is the second level of the hash, and the Easting, ... are in an array rather than a hash, indexed by constants?

      I think we just have to agree to disagree.

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

        If I should follow your reasoning, I should forego the use of modules such as DBI, CGI, List::Utils, ... for the simple reason that one loses control by using other people's modules, especially when they are fairly complex and might hide dragons behind their nice APIs, and because it is faster and more memory efficient to program directly to the bare metal.

        That's the second time you've used this specious argument. I ignored it the first time but this time I'm going to respond.

        1. DBI.

          A function-rich interface for sure, but look at the complexities it is covering up: every RDBMS going, and all their disparate variations and proprietary features. The result, as complex as it is, is far simpler than any one of the C interfaces it wraps. And far, far simpler than their combined interfaces. I.e. it is a simplification of the many things it encapsulates. (The quoting facility alone is worth its weight. Have you ever read the rules for escaping SQL binary data?)

          And take a look at its dependency list. Three (all core) modules and one for testing (also core).

        2. CGI

          A bit heavy. Mixing the HTML generation stuff in with the CGI stuff is generally seen as not such a good design, especially as the HTML generation functions are also generally eschewed in favour of templating.

          Its dependency list? One core module.

          But overall, it has stood the test of time and is still actively and responsively maintained. It even caters to both the OO and procedural interfaces.

          I usually opt for CGI::Simple on those (rare) occasions when I do CGI work. But note that it unashamedly derives both its interface and most of its code from CGI. It just omits those parts that most people don't use anyway.

        3. List::Util (no 's'), is my most used module.

          If you supersearch you'll find one of my posts in a thread entitled something like "Your most used modules are?" where I state as much.

          Search a little deeper and you'll find one that says something like: "Is there any program in which List::Util isn't useful?".

          I think reduce should be a part of the language as it will be in Perl 6. But that wouldn't prevent me from using List::Util. Nor having it preloaded into my REPL or as a standard part of my Perl template along with strict, warnings and Data::Dump.

        So I'm not anti-modules, nor anti-CPAN. And neither charge will stick no matter how hard you push it.

        I'm not even anti-Moose. It is without any doubt the very best of the OO modules around. And (if it ever compiles for Win32), whenever I have an application for a module that will benefit from OO, I will seriously consider using Moose. This exercise caused me to look deep into the guts of the beast and I am seriously impressed by both what it does and the way it does it.

        I'm less impressed with some of the dependencies and the choice to use them. A couple of modules (IMO) substitute a module, a use line and a complicated line of code (in the calling program, plus a lot of often unnecessary code in the module itself) for a rather simpler, single line of core perl code.

        But for the OP's application, or at least those details we are informed of, the use of OO doesn't simplify through encapsulation. It complicates through encapsulation. Structs as objects is an abuse of OO. Indeed, if I remember correctly, there is a quote from theDamian (author of Object Oriented Perl) where he lists a few examples of when not to use Object Orientation. I believe that this application (wrapping core functionality without value-add) would fit quite neatly into two (or even three) of his categories.

        Sure, your approach uses less memory, but I don't get an out-of-memory error; and yes, it is faster, but I don't mind if it runs 20 seconds or even two minutes. I find the Moose approach easy and clear. Its declarative-like syntax is self-documenting. Yes, in six months' time I will have to look up the docs again to see what the accessors are called, but do you really think I will remember in six months' time that the line data is the first level of the hash, the station data is the second level of the hash, and the Easting, ... are in an array rather than a hash, indexed by constants?

        As I said above, for this application, the performance (memory or cpu) doesn't matter a jot. If there were 100,000 or a million lines it might, but that's unlikely given what the data represents. Much more significant is the weight of the documentation. For the module I offered (as offered) there are 25 lines to read. And 30 seconds picks out one line that tells you all you need to know about the data-structure.

        Contrast that with facing the problem of refamiliarisation with Moose's 20+ modules and reams of documentation six months from now. Which do you think will be easier to understand?

        I think we just have to agree to disagree.

        Fair enough. But if this discussion serves to cause you, or anyone who read it, to think twice about wrapping core facilities in complicated and heavyweight wrappers (OO or otherwise), then it will have served good purpose.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      I have to say that I am very impressed with your writing, both for style and clarity. Based on the discussion between yourself and CountZero, it is really quite easy for me to decide between the two options of OO or native: I feel I must go with the native implementation. Your descriptions and solutions are far more simple and easy to understand than the complex OO alternative offered.

      Wow, you really do seem to be bothered by Moose, which is odd because you don't seem to have ever used it.

      There are no methods, beyond getters and setters, which hashes and arrays already know how to do. In the process, you've hidden all the other things they know how to do. Like:

      • For the hashes: each, exists, defined, keys (in scalar, array and lvalue contexts), values (in scalar, array and lvalue contexts), delete, undef.
      • For the arrays: pop, push, shift, splice, unshift, $#array, delete, undef; plus their various contextual variations.

      Some of these can be exposed through the addition of generated methods, but all of them? And at what cost? Different names to the ones every half-experienced Perl programmer is already familiar with. get_size() and put_size() instead of $#array as an lvalue.

      All I have to say to that is Moose::Autobox. It will give you your standard names and avoid the "generated methods" overhead.

      And note also, that this is just the number of lines you wrote. It doesn't include the whole of Moose::* and its dependencies: Class::MOP (and its dependencies: Sub::Name, MRO::Compat (and its: Class::C3 (and its: Algorithm::C3)), Sub::Exporter (and its: Params::Util, Sub::Install, Data::OptList)).

      Sure, it is a lot of code, but installing it is not really a big issue. We have a pretty decent (usually above 80%) chance of install success on 5.8, 5.8.8 and 5.10.

      And I've left out all those that can generally be expected to be a part of the standard distribution (despite that it requires the latest cpan versions of most of them).

      Well, that's mostly because some of the non-latest versions are broken (the standard install of Scalar::Util on several platforms is missing the XS version and therefore its weaken support; this is such a problem that Task::Weaken was created to try and fix it). This is the curse of core modules: they get frozen in time even if they are broken.

      Including Filter::Simple, which means source filters! (Though I had no luck in working out where in the pile of modules this is actually used?)

      Hmm, you didn't look too hard then, it is pretty obvious in the Makefile.PL.

          # only used by oose.pm, not Moose.pm :P
          requires 'Filter::Simple' => '0';

      Perhaps you need better glasses, or maybe to just clean all the FUD off the ones you're wearing.

      Now the typical reaction to this is "So what ... it's all code I didn't have to write and don't have to maintain myself". But when something goes wrong you'll wish you could maintain it yourself.

      Why would you ever, ever, EVER take on maintenance of a module when you could simply submit a bug report? Abandoned modules are one thing, but none of the ones you listed are in any danger of being abandoned any time soon. That statement is just plain silly; by your logic using *any* modules is a bad idea, core or not.

      And really, if something goes wrong, be a responsible community member and submit a bug report, speaking for Moose specifically I can say that we usually turn around bugs pretty quickly. Currently we have 2 "wishlist" items from Schwern and a bug with threading which we are looking for someone who actually uses threads to help with, cause most of the #moose crew do not (seems like it is related to a usage of some more advanced regexp features too, and not related to core Moose). Bugs submitted over IRC or the mailing list tend to be taken care of within a few days though.

      Because when it turns out that the ref-count manipulations in lines 458 & 459 of Class::C3::XS are causing a memory leak, which manifests itself as coming from an anonymous sub generated by the string eval in Class::MOP::Method::Accessor ...

      Funny, I am not seeing your bug report for Class::C3::XS or for Class::MOP. The C3::XS code is also in 5.10, and I am not seeing a bug report there either.

      Are you just making stuff up? Or is this a real issue? Seriously, the FUD coming off you is overwhelming at this point.

      ... and you're urgently trying to get your application back on-line, and the authors of the dozen or so packages involved are arguing about who needs to change what.

      One would think that proper testing would have weeded out this issue before it got to production. But assuming that didn't happen, it should be pretty simple: Brandon Black maintains Class::C3::XS and I maintain Class::MOP, and we have worked together on several things before. Sure, that kind of immature cat-fighting might happen sometimes in the CPAN community, but it is certainly not the norm. *cough* excuse me,.. sorry the FUD is making it hard to breathe *cough*

      At that point, you'll wonder about the efficacy of replacing Perl's reliable built-ins for such complexity, all for the sake of a little syntactic sugar.

      So don't use it, that simple. Oh, and btw, where do they keep Perl's reliable built-in Meta Object Protocol? Cause I must have missed that bit when I wrote Class::MOP. Moose may look like just syntactic sugar, but I assure you the rabbit hole goes much deeper than that.

      Especially as it means you have to write more code to start with, the complexity of the code you write is increased, and the resultant code is if anything less readable than standard Perl.

      In general, for larger applications than this one, Moose means less code.

      Also with Moose, the code complexity is most times actually reduced since many things you would have to do manually over and over, like accessors, are done for you. This greatly reduces the likelihood of copy/paste induced bugs which tend to creep into tedious code like accessors.

      As for the readability, once you know Moose, the code becomes very readable. If you don't know Moose, then that won't be the case. But then again, this is true of all languages, computer or otherwise. Unless, wait,.. do you have some secret programming language you created that can be instantly understood and comprehended by anyone anywhere without them having to learn it? If so please release it so the world can benefit from it ;)

      Love and Kisses,

      -stvn

      Back when I was maintaining a C++ app, there was a struct written as a class, and everything in the struct had accessors. They were all completely simple and transparent such that you could have done away with the lot of them and accessed everything through simple public members. Nevertheless, the accessors were used everywhere. This struct was the heart of the application, so that was many many method calls. It seemed like a waste.

      Then one day the requirements changed, and in some circumstances I'd have to recompute some things when others were changed. I tell you I was mighty grateful that all access went through accessors. I could go to where the class was implemented and make my changes there and nowhere else, and everything worked.

      I shudder to think of the search-and-replace nightmare that would have been my workday had the elements of this struct been accessed directly throughout the program I worked on. As I said, this was the one class that was used everywhere.
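The situation described above can be sketched in a few lines of plain Perl. This is only an illustration with a hypothetical Point class and a made-up derived value (distance); it is not the C++ application from the anecdote. The accessors started life as simple get/set methods, and the call to _recompute is the kind of change that can later be made in one place without touching any caller:

```perl
use strict;
use warnings;

package Point;    # hypothetical class, not from the thread

sub new {
    my ( $class, %args ) = @_;
    my $self = bless { x => $args{x}, y => $args{y} }, $class;
    $self->_recompute;
    return $self;
}

# Combined getter/setter. Originally it only stored the value; the
# _recompute call is the later requirement, added here and nowhere else.
sub x {
    my $self = shift;
    if (@_) { $self->{x} = shift; $self->_recompute }
    return $self->{x};
}

sub y {
    my $self = shift;
    if (@_) { $self->{y} = shift; $self->_recompute }
    return $self->{y};
}

sub distance { return $_[0]{distance} }

# Derived value that must stay in step with x and y.
sub _recompute {
    my $self = shift;
    $self->{distance} = sqrt( $self->{x}**2 + $self->{y}**2 );
    return;
}

package main;

my $p = Point->new( x => 3, y => 4 );
$p->x(6);
$p->y(8);
print $p->distance, "\n";    # prints 10
```

Had callers written $p->{x} = 6 directly, every such assignment would have needed finding and changing.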

      I think most programs grow larger as they age. The "big program" techniques that one uses today might not be necessary today, but they often pay off in the long run. I tend to write software as if it is a little bigger than it actually is. I use strict and warnings in a two-line script. I use objects (sometimes) when a simpler data structure will do, just to collect the code related to that data structure in one place. I use set/get methods to access stuff in objects, even if they're not much more than dumb hashes (though I rarely, if ever, write those accessors myself).

      I think the time lost writing big for a small program that never grows is a reasonable price for the times that I write big for a program that (as usual) does grow.

        That led me to think of something that might be worthwhile. It'd be cool to have compile-time optimization of simple accessors such that you could redeclare the accessors in the class definition and the simple accessors are no longer optimized and no users of those accessors need to be updated.

        So you declare class Foo objects have public members of .x and $foo.x is compiled into very efficient accessing of that member. Down the road, you decide to write an explicit and more complicated accessor for .x. When code using this class gets compiled with "use Foo;" loading the new version of the class, then $y= $foo.x; gets compiled into $y= $foo.x(); while $foo.x= $y; gets compiled into $foo.x( $y );.

        Of course, this probably won't happen in Perl 6 because the general case requires \$foo.x to be compiled into something that ties a scalar value such that a fetch of that value calls $foo.x() while a store to that value calls $foo.x( ... ), and history shows that this inefficient general case will likely be used for the simple cases as well. But perhaps Perl 6's design goal of being more easily optimized might change that.

        Or we could just restrict such to read-only attributes. Then you could have code doing $y= $foo.x; that doesn't need to be updated (and is as efficient as it can be) and if you have any code that does $foo.x= $y;, then (once you've declared the member variable as requiring an accessor) you'd get a compile-time error telling you exactly what code needs to be changed. You'd need only re-compile ("perl -c") all code in your repository to verify that you don't have anything left to fix.

        There could even be a distinction between 1) ".x is private" meaning using $foo.x outside of a method calls the accessor while $self.x inside of a method does a direct access; vs. 2) ".x() is the accessor for ._x", which would require any place in your classes that modifies .x directly to be upgraded to either $foo.x( ... ); or $foo._x= ...;.

        Anyway, I don't recall reading of such an idea in Perl 6 design documents, but that was a long time ago so I could easily have forgotten or it could have been introduced since then.

        - tye        

        If your experience is that all your applications benefit from up-front over-design in the latter stages of their life, then I would not seek to change your mind. But if this is an isolated example from all the projects where you have written extra code up front that you perceive has saved you time later in the life of the project, then maybe you should consider the calculation (extra_time_spent * no_of_projects) - ( time_saved * those_that_have_benefited).

        Maybe for you that comes out positive, which would indicate that your powers of prescience are well above the average, because every informal calculation I've seen has shown a negative.

        And the size of the project has nothing to do with it. It's all to do with the nature of the data. Data formats rarely change substantially once they are defined. And wrapping native data-structures in OO-wrappers rarely generates an ROI.

        Besides which, with good OO design, there should be no need for the interface to expose the internal attributes. Directly or indirectly. There should be no methods whose purpose is to only and directly change the internal state of an object.

        All interactions with an object (after construction), should be in terms of actions (behaviours, messages) invoked upon instances, passing the information it requires to perform the action.

        What do I mean by this?

        By way of example, let's take a Time or Date or DateTime class.

        The underlying data for all these is simply a (signed) integer representing the data-point plus a base date representing the epoch. (And on Earth, a timezone, but let's omit that for simplicity of discussion.)

        On a traditional unix system that might be a 32-bit signed integer in seconds and Midnight, 1st Jan, 1970. On newer *nix systems, the base stays the same but the granularity of the units increases to milliseconds. On Win32 systems, the base date is Midnight, 1st Jan 1601 and the integer is 64-bits.

        But for whichever system, the code required to produce a new DateTime object representing a data-point 24 hours later is the same:

        DateTime tomorrow = DateTime->Now->plusDelta( DateTime::Delta->days( 1 ) );

        Here, DateTime::Delta->days( 1 ) is a subclass constructor that returns an integer (in units compatible with the base system) representing (in this case) 24 hours.

        The plusDelta() method produces a new DateTime object that is the result of applying a DateTime::Delta() object to an existing instance of a DateTime() object.

        DateTime->Now() is a constructor that returns a DateTime object representing the current time in the current epoch.

        The body of the plusDelta() method is (in perl terms) just:

        sub plusDelta { my( $self, $delta ) = @_; return bless \( $self += $delta ), __PACKAGE__; }

        Note: The raw values of both the DateTime and the Delta objects are accessed directly (via overloading of the dereference operator), and the resultant (new) DateTime object is constructed directly from the result of the calculation. This works because both the DateTime object and the Delta object are implemented as blessed scalars.

        The integer each of those scalars holds is the only instance data required, because everything else about them can be derived. The units they represent are determined at startup from the system. The granularity, epoch (and TZ) are available via Class constants.

        Assuming the availability of 64-bit math where required, the plusDelta() method does not have to change regardless of the size of the integer, the units it is measured in, the Epoch upon which it is based or the system it is running on, because it is just arithmetic.
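
        A minimal runnable sketch of the blessed-scalar idea, under stated assumptions: the package names My::DateTime and My::Delta and the fromEpoch() constructor are hypothetical (this is not the CPAN DateTime module), the unit is fixed at seconds, and the raw values are read with an explicit scalar dereference rather than through operator overloading.

```perl
use strict;
use warnings;

# Hypothetical classes for illustration only; not the CPAN DateTime module.
package My::Delta;

use constant SECONDS_PER_DAY => 24 * 60 * 60;

# Constructor: a delta is just a blessed integer count of seconds.
sub days {
    my( $class, $n ) = @_;
    my $secs = $n * SECONDS_PER_DAY;
    return bless \$secs, $class;
}

package My::DateTime;

# Constructor: the only instance data is one integer (epoch seconds).
sub Now {
    my( $class ) = @_;
    my $t = time;
    return bless \$t, $class;
}

# Constructor from a raw integer (assumed helper, handy for testing).
sub fromEpoch {
    my( $class, $t ) = @_;
    return bless \$t, $class;
}

# Returns a new object; the receiver is left unchanged.
sub plusDelta {
    my( $self, $delta ) = @_;
    my $sum = $$self + $$delta;
    return bless \$sum, ref $self;
}

package main;

my $today    = My::DateTime->fromEpoch( 1_000_000 );
my $tomorrow = $today->plusDelta( My::Delta->days( 1 ) );
print $$tomorrow - $$today, "\n";    # prints 86400
```

        Nothing in plusDelta() depends on the width of the integer or on the epoch; changing the unit would only touch the constant in the Delta class.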

        I don't need to define an array-wrapping special collection class to form aggregates of these objects because I can store them directly in a bog standard array. I can then sort that array using the built-in sort. Compare them using normal syntax: dt[ 1 ] > dt[ 203 ].
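
        As a sketch of how the "normal syntax" part could work in Perl: the class below (Sketch::Instant, a hypothetical name) overloads only numeric conversion and sets fallback => 1, which lets Perl derive the comparison operators from it. This is one possible way to do it, not necessarily the design intended above.

```perl
use strict;
use warnings;

package Sketch::Instant;    # hypothetical name, for illustration only

# Overload numeric conversion only; with fallback enabled, Perl
# autogenerates <, >, == and <=> from it.
use overload
    '0+'     => sub { ${ $_[0] } },
    fallback => 1;

sub fromEpoch {
    my( $class, $t ) = @_;
    return bless \$t, $class;
}

package main;

# A plain array is the only "collection class" needed.
my @dt = map { Sketch::Instant->fromEpoch( $_ ) } 300, 100, 200;

# The built-in sort and the ordinary comparison operators just work.
my @sorted = sort { $a <=> $b } @dt;
print join( ' ', map { $$_ } @sorted ), "\n";    # prints 100 200 300
print "later\n" if $dt[0] > $dt[1];              # prints later
```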

        Because each object contains only the minimum of internal state, they are very light. Aggregates take up far less space. Their representation means that most operations can be done with standard syntax making them far easier to use. Portability is simplified because most operations are done in terms of simple arithmetic operators. Performance is increased by direct access to the state (internally and externally).

        Only constructors (and only those that construct new instances from different representations, eg. strings) need to perform costly validation. As any integer value represents a valid DateTime object, no further validation is required. So long as users create objects using constructors, methods need do no further validation, as all operations are arithmetic and will give consistent results regardless of platform, epoch, timezone or base. (Unless Perl or the underlying runtime suddenly forget how to do math.) No getters & setters need be provided.
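
        To illustrate where the validation lives, here is a sketch assuming a hypothetical Sketch::Date class whose fromString() constructor accepts only YYYY-MM-DD and whose unit is seconds since the Unix epoch. The string constructor does the checking (delegating range checks to the core Time::Local module); the arithmetic method does none.

```perl
use strict;
use warnings;

package Sketch::Date;    # hypothetical name, for illustration only

use Carp qw( croak );
use Time::Local qw( timegm );

# Costly path: parse and validate an external representation.
sub fromString {
    my( $class, $str ) = @_;
    my( $y, $m, $d ) = $str =~ /\A(\d{4})-(\d{2})-(\d{2})\z/
        or croak "Not a YYYY-MM-DD date: '$str'";
    # timegm() itself croaks on out-of-range month or day values.
    my $t = timegm( 0, 0, 0, $d, $m - 1, $y );
    return bless \$t, $class;
}

# Cheap path: any integer is already a valid instant, so no checks.
sub plusSeconds {
    my( $self, $secs ) = @_;
    my $t = $$self + $secs;
    return bless \$t, ref $self;
}

package main;

my $d = Sketch::Date->fromString( '1970-01-02' );
print $$d, "\n";                           # prints 86400
print ${ $d->plusSeconds( 60 ) }, "\n";    # prints 86460
```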

        Compare this simplicity with the weight & complexity of existing solutions--in Perl and other languages and libraries.

        I won't expand in detail on it here, but a similar case can be made for (say) a Point3D object. Internally represented by a blessed anonymous array containing 3 numbers, the coordinates can change from 32-bit integers to 64-bit integers, to reals to complex, to rational, simply by changing the types of the numbers stored in the anon array. All the methods manipulating these objects are just doing math. Math operates correctly whatever the representation of the numbers so no validation is required after their construction.
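
        A brief sketch of the Point3D case, with the class name Sketch::Point3D and the methods plus() and scaled() chosen here purely for illustration. The methods never inspect what kind of number they hold, so integers and reals go through the same code.

```perl
use strict;
use warnings;

package Sketch::Point3D;    # hypothetical name, for illustration only

# The object is a blessed anonymous array of three numbers.
sub new {
    my( $class, $x, $y, $z ) = @_;
    return bless [ $x, $y, $z ], $class;
}

# Component-wise addition; returns a new point.
sub plus {
    my( $self, $other ) = @_;
    return ( ref $self )->new( map { $self->[$_] + $other->[$_] } 0 .. 2 );
}

# Multiply every coordinate by a scalar; returns a new point.
sub scaled {
    my( $self, $k ) = @_;
    return ( ref $self )->new( map { $_ * $k } @$self );
}

package main;

# Integer coordinates.
my $p = Sketch::Point3D->new( 1, 2, 3 )->scaled( 2 );
print "@$p\n";    # prints 2 4 6

# Real coordinates, same methods, no changes to the class.
my $q = Sketch::Point3D->new( 1, 2, 3 )
      ->plus( Sketch::Point3D->new( 0.5, 0.5, 0.5 ) );
print "@$q\n";    # prints 1.5 2.5 3.5
```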

        If it is necessary for a given application to constrain operations to some subset of the 3D universe, then subclassing the Class and applying post-condition validation on the parental constructors is sufficient. If the resultant object is outside of the constraining dimensions, the input must have been wrong. The subclass can then choose the appropriate course of action, be it raising an exception to report the problem, or coercion to correct it.
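
        One way this could look, as a sketch only: Sketch::Point and Sketch::BoundedPoint are hypothetical names, the limit of 100 is arbitrary, and the subclass here takes the exception route rather than coercion. Because the parent's methods build their results through the constructor of the receiver's class, every result passes the post-condition check.

```perl
use strict;
use warnings;

package Sketch::Point;    # hypothetical parent class

sub new {
    my( $class, @xyz ) = @_;
    return bless [ @xyz ], $class;
}

# Results are built via the receiver's own class, so a subclass
# constructor sees every new point.
sub plus {
    my( $self, $other ) = @_;
    return ( ref $self )->new( map { $self->[$_] + $other->[$_] } 0 .. 2 );
}

package Sketch::BoundedPoint;    # hypothetical constrained subclass
our @ISA = ( 'Sketch::Point' );

use Carp qw( croak );
use constant LIMIT => 100;    # arbitrary bound, assumed for the example

# Post-condition validation wrapped around the parental constructor.
sub new {
    my( $class, @xyz ) = @_;
    my $self = $class->SUPER::new( @xyz );
    for my $c ( @$self ) {
        croak "Coordinate $c outside +/-" . LIMIT if abs( $c ) > LIMIT;
    }
    return $self;
}

package main;

my $ok  = Sketch::BoundedPoint->new( 10, 20, 30 );
my $sum = eval { $ok->plus( Sketch::BoundedPoint->new( 95, 0, 0 ) ) };
print defined $sum ? "inside\n" : "rejected: $@";
```

        A coercing subclass would differ only in the body of the loop, clamping each coordinate instead of calling croak.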


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
