PerlMonks  

Make everything an object?

by wfsp (Abbot)
on May 08, 2008 at 21:04 UTC ( [id://685541]=perlmeditation )

Part of a website includes an online edition of a weekly newspaper. I have a website that includes just the articles, indexed by year/issue and with a simple keyword search. The editors of the newspaper use it for their research. In the future it may be adopted by the main site.

I have an app that keeps my site in sync and has worked well for a couple of years. There have been some changes on the main site so I’ve taken the opportunity to do a review.

It is _one big script_ containing the mother of all data structures, so breaking it down was the order of the day.

I thought I’d give the objects containing objects idea a run for its money. An outline sketch showing the public methods:

  • Paper
    • issues (get which issues to update, usually about four off)
    • process (update dbs, output html)
    • post_process (tidies up, reports)
  • Issue
    • article (get each article, a couple of dozen per issue)
  • Article
    • constructor: gets text and html
  • Words
    • get_words
    • get_word_count
      (between 500 and 2,000 words per article)
The script looks like:
    my $paper = Paper->new($cnf_file);

    for my $issue ($paper->issues){
        while (my $article = $issue->article){ # an iterator
            $paper->process($article);
        }
    }
    $paper->post_process;
Each specific task has its own module with its own specific methods and attributes and its own API. The detailed work (e.g. scraping an issue index page to get the url for each article) is easily tested in isolation. With judicious use of special configuration files for debugging I’ve avoided messing with the ‘production’ db or whacking the web server during development (I must admit I wish I had thought of that earlier – one unfortunate infinite loop later…).
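As a rough illustration of the iterator used in the loop above, here is a minimal sketch of what an Issue class might look like. It is not the author's module: the constructor taking a ready-made list of articles is an assumption made so the example runs without scraping anything.

```perl
use strict;
use warnings;

# Hypothetical Issue class. The real one scrapes an index page for article
# URLs; here the articles are passed in directly so the sketch runs offline.
package Issue {
    sub new {
        my ( $class, @articles ) = @_;
        return bless { queue => [@articles] }, $class;
    }

    # Iterator: hand back the next article, or undef once the issue is exhausted.
    sub article {
        my ($self) = @_;
        return shift @{ $self->{queue} };
    }
}

my $issue = Issue->new( 'article one', 'article two' );

my @seen;
while ( my $article = $issue->article ) {
    push @seen, $article;
}
print scalar(@seen), " articles processed\n";
```

Because the iterator only needs a list to work through, it can be tested in isolation with canned data, which fits the "special configuration files for debugging" approach.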

It took a while to get the ‘model’ straight (and it may still be far from perfect) but I feel the development time thereafter was much reduced. You can tinker with the innards of an object to your heart’s content, and as long as you abide by the ‘published API’ no harm is done. I’m sure that with a script that relies heavily on LWP any overhead introduced by using Perl objects will hardly be noticed.

I’m pleased that I’ve finally got the Word module out on its own. It takes a string and returns words. I can use that again.
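For readers who want something concrete, a module of that shape could be sketched as below. The method names get_words and get_word_count come from the outline; everything else, including the rule for what counts as a word, is an assumption and not the actual module.

```perl
use strict;
use warnings;

package Words {
    sub new {
        my ( $class, $text ) = @_;

        # Assumption: a "word" is a run of word characters or apostrophes,
        # folded to lower case. The real module may well differ.
        my @words = map { lc $_ } $text =~ /([\w']+)/g;
        return bless { words => \@words }, $class;
    }

    sub get_words      { return @{ $_[0]{words} } }
    sub get_word_count { return scalar @{ $_[0]{words} } }
}

my $words = Words->new("The editors' research: it's all here.");
print join( ' ', $words->get_words ), "\n";
print $words->get_word_count, "\n";    # prints 6
```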

I like the black box concept, I found I could concentrate and test one thing at a time. I started this about a month ago and I’ve been harbouring a meditation along these lines for some time. Long before the most recent ruin and destruction wrought by IconoclastUK. :-)

Sure, more typing, more things to go wrong but I look on it as “Perl between handrails”. :-)

Replies are listed 'Best First'.
Good APIs are the key (was Re: Make everything an object?)
by dragonchild (Archbishop) on May 08, 2008 at 21:23 UTC
    BrowserUk makes a number of good points and he's arguing from a sound starting point. Objects are way overused and, ofttimes, there are much better ways of handling things.

    The point behind objects is to have intelligence. I have recently written a SQL parser. The parser emits a whole mess of objects and I think BrowserUk would agree with why. The first is that the whole thing knows how to stringify itself, from any point. The second is that I can descend the parsetree simply and easily without needing to know how it's represented. Most parsetrees are somewhat regular. SQL's isn't. Hence, hiding complexity.
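The "stringify itself, from any point" idea can be sketched with Perl's overload pragma. This is only an illustration: the Node class and its fields are hypothetical and are not taken from the parser described above.

```perl
use strict;
use warnings;

# Hypothetical parse-tree node, not the actual SQL parser's class.
package Node {
    use overload '""' => \&to_string;

    sub new {
        my ( $class, $op, @children ) = @_;
        return bless { op => $op, children => [@children] }, $class;
    }

    # A leaf prints its token; an inner node joins its children with its operator.
    sub to_string {
        my ($self) = @_;
        return $self->{op} unless @{ $self->{children} };
        return '(' . join( " $self->{op} ", map { "$_" } @{ $self->{children} } ) . ')';
    }
}

my $cond = Node->new(
    'AND',
    Node->new( '=', Node->new('a'), Node->new('1') ),
    Node->new( '>', Node->new('b'), Node->new('2') ),
);

print "$cond\n";                   # the whole tree
print "$cond->{children}[0]\n";    # any subtree stringifies the same way
```

The caller never needs to know how a node stores its children; interpolating the object into a string is the whole API.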

    Your point, too, is well-taken. But, it's not a point about objects - it's a point about APIs. Your same point can be made about good libraries, too. Take Perl. It's a program that responds to an API. Do you care how hashes are implemented? The hashing algorithm changed significantly between 5.6.0 and 5.6.1 - did you notice? Did you even know?


    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Make everything an object?
by sundialsvc4 (Abbot) on May 09, 2008 at 04:27 UTC

    It appears to me that you're on a good right-track, superdoc.

    I find that the “object” concept has three major advantages ... all of which are compelling in the right situations.

    1. Encapsulation:   The data and the code are together in just one place. When you've got “a reference,” you've got a reference to the code and to the correctly-associated data. Multiple instances of the same object can be relied-upon to refer to the same code but distinct data. Very nice...
    2. Inheritance:   Sometimes the easiest way to describe something is to say that “this is exactly like that ... except ...” In those cases, it's extremely useful to be able to write code that only deals with the “except...”
    3. “Why Do I (Have To...) Care?” Usually, you just want “a Thing” to “work.” You do not want to know how and why that “Thing” actually works ... for the same reason that you don't want to have to know why your car works. You just want to be able to say, “here, Fido! Fetch!” ... without concerning yourself with the low-level details of metabolizing dog-food.
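A small sketch may make the first two points concrete. The Editorial subclass and the headline method are invented for illustration; only the Article name comes from the original post.

```perl
use strict;
use warnings;

# Encapsulation: the data (title) and the code that reads it live together.
package Article {
    sub new {
        my ( $class, %args ) = @_;
        return bless {%args}, $class;
    }
    sub title    { return $_[0]{title} }
    sub headline { return $_[0]->title }
}

# Inheritance: "exactly like Article, except" the headline carries a label.
package Editorial {
    our @ISA = ('Article');
    sub headline { return 'Editorial: ' . $_[0]->title }
}

for my $item ( Article->new( title => 'Harvest' ), Editorial->new( title => 'Harvest' ) ) {
    print $item->headline, "\n";
}
```

Editorial only contains the "except" part; the constructor and the title accessor are inherited unchanged.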
Re: Make everything an object?
by BrowserUk (Patriarch) on May 09, 2008 at 16:09 UTC

    I can well see the benefit of grouping the code that applies to your paper, issue, article & words data-structures. The nice thing is that you can retain all the benefits--performance; simplicity of both code and syntax--without giving up either the ability to group functionality or even OO syntax when that is beneficial.

    That's the real beauty of Perl's much-maligned default OO mechanisms, you can mix'n'match.

    Sticking with the example from Re^6: Data Structures, let's add constructors for lines and stations that validate their parameters. The Seismic package is almost unchanged. Just four lines different, and two of those are use lines:

    Seismic.pm

        package Seismic;

        use Seismic::Station;
        use Seismic::Line;

        use Exporter;

        our @ISA    = qw[ Exporter ];
        our @EXPORT = qw[ Easting Northing Other Elevation ];

        sub new {
            my( $class, $filename ) = @_;

            my %lines;
            open my $in, '<', $filename or die "$filename : $!";

            while( <$in> ) {
                my( $line, $stn, $x, $y, $other, $z )
                    = unpack 'A8xA8xA9A9xA15xA4', $_;

                $lines{ $line } = Seismic::Line->new( $line )
                    unless exists $lines{ $line };

                $lines{ $line }{ $stn } = Seismic::Station->new( $x, $y, $other, $z );
            }
            close $in;

            return bless \%lines, $class;
        }

        1;

    Instead of assigning the raw values parsed from the file, we call constructors from a couple of other packages: Seismic::Line and Seismic::Station. And they look like this:

    Seismic::Line

        package Seismic::Line;

        sub new {
            my( $class ) = shift;
            return bless {}, $class;
        }

        1;

    Seismic::Station. (This one is complicated :)

        package Seismic::Station;
        use strict;
        use warnings;

        use Exporter;

        use constant {
            Easting => 0, Northing => 1, Other => 2, Elevation => 3
        };

        our @ISA    = qw[ Exporter ];
        our @EXPORT = qw[ Easting Northing Other Elevation ];

        # Convert degrees/minutes/seconds to decimal degrees.
        sub dms2real {
            my( $degrees, $minutes, $seconds ) = @_;
            return $degrees + ( $minutes / 60 ) + ( $seconds / 3600 );
        }

        sub new {
            my( $class, $x, $y, $other, $z ) = @_;
            my $self = bless [], $class;

            # Captures: $2 degrees, $3 minutes, $4 seconds (tenths), $5 hemisphere.
            die "Bad Easting ($x)" unless $x =~ m[^(
                ([01]\d{2}) ([0-5]\d) ([0-5]\d{2}) ([NS])
            )$]x;
            # The hemisphere letter is in $5; $4 holds the seconds digits.
            $self->[ Easting ] = dms2real( $5 eq 'N' ? $2 : -$2, $3, $4 / 10 );

            # Captures: $2 degrees, $3 minutes, $4 seconds (hundredths), $5 hemisphere.
            die "Bad Northing ($y)" unless $y =~ m[^(
                (\d{2}) ([0-5]\d) ([0-5]\d{3}) ([EW])
            )$]x;
            $self->[ Northing ] = dms2real( $5 eq 'E' ? $2 : -$2, $3, $4 / 100 );

            die "Bad other ($other)" unless $other =~ m[^\d{15}$]x;
            $self->[ Other ] = 0 + $other;

            die "Bad Elevation ($z)" unless $z =~ m[^( [ 0-9]{3}\d )$]x;
            $self->[ Elevation ] = 0 + $1;

            return $self;
        }

        1;

    That's it. All the code needed so far.

    What effect does that have on my hypothetical example of iterating and modifying the Station attributes? The answer is none, nothing, nada:
        #! perl -slw
        use strict;
        use Data::Dump qw[ pp ];
        $Data::Dump::MAX_WIDTH = 500;

        use Seismic;

        my $seismic = Seismic->new( 'seismic.dat' );

        for my $lineID ( sort keys %{ $seismic } ) {
            my $line = $seismic->{ $lineID };

            for my $stnID ( sort{$a<=>$b} keys %{ $line } ) {
                my $stn = $line->{ $stnID };

                $stn->[ Easting ]   -= $stn->[ Easting ]  * 0.00001;
                $stn->[ Northing ]  -= $stn->[ Northing ] * 0.000002;
                $stn->[ Elevation ] += 1;
                $stn->[ Other ] = int(
                    ( $stn->[ Easting ] * $stn->[ Northing ] * $stn->[ Elevation ] ) / 3.0
                );
            }
        }

    No changes required. Everything still works exactly as it did before but now we are validating the attributes of the Stations as we read them from the file. (Which turned up more than a few anomalies in my randomly generated data. :)

    And the effect on performance?

    • cpu:

      To parse and build the data-structure of 1000 lines with 10 stations/line; and then iterate them and update the 10,000 stations takes 0.484 seconds instead of 0.219. No problems there.

    • Memory:

      It may surprise people to know that it now takes less memory than the original. Although attaching bless magic to the structure elements consumes a little more, because I'm converting the station attributes to numeric forms (reals) rather than storing them as strings, they take less space. 4.5MB rather than 5MB. Win-win.

    Now we could move the re-calculation of station attributes into the Station package and invoke it as a method:

        package Seismic::Station;
        ...
        sub adjustAttributes {
            my( $self ) = shift;
            $self->[ Easting ]   -= $self->[ Easting ]  * 0.00001;
            $self->[ Northing ]  -= $self->[ Northing ] * 0.000002;
            $self->[ Elevation ] += 1;
            $self->[ Other ] = int(
                ( $self->[ Easting ] * $self->[ Northing ] * $self->[ Elevation ] ) / 3.0
            );
        }

    and the inner loop of the application would become:

        for my $stnID ( sort{$a<=>$b} keys %{ $line } ) {
            my $stn = $line->{ $stnID };
            $stn->adjustAttributes;
        }

    But this seems more like a one-off thing, rather than a regular occurrence, and so it is a part of the application code rather than an object method.

    If it was a regular thing, then parameterising the 'drift' values is simple. And as it's likely that any such adjustments would be applied to the entire dataset, then it would probably make sense to move the loops into a Class method in the Seismic package. Leaving the adjustAttributes() method in Seismic::Station allows for the possibility that an earthquake causes some lines, or individual stations to move relative to the others.
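One way the parameterised version might look is sketched below. The parameter names (easting, northing, lift), the adjustAll method and the Survey stand-in for the Seismic package are all assumptions made so the example is self-contained; the defaults are the drift values used earlier in the post.

```perl
use strict;
use warnings;

package Station {
    use constant { Easting => 0, Northing => 1, Other => 2, Elevation => 3 };

    sub new {
        my ( $class, @attrs ) = @_;
        return bless [@attrs], $class;
    }

    # Drift factors are named parameters; defaults are the values from the post.
    sub adjustAttributes {
        my ( $self, %drift ) = @_;
        my $east  = $drift{easting}  // 0.00001;
        my $north = $drift{northing} // 0.000002;
        my $lift  = $drift{lift}     // 1;

        $self->[Easting]   -= $self->[Easting]  * $east;
        $self->[Northing]  -= $self->[Northing] * $north;
        $self->[Elevation] += $lift;
        $self->[Other] = int(
            ( $self->[Easting] * $self->[Northing] * $self->[Elevation] ) / 3.0 );
        return $self;
    }
}

# Hypothetical stand-in for the Seismic package: a hash of lines of stations.
package Survey {
    sub new { my ( $class, %lines ) = @_; return bless {%lines}, $class }

    # The loops live here, so callers adjust the whole dataset in one call.
    sub adjustAll {
        my ( $self, %drift ) = @_;
        for my $line ( values %$self ) {
            $_->adjustAttributes(%drift) for values %$line;
        }
        return $self;
    }
}

my $survey = Survey->new( L1 => { 1 => Station->new( 100, 200, 0, 10 ) } );
$survey->adjustAll( easting => 0.01, northing => 0.5, lift => 2 );

my $stn = $survey->{L1}{1};
printf "%.1f %.1f %d\n",
    $stn->[ Station::Easting() ], $stn->[ Station::Northing() ], $stn->[ Station::Elevation() ];
```

Keeping adjustAttributes on the station still allows a single station or line to be moved on its own.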

    A perhaps more realistic scenario is that there might be a regular need to determine the highest or lowest station for a given line; or the crow-flies length of a line; or the area of countryside that is likely to be affected by a given size tremor detected at one or more stations. Writing methods for any of these and placing them in the appropriate package is easy. As is adding unit tests.
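As a sketch of how small such a method and its unit test can be, here is a hypothetical Line class with highest and lowest methods. The method names and the test data are invented; stations are plain array refs with the elevation in slot 3, matching the layout used above.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical Line class: a blessed hash of station ID => attribute array.
package Line {
    use constant ELEVATION => 3;

    sub new { my ( $class, %stations ) = @_; return bless {%stations}, $class }

    # ID of the station with the greatest elevation.
    sub highest {
        my ($self) = @_;
        my ($id) = sort { $self->{$b}[ELEVATION] <=> $self->{$a}[ELEVATION] } keys %$self;
        return $id;
    }

    # ID of the station with the smallest elevation.
    sub lowest {
        my ($self) = @_;
        my ($id) = sort { $self->{$a}[ELEVATION] <=> $self->{$b}[ELEVATION] } keys %$self;
        return $id;
    }
}

my $line = Line->new(
    S1 => [ 10.5, 52.1, 0, 120 ],
    S2 => [ 10.6, 52.2, 0, 340 ],
    S3 => [ 10.7, 52.3, 0, 95 ],
);

is( $line->highest, 'S2', 'highest station found' );
is( $line->lowest,  'S3', 'lowest station found' );
done_testing();
```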

    So you don't have to give up the benefits of OO when using native data-structures.

    But neither do you have to give up the benefits of native data-structures when using OO.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Make everything an object?
by Ovid (Cardinal) on May 12, 2008 at 16:13 UTC

    I find repeatedly that simple code written in a procedural style becomes too limiting for me and, as a result, I often switch to objects. However, a very important concern of objects is that they should be responsible for the things they're responsible for and the calling code should not. As a result, I noticed something rather curious about your code (seems like a great start, but you're still thinking procedurally :):

        my $paper = Paper->new($cnf_file);

        for my $issue ($paper->issues){
            while (my $article = $issue->article){ # an iterator
                $paper->process($article);
            }
        }
        $paper->post_process;

    Your paper is processing articles, which are fetched from issues, which in turn are fetched from the paper, which processes the articles, which are fetched from ...

    See something circular there? :) Why not have the paper manage this responsibility since it knows everything it needs to know?

        my $paper = Paper->new($cnf_file);

        # and in your Paper package:

        sub new {
            my ( $class, $cnf_file ) = @_;
            my $self = bless {}, $class;    # construct your object here
            $self->_initialize;
            return $self;
        }

        sub _initialize {
            my $self = shift;
            for my $issue ($self->_issues){
                while (my $article = $issue->_article){
                    $self->_process($article);
                }
            }
            $self->_post_process;
        }

    That pushes the responsibility of processing the issues into the Paper class. Also note that I've made most of your methods private (starting with leading underscores). You should not make them public unless you must have access to them outside of this class, though I suspect you'll want to do this with your issues and articles.

    The benefit of this approach is that the logic of processing the paper is encapsulated in your paper class and if you need to rewrite it, you can try to keep the same API rather than hunt through your code and find every place which is mistakenly controlling this logic. Just tell the paper what to do and let it do it :) You also might want to read The AuthenticationFairy for more background on this.

    At this point, the question is whether or not you would need to intervene in any of those steps (i.e., do you need to filter out issues or something?). If you don't, the above code is fine. Otherwise, thinking about how to do that without exposing the process is important, particularly since exposing that logic can leave the object in an inconsistent state and this should never happen.
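If filtering does turn out to be needed, one possible shape is to let the caller pass in a predicate while the processing itself stays private. The sketch below is an assumption-laden illustration: the update method, the filter option and the in-memory issue list are all hypothetical.

```perl
use strict;
use warnings;

package Paper {
    sub new {
        my ( $class, %args ) = @_;

        # Hypothetical: issues arrive as { issue_id => [articles] };
        # the real class would build this from its config file.
        return bless { issues => $args{issues}, done => [] }, $class;
    }

    # Public entry point. The optional filter receives an issue ID and
    # returns true to keep that issue; the processing steps stay hidden.
    sub update {
        my ( $self, %opt ) = @_;
        my $keep = $opt{filter} // sub { 1 };

        for my $id ( grep { $keep->($_) } sort keys %{ $self->{issues} } ) {
            $self->_process($_) for @{ $self->{issues}{$id} };
        }
        return @{ $self->{done} };
    }

    sub _process {
        my ( $self, $article ) = @_;
        push @{ $self->{done} }, uc $article;    # stand-in for the real work
    }
}

my $paper = Paper->new( issues => { '2008-18' => [ 'a', 'b' ], '2008-19' => ['c'] } );
my @done  = $paper->update( filter => sub { $_[0] eq '2008-19' } );
print "@done\n";    # prints C
```

The caller chooses which issues to process but never drives the article loop, so the object cannot be left half-processed by calling code.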

    Cheers,
    Ovid

    New address of my CGI Course.

      Thanks for your response.

      Yes, it is a tad circular. Your comment about filtering the issues is partly responsible.

      This may eventually be a cgi app and in the back of my mind is the possibility of a backlog of issues to be updated. Processing them all at once could be a bit of a strain on the web server. It may be best to show the user a list of outstanding issues and prompt to select one, rinse and repeat etc. What's important is that all of an issue's articles are processed together, not that all the issues are processed together.

      I'm incorporating your points about "not exposing the process" and already the logic is looking clearer and the API is shrinking - always a Good Thing ™. :-)

      Again, many thanks.

        If stuffing everything inside your objects works for you, then go for it.

        But there is a point of view, most clearly expounded by a certain author of several very well respected books on C++ that strongly advocates not sticking everything inside your class.

        A simple quote: "If you're writing a function that can be implemented as either a member or as a non-friend non-member, you should prefer to implement it as a non-member function. That decision increases class encapsulation. When you think encapsulation, you should think non-member functions."

        A "non-friend non-member function" is also called a "free function". In Perl terms: a good old-fashioned sub rather than a method. The gist of the argument is that the only measure of encapsulation is the amount of code (number of functions/methods) that needs to change if the data format changes. Any function implemented as a method will likely need to change if the data format changes.

        If, on the other hand, that function can be implemented as a non-member, non-friend, "free" function (sub) it won't have to change if the data format changes, so doing so increases encapsulation.
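In Perl terms the distinction might look like the sketch below. The Article class and the word_count sub are hypothetical; the point is only that the free sub touches nothing except the public accessor.

```perl
use strict;
use warnings;

package Article {
    sub new { my ( $class, $text ) = @_; return bless { text => $text }, $class }

    # The only code that knows how the text is stored.
    sub text { return $_[0]{text} }
}

# Free function: it uses only the public text() accessor, so it keeps working
# if Article later stores its data in an array or a database row instead.
sub word_count {
    my ($article) = @_;
    my @words = split ' ', $article->text;
    return scalar @words;
}

my $count = word_count( Article->new('objects are a means not an end') );
print "$count\n";    # prints 7
```

Had word_count been written as a method reaching into $self->{text}, it would be one more piece of code to change when the storage format does.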

        Don't be blinded by OO-dogma. To paraphrase and widen a quote from the same article: OO is a means, not an end. OO is only useful because it yields other things that we care about. The greatest benefit OO can yield, indeed the greatest benefit any programming technique or methodology can yield, is simplicity. Whilst it does so, it is beneficial. As soon as it starts creating complexity that can be avoided--lines of code that can be avoided--it ceases to be beneficial.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Node Type: perlmeditation [id://685541]
Approved by Corion
Front-paged by Arunbear