I can well see the benefit of grouping the code that applies to your paper, issue, article & words data-structures. The nice thing is that you can retain all the benefits (performance; simplicity of both code and syntax) without giving up either the ability to group functionality or the OO syntax when that is beneficial.
That's the real beauty of Perl's much-maligned default OO mechanisms: you can mix'n'match.
Sticking with the example from Re^6: Data Structures, let's add constructors for lines and stations that validate their parameters. The Seismic package is almost unchanged. Just four lines different, and two of those are use lines:
Seismic.pm
package Seismic;
use Seismic::Station;
use Seismic::Line;
use Exporter;
our @ISA = qw[ Exporter ];
our @EXPORT = qw[ Easting Northing Other Elevation ];
sub new {
    my( $class, $filename ) = @_;
    my %lines;
    open my $in, '<', $filename or die "$filename : $!";
    while( <$in> ) {
        my( $line, $stn, $x, $y, $other, $z ) =
            unpack 'A8xA8xA9A9xA15xA4', $_;
        $lines{ $line } = Seismic::Line->new( $line )
            unless exists $lines{ $line };
        $lines{ $line }{ $stn } = Seismic::Station->new( $x, $y, $other, $z );
    }
    close $in;
    return bless \%lines, $class;
}
1;
Instead of assigning the raw values parsed from the file, we call constructors from a couple of other packages: Seismic::Line and Seismic::Station. And they look like this:
Seismic::Line

package Seismic::Line;
sub new {
    my( $class ) = shift;
    return bless {}, $class;
}
1;
Seismic::Station. (This one is complicated :)

package Seismic::Station;
use strict;
use warnings;
use Exporter;
use constant {
    Easting   => 0,
    Northing  => 1,
    Other     => 2,
    Elevation => 3,
};
our @ISA = qw[ Exporter ];
our @EXPORT = qw[ Easting Northing Other Elevation ];
sub dms2real {
    my( $degrees, $minutes, $seconds ) = @_;
    return $degrees + ( $minutes / 60 ) + ( $seconds / 3600 );
}
sub new {
    my( $class, $x, $y, $other, $z ) = @_;
    my $self = bless [], $class;

    die "Bad Easting ($x)"
        unless $x =~ m[^( ([01]\d{2}) ([0-5]\d) ([0-5]\d{2}) ([NS]) )$]x;
    ## $2 = degrees, $3 = minutes, $4 = tenths of seconds, $5 = hemisphere
    $self->[ Easting ] = dms2real( $5 eq 'N' ? $2 : -$2, $3, $4 / 10 );

    die "Bad Northing ($y)"
        unless $y =~ m[^( (\d{2}) ([0-5]\d) ([0-5]\d{3}) ([EW]) )$]x;
    $self->[ Northing ] = dms2real( $5 eq 'E' ? $2 : -$2, $3, $4 / 100 );

    die "Bad other ($other)"
        unless $other =~ m[^\d{15}$]x;
    $self->[ Other ] = 0 + $other;

    die "Bad Elevation ($z)"
        unless $z =~ m[^( [ 0-9]{3}\d )$]x;
    $self->[ Elevation ] = 0 + $1;

    return $self;
}
1;
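As a quick sanity check of the conversion, here is dms2real in isolation. The sample values are made up; they just follow the packed-field format the easting regex expects:

```perl
use strict;
use warnings;

# Same arithmetic as Seismic::Station::dms2real.
sub dms2real {
    my( $degrees, $minutes, $seconds ) = @_;
    return $degrees + ( $minutes / 60 ) + ( $seconds / 3600 );
}

# An easting field like '05130005N' parses as 051 degrees, 30 minutes,
# 005 tenths-of-seconds (0.5s), hemisphere N:
printf "%.6f\n", dms2real( 51, 30, 5 / 10 );    # prints 51.500139
```

One thing worth noting: negating only the degrees for S/W values means the minutes and seconds still add positively, so southern/western coordinates come out slightly off the usual sign convention; whether that matters depends on your data.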
That's it. All the code needed so far.
What effect does that have on my hypothetical example of iterating and modifying the Station attributes? The answer is none, nothing, nada:
#! perl -slw
use strict;
use Data::Dump qw[ pp ]; $Data::Dump::MAX_WIDTH = 500;
use Seismic;

my $seismic = Seismic->new( 'seismic.dat' );

for my $lineID ( sort keys %{ $seismic } ) {
    my $line = $seismic->{ $lineID };
    for my $stnID ( sort { $a <=> $b } keys %{ $line } ) {
        my $stn = $line->{ $stnID };
        $stn->[ Easting ]   -= $stn->[ Easting ]  * 0.00001;
        $stn->[ Northing ]  -= $stn->[ Northing ] * 0.000002;
        $stn->[ Elevation ] += 1;
        $stn->[ Other ] = int(
            ( $stn->[ Easting ] * $stn->[ Northing ] * $stn->[ Elevation ] ) / 3.0
        );
    }
}
No changes required. Everything still works exactly as it did before but now we are validating the attributes of the Stations as we read them from the file. (Which turned up more than a few anomalies in my randomly generated data. :)
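Because the constructors die on bad input, application code can trap those anomalies with a plain eval. A minimal sketch; the malformed easting value is invented, and the field widths follow the unpack template above:

```perl
use strict;
use warnings;
use Seismic::Station;

# 'X5130005N' fails the [01]\d{2} degrees check, so new() dies
# and $@ carries the "Bad Easting" message.
my $stn = eval {
    Seismic::Station->new( 'X5130005N', '51300050E', '123456789012345', ' 123' )
};
warn "Skipping bad station: $@" if $@;
```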
And the effect on performance?
- cpu:
To parse and build the data-structure of 1000 lines with 10 stations/line, then iterate them and update all 10,000 stations, takes 0.484 seconds instead of 0.219 seconds. No problems there.
- Memory:
It may surprise people to know that it now takes less memory than the original. Attaching bless magic to the structure elements consumes a little more, but because I'm converting the station attributes to numeric form (reals) rather than storing them as strings, they take less space overall: 4.5MB rather than 5MB. Win-win.
Now we could move the re-calculation of station attributes into the Station package and invoke it as a method:
package Seismic::Station;
...
sub adjustAttributes {
    my( $self ) = shift;
    $self->[ Easting ]   -= $self->[ Easting ]  * 0.00001;
    $self->[ Northing ]  -= $self->[ Northing ] * 0.000002;
    $self->[ Elevation ] += 1;
    $self->[ Other ] = int(
        ( $self->[ Easting ] * $self->[ Northing ] * $self->[ Elevation ] ) / 3.0
    );
}
and the inner loop of the application would become:
for my $stnID ( sort { $a <=> $b } keys %{ $line } ) {
    my $stn = $line->{ $stnID };
    $stn->adjustAttributes;
}
But this seems more like a one-off task than a regular occurrence, so it stays in the application code rather than becoming an object method.
If it were a regular thing, then parameterising the 'drift' values is simple. And as any such adjustments would likely be applied to the entire dataset, it would probably make sense to move the loops into a class method in the Seismic package. Leaving the adjustAttributes() method in Seismic::Station allows for the possibility that an earthquake causes some lines, or individual stations, to move relative to the others.
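One way that parameterised sweep might look; the drift keys and the adjustAll() name are mine, not part of the code above:

```perl
package Seismic;
# Hypothetical class-wide sweep: apply drift adjustments to every
# station on every line in one call.
sub adjustAll {
    my( $self, %drift ) = @_;
    for my $line ( values %{ $self } ) {
        $_->adjustAttributes( %drift ) for values %{ $line };
    }
}

package Seismic::Station;
# Parameterised variant; defaults match the hard-coded values earlier.
# (Elevation and Other would be handled as before.)
sub adjustAttributes {
    my( $self, %drift ) = @_;
    $self->[ Easting ]  -= $self->[ Easting ]  * ( $drift{ easting }  // 0.00001 );
    $self->[ Northing ] -= $self->[ Northing ] * ( $drift{ northing } // 0.000002 );
}
```

And the application code shrinks to something like `$seismic->adjustAll( easting => 0.00002 );`.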
A perhaps more realistic scenario is a regular need to determine the highest or lowest station for a given line; or the crow-flies length of a line; or the area of countryside likely to be affected by a given-size tremor detected at one or more stations. Writing methods for any of these and placing them in the appropriate package is easy. As is adding unit tests.
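For instance, sketches of two such methods. The names are mine, and treating eastings/northings as planar coordinates for the distance is a simplification:

```perl
package Seismic::Line;
use List::Util qw[ max ];
use Seismic::Station;    # imports the Easting/Northing/Elevation constants

# Highest station elevation on this line.
sub highestElevation {
    my( $self ) = @_;
    return max map { $_->[ Elevation ] } values %{ $self };
}

# Straight-line distance between the first and last stations on the line.
sub crowFliesLength {
    my( $self ) = @_;
    my @ids = sort { $a <=> $b } keys %{ $self };
    my( $p, $q ) = @{ $self }{ @ids[ 0, -1 ] };
    return sqrt( ( $p->[ Easting ]  - $q->[ Easting ]  )**2
               + ( $p->[ Northing ] - $q->[ Northing ] )**2 );
}
1;
```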
So you don't have to give up the benefits of OO when using native data-structures.
But neither do you have to give up the benefits of native data-structures when using OO.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.