Tutorial: Introduction to Object-Oriented Programming

Object-Oriented Tutorial

Prerequisites

The following knowledge will generally be assumed:

references -- what they look like and how to use them
modules and packages -- how to create one, how to use one

Introduction

I was trying to avoid having to write this section, since the philosophy that I wanted to capture with this document was teaching by doing, rather than by throwing a lot of terms at you and then telling you what I meant after you were thoroughly confused and intimidated. However rob_au very rightly pointed out that I needed some sort of introduction to the idea behind object-oriented coding in order for anything that I was saying to make sense. So here goes...

At its (simplified) heart, OO programming is about creating discrete packets or groupings of data (what we call objects) that model some 'thing' in the application space. So, an employee application might use an Employee object to capture basic pieces of data about each employee -- their age, their Social Security or National Insurance Number, their job title, and so on -- while a zoo's application might use an Animal object with the relevant pieces of data needed to keep the animals alive and healthy -- their dietary needs, whether they are dangerous to humans, the number of legs, and so forth.

So rather than having to look up each piece of information one bit at a time, one piece of an application might hand over an employee object to another piece of the application, and the second component can just ask the employee object for the information it needs without having to know anything about where this data came from (a database, a flat file, a pipe, etc.) or how it was stored (comma-delimited, tab-delimited, database rows).

In fact, almost anything can be turned into an object if you look at it the right way. The question you always need to keep in the back of your mind is "Is this something that needs to be an object in order to improve either functionality or re-usability?" Because OO is a lot more work than a straight procedural script, but it can also be much more powerful.

Another way of looking at the difference between procedural scripts and object-oriented applications is that the former tend to focus on what you could call the verbs of a sentence: the user submits a form, the form is validated by a subroutine, and then inserted into a database. OO, on the other hand, looks more at the nouns: the user submits a form, the form is validated, and then saved to a database. The sentence is almost the same, but it reads very differently, and this reflects the real world of OO vs. procedural: often it is as much a judgement call as anything else when an application should be expressed as a set of objects rather than a set of scripts.

Obviously there's a lot more to it than that, but hopefully this is enough to get you oriented.

To Begin: An Example

A lot of OO tutorials take the approach of explaining the terminology, and then introducing some examples to help you make sense of it all. I'm going to try the exact opposite -- using an example that will (with luck) make sense as a way of introducing the terminology.

The Project

I'm going to take as a starting point this node since it offers a good way to come to grips with when it might make sense to switch to OO vs. remaining in a procedural mind set. It also reveals a little of why OO is so damned hard.

So our fictional project is going to be to set up a system to handle user-submitted quotes on a Web site -- users should be able to submit new quotes, have them reviewed by an Administrator (for rudeness or duplication) and then see them show up on the Web site at a later date.

Before we even start coding, we should probably jot down some ideas about how a quote would work. Here are the things that come to mind for me... All quotes would have:

A pithy phrase of some kind -- "Who are you who are so wise in the ways of science?"
An author -- "Sir Bedevere"
A date -- 1066 A.D.
A context -- "The Quest for the Holy Grail"
Approved -- whether or not the quote has been approved by an administrator
Approver -- which administrator gave the approval
Submitter -- the user who submitted the quote
Last Shown -- the last time a quote was shown on our Web site

Of course, there are all kinds of problems with this first cut -- for instance, some quotes might not have a date, or maybe the date is extremely specific (June 26th, 1963 for JFK's "Ich bin ein Berliner") -- but we're going to keep it simple and just use an author, a phrase, and an approval.

A Few Prototype Functions

I'm assuming that you are familiar with subroutines and most of the basics of Perl programming, so you know why we'd want to look at capturing some of the ideas above as subroutines. So here's some basic code:

use strict;

my $quote_ref  = {
  phrase   => "Foo",
  author   => "Bar",
  approved => 1,
};

set_phrase($quote_ref, "Bee");
print STDOUT get_phrase($quote_ref) . "\n";
print STDOUT get_author($quote_ref) . "\n";
print STDOUT (is_approved($quote_ref) ? "Is approved\n" : "Is not approved\n");

exit 0;

sub get_phrase {
  my $hash_ref = shift;
  return $hash_ref->{phrase};
}

sub set_phrase {
  my $hash_ref = shift;
  $hash_ref->{phrase} = shift;
}

sub get_author {
  my $hash_ref = shift;
  return $hash_ref->{author};
}

sub set_author {
  my $hash_ref = shift;
  $hash_ref->{author} = shift;
}

sub is_approved {
  my $hash_ref = shift;
  return $hash_ref->{approved};
}

Right now there's not a lot of utility in this script -- why would anyone test a local variable using a subroutine instead of just writing my $phrase = "Foo";?

Modules

Many of you are no doubt already familiar with modules: any time you've put a use at the top of your script, you've imported a module for use in your script. You may well have written a few yourself to improve the re-usability of some handy tools that you developed along the way. Let's say that I expect my one quote to be used in many places and that I want to modularize it in a quasi-useful way. I might change the file above to look like this:

package Quote;

use strict;

my $quote_ref = {
  phrase   => "Foo",
  author   => "Bar",
  approved => 1,
};

sub get_phrase {
  return $quote_ref->{phrase};
}

sub set_phrase {
  $quote_ref->{phrase} = shift;
}

sub get_author {
  return $quote_ref->{author};
}

sub set_author {
  $quote_ref->{author} = shift;
}

sub is_approved {
  return $quote_ref->{approved};
}

1;

Now, my script would look like this:

use strict;
use Quote;

print STDOUT Quote::get_phrase(), "\n";
print STDOUT Quote::get_author(), "\n";
print STDOUT (Quote::is_approved() ? "Is approved" : "Is not approved"), "\n";

exit 0;

This prints:

Foo
Bar
Is approved

I have now successfully isolated my quote from the script that calls it. I could have two variables named $quote_ref (one in the script, one in the module) and they would never come into conflict with each other: each is a lexical (my) variable private to its own file, and the module's subroutines live in the Quote namespace rather than in the main namespace. I could ask for this quote in other scripts and get all of this (dubious) utility over there as well. If I update my quote, then I don't need to waste my time updating every script that uses this module.
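To see that isolation in a form you can run as a single file, here is a minimal sketch. It assumes that a pair of bare blocks can stand in for the two separate files (script and module); the phrases are made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for Quote.pm: its $quote_ref is visible only inside this block.
{
    package Quote;
    my $quote_ref = { phrase => "Foo", author => "Bar" };
    sub get_phrase { return $quote_ref->{phrase} }
}

# Stand-in for the calling script: a completely separate $quote_ref.
{
    package main;
    my $quote_ref = { phrase => "Something else entirely" };
    print "Script: $quote_ref->{phrase}\n";
    print "Module: ", Quote::get_phrase(), "\n";
}
```

The two variables share a name but nothing else, and the script can only get at the module's data through Quote::get_phrase().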

I now have a working module, but it's not really terribly useful since I'd have to create a new module (if I stuck to this system) for each quote. I can just see the fun of managing Quote::People::JFK::Berlin, Quote::Movies::MontyPython, etc.

Ideally, my Quote module would be some kind of 'super-quote' and not only would it have a way to hold the data (the quote, its author, and so forth) for many different quotes, but it would also give me some handy subroutines to do useful things that are quote-related. In other words, what I really want is an abstract representation of all quotes whether they are funny, political, rude, or profound -- what I really want is a class.

Classes

The easiest way to think about a class is to think of it as a prototype, in code, for all quotes -- all of the variables and functions are there, they are just waiting to be filled in at runtime by a specific quote.

Going back to my list above, I can see many of the handy things that I would want my quote class to have. And looking at the code above, I can see some of how I want that to work since, at its core, a class is very much like a module with a few special features.

Here's what the Quote class might look like:

package Quote;

use strict;

sub new {

  my $class = shift;
  my $self = {};
  return bless $self, $class;
}

sub set_phrase {
  my $self = shift;
  my $phrase = shift;
  $self->{phrase} = $phrase;
}

sub get_phrase {
  my $self = shift;
  return $self->{phrase};
}

sub set_author {
  my $self = shift;
  my $author = shift;
  $self->{author} = $author;
}

sub get_author {
  my $self = shift;
  return $self->{author};
}

sub is_approved {
  my $self = shift;
  @_ ? $self->{_approved} = shift : $self->{_approved};
}

1;

That might now look thoroughly confusing, but it's helpful to compare it against our previous Quote package to see what's changed. There are really only a few key points:

There's a new subroutine called new -- this is a special subroutine that helps us to create a new object based on our Quote class. In truth, I could have called it create, or generate, or dog, but let's stick with conventions which state that new is the name of the special subroutine that instantiates a new object of class Quote.
I also added all of these my $self = shift; all over the place. Hmmm, it looks a lot like that hash ref that we were passing in in our first test code, doesn't it?
But somehow I did away with the my $quote_ref declared in the old Quote.pm
There's also that word bless that you probably haven't seen before

But let's turn back to our script (the one that uses Quote.pm) to see how it has changed before I try to explain this any more.

use strict;
use Quote;

my $phrase = "Baz";
my $author = "Foo";

my $quote = Quote->new();
$quote->set_phrase($phrase);
$quote->set_author($author);

print STDOUT $quote->get_phrase(), "\n";
print STDOUT $quote->get_author(), "\n";
print STDOUT ($quote->is_approved() ? "Is approved" : "Is not approved"), "\n";

exit 0;

This prints out:

Baz
Foo
Is not approved

There, now we've seen an object in action, but what happened?

Blessing

First, you can see that I created a new Quote object using the syntax Quote->new() and assigned it to a variable called $quote.

I then called my subroutines (set_phrase, get_phrase and so on) using $quote as a kind of handle -- I no longer need to say Quote::get_phrase(), I just say $quote->get_phrase() and Perl knows which subroutine (or method as they are known in OO) to run.

Of course, in Perl there is always more than one way to do it (TIMTOWTDI), and I should note that instead of saying Quote->new() I could also have written new Quote and left the rest of the script exactly as-is. It's mostly a matter of style; however, the preferred style is Quote->new() as it removes any ambiguity that can confuse either the reader/user or the compiler.

To understand a little more about what is happening, let's make a few changes to your Quote.pm file so that your new subroutine now looks like:

sub new {
        my $class = shift;
        my $self = {};
        print STDOUT ref($self), "\n";
        bless $self, $class;
        print STDOUT "Object ", $self, " is of class ", ref($self), "\n";
        return $self;
}

Run the script again, and you'll see:

HASH
Object Quote=HASH(0x804b514) is of class Quote
Baz
Foo
Is not approved

Examining the sequence here, we can see that bless is somehow taking our $self hashref (whose nature we confirmed by printing out ref($self) and seeing HASH) and changing it to an object whose type is Quote (which we confirmed by printing out ref($self) right after calling bless).

So, it seems that blessing is the means by which a hash reference (or any other type of reference) is promoted to an object of a particular class.
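If you want to poke at this a little further, here is a small sketch using the core Scalar::Util module. The describe helper and the Quote::List class name are my own inventions for illustration; blessed and reftype are real Scalar::Util functions.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical helper: reports the class (if any) and the underlying type.
sub describe {
    my $ref   = shift;
    my $class = blessed($ref) // 'unblessed';
    return "$class (" . reftype($ref) . " underneath)";
}

my $hash  = {};
my $array = [];
print describe($hash), "\n";

bless $hash,  'Quote';
bless $array, 'Quote::List';    # hypothetical class name, for illustration

print describe($hash), "\n";
print describe($array), "\n";
```

The point to take away is that bless only attaches a class name to the reference; the hash (or array) underneath is still there and still works as before.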

Methods

The next thing to turn back to is our subroutines in Quote. Why the my $self = shift;?

First off, let's now introduce the right terminology -- a script has subroutines, an object or class has methods. There's no other real difference between the two -- a method is simply a subroutine associated with a particular class.

It also turns out that part of the way objects work in Perl is that the first argument to a method is always the thing it was called on: an object reference (or, for a class method such as new, the class name). If you look back to our very first code we were doing much the same thing, except that there we had to be explicit about handing in the hash ref as the first argument of the subroutine. With objects, it's much the same, except that the reference is simply understood.
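To make the parallel with that first script explicit, here is a minimal sketch (a cut-down Quote class defined inline so that the example runs as one file) that calls the same subroutine both ways.

```perl
use strict;
use warnings;

{
    package Quote;
    sub new        { my $class = shift; return bless {}, $class }
    sub set_phrase { my $self = shift; $self->{phrase} = shift; return }
    sub get_phrase { my $self = shift; return $self->{phrase} }
}

my $quote = Quote->new();
$quote->set_phrase("Baz");

# Method-call syntax: Perl hands $quote in as the first argument for us.
print $quote->get_phrase(), "\n";

# Plain subroutine call: we hand the object in ourselves, as in the first script.
print Quote::get_phrase($quote), "\n";
```

Both lines print the same phrase. The second form is only there to show the mechanics; it skips inheritance, so the arrow syntax is the one to use in real code.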

Try this:

sub set_phrase {
        my $self = shift;
        print STDOUT "Object ", $self, " is of class ", ref($self), "\n";
        my $phrase = shift;
        $self->{phrase} = $phrase;
}

You'll see the following:

Object Quote=HASH(0x804b514) is of class Quote

Notice that the reference we have here is to the exact same hash ref that we created in our new subroutine right down to the memory address.

So it follows that: a class' methods are really just subroutines whose first parameter is always a reference to the object of which they are a part.

And there is a second, important consequence of this: calling one object's methods shouldn't return another object's data.

use strict;
use Quote;

my $phrase = "Baz";
my $author = "Foo";

my $quote1 = Quote->new();
my $quote2 = Quote->new();

$quote1->set_phrase($phrase);
$quote1->set_author($author);
$quote2->set_phrase($phrase);
$quote2->set_author($author);

$quote2->set_phrase("Some other phrase");
$quote1->set_author("Some other author");

print STDOUT $quote1->get_author(), " wrote ", $quote1->get_phrase(), "\n";
print STDOUT $quote2->get_author(), " wrote ", $quote2->get_phrase(), "\n";

exit 0;

It will produce:

Some other author wrote Baz
Foo wrote Some other phrase

So what does it all mean?

You have now created and used several instances of the Quote class, but through all of this I'm sure that the benefits of "going OO" haven't been very obvious -- you've done a lot of work just to get to a point that would have taken you five minutes in a regular procedural script.

There are several answers:

We have increased the re-usability of our code. Our Quote class can now be used in several places simply by calling use Quote and then creating our Quote object the right way (by calling Quote->new() and then setting all of the attributes, that is, the values contained in $self).
We have started to hide the essence of a Quote from the script(s) that call and use them. Our script no longer needs to know where or how a Quote is created, or what makes one valid, it just knows that it calls get_phrase and gets back a String that goes right... here.
We can start to change the way a Quote works without having to change any of the calling code. Well, as long as we don't start changing our method names, which is why it's a good thing to pick informative method names.

Once again, let's turn to a concrete example: say that you have decided a couple of things about how quotes will work:

Quotes will not be accepted if they are more than 155 characters in length
Quotes will not be accepted if they have no author

Let's take a look at how our Quote class might change (our class is getting long, so I'm just going to show the methods that changed or are brand new):

sub set_phrase {
        my $self = shift;
        my $phrase = shift;
        if (length($phrase) > 155) {
                $self->{invalid} = 1;

                $self->{invalid_message} = "Quote is more than 155 characters";
        }
        $self->{phrase} = $phrase;
}

sub set_author {
        my $self = shift;
        my $author = shift;
        unless ($author) {
                $self->{invalid} = 1;
                $self->{invalid_message} = "You must specify an author";
        }
        $self->{author} = $author;
}

sub is_valid {
        my $self = shift;
        return ! $self->{invalid};
}

sub get_invalid_message {
        my $self = shift;
        return $self->{invalid_message};
}

Notice that we are doing several useful things:

We allow the user to input whatever they like, but if we don't like it then we will set a flag to tell us that this input isn't ok. Updated: please see the Style section at the end of this document for a discussion of errors and objects.
We create a method that allows us to retrieve the value of this flag
We create a method that tells us why the flag was set

Now think about how your script can take advantage of these features:

You know that a quote is always validated because the simple act of setting an attribute's value (an object's data are its attributes, so the phrase is an attribute, and the author is an attribute of this object) causes the validation code to run.
And more importantly, if you decided to change the way a quote is validated, all scripts that use your Quote class will automatically use the updated Quote class with its updated validation techniques.
And perhaps most importantly, assuming that you have at least three scripts (one to store a quote, one to retrieve it and show it on the Web site, and one to display an error back to the user if it is invalid) then none of these needs to know anything about how the Quote works, they still just know to call get_phrase, get_author, and possibly is_valid.

Change the subroutines again as follows:

sub set_phrase {
        my $self = shift;
        my $phrase = shift;
        if (length($phrase) > 255) {
                $self->{invalid} = 1;
                $self->{invalid_message} = "Quote is more than 255 characters";
        }
        $self->{phrase} = $phrase;
}

sub set_author {
        my $self = shift;
        my $author = shift;
        unless ($author) {
                $author = "Anonymous";
        }
        $self->{author} = $author;
}

Our validation rules have changed substantially, but no code changes are required in the scripts -- they just carry on asking our Quote object are you a valid quote?.
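Here is a rough sketch of what such a calling script might look like. The report subroutine is hypothetical (it is not part of the tutorial's scripts), and the Quote class is trimmed down to the methods the caller needs.

```perl
use strict;
use warnings;

{
    package Quote;
    sub new { my $class = shift; return bless {}, $class }
    sub set_phrase {
        my ($self, $phrase) = @_;
        if (length($phrase) > 255) {
            $self->{invalid} = 1;
            $self->{invalid_message} = "Quote is more than 255 characters";
        }
        $self->{phrase} = $phrase;
    }
    sub is_valid            { my $self = shift; return !$self->{invalid} }
    sub get_invalid_message { my $self = shift; return $self->{invalid_message} }
}

# Hypothetical caller: it asks the object and never checks lengths itself.
sub report {
    my $quote = shift;
    return $quote->is_valid()
        ? "OK"
        : "Rejected: " . $quote->get_invalid_message();
}

my $short = Quote->new();
$short->set_phrase("Ni!");
my $long = Quote->new();
$long->set_phrase("x" x 300);

print report($short), "\n";
print report($long), "\n";
```

If the limit inside set_phrase changes again, report stays exactly as it is.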

Persistence, Inheritance, and Abstract Classes, Oh My

So now let's turn to saving our Quotes and try taking an OO approach here too:

package Saver;

use strict;

sub save {
        my $self = shift;
        return;
}

package Saver::File;

use strict;
use IO::Handle;

@Saver::File::ISA = qw(Saver);

sub new {
        my $class = shift;
        my $file  = shift;
        my $self  = {
                file    => $file,
        };
        return bless $self, $class;
}

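sub _open {
        my $self = shift;
        # Three-argument open with a lexical filehandle stored in the object
        open (my $fh, ">", $self->{file}) or die ("Unable to open file: " . $self->{file} . ": $!");
        $self->{fh} = $fh;
}

sub _close {
        my $self = shift;
        # Nothing to close if save() was never called
        return unless $self->{fh};
        close ($self->{fh}) or die ("Unable to close file: " . $self->{file} . ": $!");
        delete $self->{fh};
}

sub save {
        my $self = shift;
        my $line = shift;
        unless ($self->{fh}) {
                $self->_open();
        }
        print {$self->{fh}} $line . "\n";
}

sub DESTROY {
        my $self = shift;
        print STDOUT "Closing file\n";
        $self->_close();
        return;
}

1;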

A lot of new concepts are being thrown at you here, and I'll address them in a moment below, but I first wanted to point out that I have created a file with two packages in it: Saver, and Saver::File.

The script has now been updated to read:

use strict;
use Quote;
use Saver;

my $file   = "temp_file.txt";
my $phrase = "Baz";
my $author = "Foo";

my $saver  = Saver::File->new($file);
my $quote1 = Quote->new();
my $quote2 = Quote->new();

$quote1->set_phrase($phrase);
$quote1->set_author($author);
$quote2->set_phrase($phrase);
$quote2->set_author($author);

$quote2->set_phrase("Some other phrase");
$quote1->set_author("Some other author");

$saver->save($quote2->get_author() . " wrote " . $quote2->get_phrase());
$saver->save($quote1->get_author() . " wrote " . $quote1->get_phrase());

print STDOUT "Exiting\n";
exit 0;

Here are the key concepts:

Inheritance is a very important concept for classes -- in this case the Saver::File class inherits a save method from the Saver class.
We have some methods that start with an underscore "_". This is a Perl convention (since there's no other way to specify it) to mark a method as 'private'. Private methods are methods that should not be called by anyone/anything outside of the class itself (not even by inheriting classes). Notice that I never call _open or _close from the script, but they are freely used within the Saver::File class.
Notice that the Saver class doesn't do very much; you can't even create a new object of type Saver since it doesn't have a new method. This indicates that Saver is an abstract class, but more on this in a moment.
The DESTROY subroutine is special (there's another like it that I'm not going to cover called AUTOLOAD) -- in this case, the DESTROY method is called when the object it is associated with is destroyed by Perl's garbage collection mechanism. Again, more on this in a moment.

Here's the output of the script:

Exiting
Closing file

Interesting, no? The file is closed 'after' the script exits. This is the DESTROY method at work -- Perl waits until it's sure that I'm not going to do anything else with my Saver object and then automatically calls $object->DESTROY(). If I hadn't specified my own DESTROY method, Perl would just have attempted to destroy the references and reclaim any spare memory, but my special DESTROY method tells Perl to close the file handle before allowing the object to be destroyed.

This raises an important point about object-oriented code -- always keep in mind that an object is only destroyed once the last reference to it goes away, so Perl cannot reclaim it while you are still holding a reference somewhere. If you're in the habit of keeping a lot of references lying around on the assumption that they are just pointers and use very little memory, you're going to find your OO Perl slowly eating its way through your system's memory.
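A small sketch makes the timing visible. The Tracked class and the @log array are hypothetical names of mine, used only to record the order in which things happen.

```perl
use strict;
use warnings;

our @log;    # records the order of events (illustration only)

{
    package Tracked;    # hypothetical class
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { my $self = shift; push @main::log, "destroyed $self->{name}" }
}

my $kept;
{
    my $inner = Tracked->new("inner");
    my $other = Tracked->new("kept");
    $kept = $other;    # a second reference that outlives the block
    push @log, "leaving block";
}
push @log, "after block";

undef $kept;           # the last reference to "kept" goes away here
push @log, "done";

print "$_\n" for @log;
```

The "inner" object is destroyed as soon as its block ends, while "kept" hangs around until the extra reference is dropped. That extra reference is exactly the kind of thing that quietly holds on to memory.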

Aside from good reference hygiene, another way to manage this potential memory usage is to create re-usable objects by giving them, for instance, a reset() method that resets the state of the object and makes it ready for a new Quote/File/what have you.

This technique isn't applicable in every situation, but where it can be particularly useful is where you have the potential for a lot of objects to be created (which is a comparatively expensive operation) and speed is of paramount importance. Drawing on my own job experience, I do a lot of ETL (Extract, Transform, and Load) work where I routinely handle plain-text files with over 2.5 million unique records. At this size, it not only becomes prohibitive from a memory standpoint to keep so many objects hanging around, but the overhead needed to create and destroy an object for each record (or each field within a record) becomes astronomical. Instead, we create a single object and constantly re-use it via a method that essentially resets the object to a pristine state. This is much faster and saves a lot of memory.
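As a rough sketch of the idea (the Record class, its methods, and the sample lines are all hypothetical, and a real ETL job would read from a file rather than a list), a single object can be emptied and refilled for each record:

```perl
use strict;
use warnings;

{
    package Record;    # hypothetical class
    sub new   { my $class = shift; return bless { fields => [] }, $class }

    # Return the object to a pristine state without creating a new one
    sub reset { my $self = shift; @{ $self->{fields} } = (); return $self }

    sub load  {
        my ($self, $line) = @_;
        push @{ $self->{fields} }, split /,/, $line;
        return $self;
    }
    sub field { my ($self, $index) = @_; return $self->{fields}[$index] }
}

# One object, created once, re-used for every line
my $record = Record->new();
for my $line ("a,b,c", "d,e,f") {
    $record->reset()->load($line);
    print $record->field(1), "\n";
}
```

Only one Record is ever created here, however many lines go through the loop.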

The next important concept in the code above is the idea of inheritance -- Saver::File inherits from Saver. Inheritance is specified using the @ISA array, so the line: @Saver::File::ISA = qw(Saver); tells us that Saver::File is a Saver (ooooh, a Perl mnemonic!). But what does inheritance do?

In the example above, inheritance doesn't give us a great deal (although I'll touch on some interesting side effects in the section on abstract classes), but it does mean that if we were to give Saver some useful methods then Saver::File would automatically inherit them. Let's try an example by adding the following to the Saver package:

sub serialize {
  my $self = shift;
  my $quote = shift;
  my $text = '';
  $text .= $quote->get_author();
  $text .= " wrote ";
  $text .= $quote->get_phrase();
  return $text;
}

Then, in Saver::File we're going to change our save method to read:

sub save {
        my $self = shift;
        my $quote = shift;

        unless ($self->{fh}) {
                $self->_open();
        }
        print {$self->{fh}} $self->serialize($quote) . "\n";
}

And our script would be changed to:

$saver->save($quote2);
$saver->save($quote1);

Inheritance is what allows me to call $object->serialize() in the Saver::File class (where no such method exists) and have it run the appropriate method in the Saver super-class (where it does).
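If you'd like to watch the lookup happen without touching the file system, here is a self-contained sketch. Saver::Memory is a hypothetical subclass of my own, and serialize is a simplified stand-in that takes plain text rather than a Quote object.

```perl
use strict;
use warnings;

{
    package Saver;
    # Simplified stand-in for the serialize method shown above
    sub serialize { my ($self, $text) = @_; return "saved: $text" }
}
{
    package Saver::Memory;    # hypothetical subclass, keeps lines in memory
    our @ISA = ('Saver');
    sub new   { my $class = shift; return bless { lines => [] }, $class }
    sub save  {
        my ($self, $text) = @_;
        push @{ $self->{lines} }, $self->serialize($text);
        return;
    }
    sub lines { my $self = shift; return @{ $self->{lines} } }
}

my $saver = Saver::Memory->new();
$saver->save("Baz");

my ($first) = $saver->lines();
print "$first\n";
print "Defined in Saver::Memory itself? ",
    (defined &Saver::Memory::serialize ? "yes" : "no"), "\n";
print "Is a Saver? ", ($saver->isa('Saver') ? "yes" : "no"), "\n";
```

Perl doesn't find serialize in Saver::Memory, so it walks @ISA and runs the one in Saver instead.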

Please note that what I have done is actually bad OO technique for the following reason: the Saver class now needs to know about the methods of the Quote class in order to do its job. If I were to change get_phrase to be get_quote in the Quote class, then I'd have broken the Saver class as well. What I did was for illustrative purposes only.

The other important thing that I've done is to create what's called an abstract class -- a class that is never supposed to be directly used. Notice that you can't write my $saver = new Saver; but you can write my $saver = Saver::File->new($file);.

The concept of the abstract class is quite advanced, but it's very powerful. Essentially I am creating a class that exposes (allows the user to call) one or more key methods, but only as placeholders. The methods of the abstract class don't actually do anything useful. If you're wondering why on earth anyone would create a class that has methods that don't work, then you're probably not alone.

The abstract class is intimately bound up with the ideas of casting and of inheritance (which is why I talked about inheritance first). To use an analogy, when you cast a sculpture in metal, you take an object in one form (say a bronze block) and pour it into a mold so that it assumes another shape. The bronze is still the same, but it certainly looks different.

In a similar way, I can cast a Saver::File object as a Saver object because Saver::File inherits from Saver. What this means is that at the same time as my Saver::File object retains all of the functionality of the Saver::File class, it can also be used anywhere that a Saver object is called for. In other words, it looks, to any script that wants a Saver object, exactly like a Saver object. So if I don't care about where something is being saved, then I can just call $object->save() and let the class worry about the details.

This is (sort of) the idea of encapsulation -- if my script doesn't need to know how something is saved, then that functionality (and any underlying data related to that functionality) should be hidden from the outside world (encapsulated).

In fact, the idea of encapsulation says a lot more, and it states that I should never try to access another object's data directly but should always use the methods provided. Since the objects that we are using are really just anonymous hashes, I could very easily reach in and say $quote->{phrase} = "Foo" rather than writing $quote->set_phrase("Foo"). However, not only is this bad manners, but it's also bad OO since not only are you making an assumption about how I've written my object, you are also bypassing all of the validation rules that I wrote into my class.
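A quick sketch shows what goes wrong when someone reaches in directly. The Quote class here is trimmed to the one setter and its validity check.

```perl
use strict;
use warnings;

{
    package Quote;
    sub new { my $class = shift; return bless {}, $class }
    sub set_phrase {
        my ($self, $phrase) = @_;
        if (length($phrase) > 255) {
            $self->{invalid} = 1;
            $self->{invalid_message} = "Quote is more than 255 characters";
        }
        $self->{phrase} = $phrase;
    }
    sub is_valid { my $self = shift; return !$self->{invalid} }
}

my $checked = Quote->new();
$checked->set_phrase("x" x 300);    # goes through the validation

my $sneaky = Quote->new();
$sneaky->{phrase} = "x" x 300;      # reaches into the hash directly

print "Via setter:    ", ($checked->is_valid() ? "valid" : "invalid"), "\n";
print "Direct access: ", ($sneaky->is_valid()  ? "valid" : "invalid"), "\n";
```

The second quote is just as oversized as the first, but it still claims to be valid because the validation code never ran.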

Anyway, back to the purpose of our abstract class -- let's see what would happen if we decided to move the Quote file into a database (so that we don't have to read in the entire file every time). I could create my Database saver module like so:

package Saver::Database;

use strict;
use DBI;

@Saver::Database::ISA = qw(Saver);

our $sql = "INSERT INTO quotes VALUES (0, ?)";

sub new {
        my $class = shift;
        my $db    = shift;
        my $user  = shift;
        my $pass  = shift;
        my $self  = {
                db      => $db,
                user    => $user,
                pass    => $pass,
        };
        return bless $self, $class;
}

sub _open {
        my $self = shift;
        my $dbh = DBI->connect("DBI:mysql:" . $self->{db} . ";host=localhost", $self->{user}, $self->{pass}) or die ("Couldn't connect to database: " . $DBI::errstr);
        $self->{dbh} = $dbh;
}

sub _close {
        my $self = shift;
        $self->{dbh}->disconnect() or die ("Couldn't disconnect from database: " . $self->{dbh}->errstr);
}

sub save {
        my $self = shift;
        my $quote = shift;
        unless ($self->{dbh}) {
                # Call as a method so that $self is passed along
                $self->_open();
        }
        my $sth = $self->{dbh}->prepare($Saver::Database::sql);
        $sth->execute($self->serialize($quote));
        $sth->finish();
}

sub DESTROY {
        my $self = shift;
        # Only disconnect if a connection was actually opened
        $self->_close() if $self->{dbh};
        return;
}

1;

Now there is some horrible DBI code in there (I don't have a db installed on the machine that I'm writing this on, and my DBI is very rusty), but that's not what I'm trying to get at. The key thing to gain from this additional class is the following: in order to switch my code from using a flat file to using a database I have to change exactly one line. This:

my $saver  = Saver::File->new($file);

becomes this:

my $saver  = Saver::Database->new($db, $user, $pass);

And the rest of my code rolls on exactly as it did before even though I have completely changed the underlying architecture. That is quite a powerful technique to add to your toolset and is one of the main reasons that developers get really excited when they start talking about being able to make something object-oriented.

Notice too that Saver::Database can also use the serialize method defined in the Saver super-class.
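Since not everyone has a database handy, here is a sketch of the same one-line swap using two in-memory stand-ins. Saver::Memory, Saver::Counter, and store_all are all hypothetical names of mine; only the shape of the idea comes from the classes above.

```perl
use strict;
use warnings;

{
    package Saver;
    sub save { return }          # placeholder, as in the abstract Saver above
}
{
    package Saver::Memory;       # hypothetical: keeps lines in an array
    our @ISA = ('Saver');
    sub new   { my $class = shift; return bless { lines => [] }, $class }
    sub save  { my ($self, $line) = @_; push @{ $self->{lines} }, $line; return }
    sub lines { my $self = shift; return @{ $self->{lines} } }
}
{
    package Saver::Counter;      # hypothetical: only counts what it is given
    our @ISA = ('Saver');
    sub new   { my $class = shift; return bless { count => 0 }, $class }
    sub save  { my $self = shift; $self->{count}++; return }
    sub count { my $self = shift; return $self->{count} }
}

# The calling code only insists on "some kind of Saver".
sub store_all {
    my ($saver, @lines) = @_;
    die "Not a Saver\n" unless $saver->isa('Saver');
    $saver->save($_) for @lines;
    return $saver;
}

my $memory  = store_all(Saver::Memory->new(),  "one", "two");
my $counter = store_all(Saver::Counter->new(), "one", "two");

print join(", ", $memory->lines()), "\n";
print $counter->count(), "\n";
```

store_all never changes; the only thing that differs between the two calls is which class's new is called.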

Another little tidbit that I'd like to point out is the use of what's called a class variable. The variable $sql (defined by the line our $sql = "...") is shared by every object of class Saver::Database. This means that a change made to the value of $sql in one object would be visible to every other object.

Class variables are often used to define things that are either immutable (will not normally be changed by objects -- as is the case with our SQL statement) or that need to be used as counters (how many objects of class "Foo" have I created?). There are other uses for class variables as well, but those are the simplest.
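The counter case is easy to sketch. Counted is a hypothetical class, and the count method is my own addition for reading the value back.

```perl
use strict;
use warnings;

{
    package Counted;    # hypothetical class

    # Class variable: one copy, shared by every Counted object
    our $count = 0;

    sub new   { my $class = shift; $count++; return bless {}, $class }
    sub count { return $count }
}

Counted->new() for 1 .. 3;
print Counted->count(), "\n";
```

Because $count belongs to the class rather than to any one object, every call to new bumps the same number.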

A Brief Discussion of Style

Several astute people have picked up on my obvious Java pedigree -- I learned OO from Perl, but spent a couple of years as a Java developer where I picked up some rather non-Perlish habits rooted in a strongly-typed language. For the sake of comprehensiveness, I'd like to cover a couple of points where my style differs significantly from that of your standard Perl OO developer.

Java methods are structured so that you can't call a method and decide, pretty much on the fly, what it will do. For this reason I got into the habit of using what are known as getter and setter methods (get_phrase, set_author, and so on). A lot of Perl programmers prefer to write their simple attribute accessor methods the following way:

sub phrase {
  my $self = shift;
  @_ ? $self->{phrase} = shift : $self->{phrase};
}

This is quite tricky coding-wise, and I'll do my best to make it intelligible since it needs some unpacking... First, we remove the object reference from the subroutine's @_ array using shift. Anything left is assumed to be part of the arguments being passed to the subroutine/method. Next, we test @_ by implicitly evaluating it in a scalar context -- if it's empty, it yields 0 (false); if it has one or more elements, it yields the number of elements (true).

In this simple case, we are assuming that there is only a single element remaining in @_, so if @_ ? returns true, we shift the remaining element out of @_ and assign it to $self->{phrase}. If there is nothing in @_, we just evaluate $self->{phrase}. This has the fortunate side-effect of making the return value from our subroutine the value of $self->{phrase} and this is how $object->phrase() can act as both a getter and a setter method.

This is obviously quite a nice little shortcut, but IMO it has a couple of problems:

It is much more difficult for mere mortals to read, and one should neither under- nor over-estimate the intelligence of those who have to pick up your work after you.
It is easy for someone to inadvertently introduce a subtle but serious bug by adding code after our very clever ternary operator, since the method's return value silently becomes whatever that new last statement evaluates to. And although an explicit return could help fix this, it remains a potential source of confusion.
It also makes for less intelligible method calls -- there is nothing ambiguous about calling $object->set_phrase($phrase), but there is a lot more room for misunderstandings in calling $object->phrase($phrase).

But this is my personal opinion, and only you can decide whether you prefer the faster, more Perl-like style, or the more rigorous, less ambiguous Java-like style. And of course, if you were feeling particularly generous you could offer the programmer both styles and then let them choose the style with which they are most comfortable. TMTOWTDI.
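If you did want to offer both styles, one way to do it might look like the sketch below. The Quote class and its constructor are assumptions made purely for illustration (only the phrase attribute and the get_/set_ naming come from the discussion above), and the combined accessor uses an explicit return to sidestep the "code added after the ternary" problem.

```perl
use strict;
use warnings;

package Quote;    # hypothetical class name, for illustration only

sub new {
    my ($class, %args) = @_;
    return bless { phrase => $args{phrase} }, $class;
}

# Java-like style: one method per job.
sub get_phrase {
    my $self = shift;
    return $self->{phrase};
}

sub set_phrase {
    my ($self, $phrase) = @_;
    $self->{phrase} = $phrase;
    return $self->{phrase};
}

# Perl-like style: a combined accessor built on top of the explicit pair.
sub phrase {
    my $self = shift;
    return @_ ? $self->set_phrase(shift) : $self->get_phrase;
}

package main;

my $quote = Quote->new(phrase => 'Carpe diem');
$quote->phrase('Veni, vidi, vici');    # used as a setter
print $quote->get_phrase, "\n";        # prints "Veni, vidi, vici"
```

Since phrase() simply delegates to the explicit methods, any validation you later add to set_phrase() applies to both styles.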

Another Java habit that I've picked up (and this is again related to the getter/setter methodology) is my general avoidance of hashes and arrays as ways of setting multiple parameters simultaneously. I tend to feel that if you are setting two attributes in a method then you should do so explicitly with two scalars. However, there's absolutely nothing stopping you from creating a method that is called in the following style:

$quote->Set(phrase=>$phrase,author=>$author);

Again, it's perfectly valid Perl and many people like to write things this way.
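For the curious, a Set() method that accepts named parameters could be sketched roughly as follows. The Quote class, its bare-bones constructor, and the %is_settable whitelist are all assumptions of mine rather than anything from the classes in this tutorial; rejecting unknown names with die() is likewise just one possible policy.

```perl
use strict;
use warnings;

package Quote;    # hypothetical class name, for illustration only

# Hypothetical whitelist of the attributes that Set() will accept.
my %is_settable = map { $_ => 1 } qw(phrase author);

sub new {
    my $class = shift;
    return bless {}, $class;
}

# Accepts name => value pairs and stores each one on the object.
sub Set {
    my ($self, %args) = @_;
    for my $name (keys %args) {
        die "Unknown attribute '$name'\n" unless $is_settable{$name};
        $self->{$name} = $args{$name};
    }
    return $self;
}

package main;

my $quote = Quote->new;
$quote->Set(phrase => 'To be, or not to be', author => 'Shakespeare');

# Peeking inside the hash directly only to keep the sketch short.
print "$quote->{author}: $quote->{phrase}\n";
```

The whitelist is the price of this style: without it, a typo such as Set(pharse => $phrase) would quietly create a new, useless attribute.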

And perhaps more seriously, several people have suggested that my reluctance to use die() is just plain wrong (see Bad Lessons section below). They are, for the most part, right.

When I last did a lot of Perl OO work (this piece is something of a refresher for me) I wasn't very comfortable with the following syntax:

eval {
  $object->method($foo);
};

if ($@) {
  ... handle error ...
}

I'm still not very comfortable with it, but at least I understand it now (of course, I may have got the syntax completely wrong as I'm still not familiar with it).

I have one tiny quibble with this approach -- eval{} lacks a degree of granularity that would be very helpful in distinguishing between errors. This method forces me to check what's in $@ and then decide what to do, rather than working from the assumption that if something died, then it was probably for a good reason. Again, this is nothing more than a matter of style, and if it's commonplace for objects to die() and for those errors to be trapped by eval{} then I'd suggest going with the common way rather than my idiosyncratic way (I'm in a 12-step program now). An alternative system, proposed by gjb, uses the Exception and Class::Exception modules. I haven't worked with these modules, but based on their names alone I'd guess that they offer a promising, more Java-like system for handling errors with the desired degree of granularity.
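Even without extra modules, you can get some of that granularity by passing an object to die() and inspecting its class in $@. The sketch below uses only core Perl; the MyError and MyError::Connection classes and the connect_to_database() function are hypothetical stand-ins that I have invented for illustration, not part of any module mentioned here.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);    # core module

package MyError;                 # hypothetical base error class

sub new {
    my ($class, %args) = @_;
    return bless { message => $args{message} }, $class;
}

sub message { return $_[0]{message} }

package MyError::Connection;     # hypothetical, more specific error
our @ISA = ('MyError');

package main;

# Hypothetical stand-in for a method that fails to reach its database.
sub connect_to_database {
    die MyError::Connection->new(message => 'cannot reach database');
}

my $source;
my $ok = eval { connect_to_database(); $source = 'database'; 1 };

if (!$ok) {
    my $err = $@;
    if (blessed($err) && $err->isa('MyError::Connection')) {
        $source = 'plain file';    # a failure we know how to recover from
    }
    else {
        die $err;                  # anything else is re-thrown untouched
    }
}

print "Using $source\n";           # prints "Using plain file"
```

The caller, not the class, decides which failures are fatal, which is exactly the database-then-plain-file situation that made me nervous about die() in the first place.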

Some Bad Lessons

It's also important to note that there are a number of bad lessons taught both by my code and by some people's usage of OO in Perl. Here are a few of them:

I used to believe that it was bad form for an object to die(). The main reason for this was the impossibility of knowing in advance every context in which a class might be used. You might think that being unable to establish a database connection is worth dying for, but what if the calling application was written so as to try to use a database first and then fall back on a plain file if it couldn't establish a connection? However, as the strategies outlined above make clear, there are more direct ways of alerting the user to an error condition than setting a flag and hoping that the developer remembers to check it. This update was based on contributions from demerphq and gjb.
As I said above, my use of the DBI module here is just terrible. I'd like to believe that if I were writing a real application I'd do much, much better with this class.
You can always offer default values as part of your instantiation process -- although none of my classes offer them (except in the Saver::Database class where a $sql value is defined), there's nothing stopping you from making some sensible guesses to help the user under the right circumstances.
Avoid multiple inheritance -- this is one of the bad things that Perl inherited from C++ and that Java, IMO, quite rightly did away with. While there's nothing keeping you from adding fifteen different classes to your @ISA array, you'll not only confuse the heck out of anyone trying to use your class, you'll probably also make your own debugging much more difficult. You could inherit simultaneously from DBI and File::Find, but before you do so you really need to ask yourself why you could possibly need to do this.
As with all things Perl, there are always exceptions to the rule, and the exception here is something called an interface. In Perl there is no real difference between an abstract class and its methods and an interface. Take the example of things that switch lights on and off -- say, a regular switch, a dimmer, and The Clapper (tm). Each of these offers the ability to say $light->on() and $light->off(), but there's no way that our class Clapper should be inheriting from the LightSwitch class since it's a completely different device that just happens to be able to turn lights on and off. Instead, we would define an interface called LightSwitcher with two methods (on() and off()) that do nothing, but that all three classes can inherit from in their @ISA array. Each class would have a totally different implementation of on and off, but they can all be used as LightSwitcher objects by someone's Perl code if the need arises. If you want to know more about interfaces and how to use them then do a little reading about Java to get the idea.
Avoid calling methods on other people's classes that are marked private via the '_' prefix. The whole point of private methods is that the developer doesn't want you using them and it's probably for a good reason. I might decide to do away with the _open and _close methods on my Saver::File class, but if you've come to rely on them in your script you're going to have to do a lot of debugging. I believe that the Camel book said it best (I paraphrase): Perl doesn't keep you out of the living room with a shotgun, it asks you politely. Private methods are like the living room of a house; don't go there unless you're invited.
Avoid calling private methods as methods. This may sound strange, but you will notice that in my Saver::File and Saver::Database classes I call _open and _close using the normal subroutine syntax and not using the special method syntax. This strategy avoids a very subtle error whereby calling $self->_open() inside Saver::File might inadvertently call Saver::File::CSV::_open() instead of Saver::File::_open() (in the event that you were using a CSV object but working with the superclass's methods for some reason), because a method call is always looked up starting from the object's own class. This is a very difficult issue to properly convey in an introduction to OO, and if you don't understand it now, I am confident that you will shortly. Again, there may be situations where the behaviour that I have just described is actually desirable, but you should know when to break the rules before doing so. Update: thanks to Abigail-II for pointing this error out in my original code.
Since hash refs are the easiest data type for the novice to understand when creating and using classes and objects, this is what I have used. Please note, however, that you can use more than hash references to create objects. In fact any reference can be turned into an object, and there are often good reasons for creating classes that use an underlying array reference, a scalar reference, or even a closure! This, however, is a topic for a more advanced tutorial.
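Since the point above about calling private methods as plain subroutines is the hardest one in this list to picture, here is a stripped-down sketch of it. The class names Saver::File and Saver::File::CSV and the _open method come from the text; the method bodies and the save_as_method() and save_as_sub() names are invented purely to show the difference in dispatch, and do not reflect what the real classes do.

```perl
use strict;
use warnings;

package Saver::File;

sub new {
    my $class = shift;
    return bless {}, $class;
}

# "Private" helper; it just reports which version ran.
sub _open { return 'Saver::File::_open' }

# Method syntax: the lookup starts from the class of the object in $self.
sub save_as_method {
    my $self = shift;
    return $self->_open;
}

# Subroutine syntax: always calls the _open defined in this package.
sub save_as_sub {
    my $self = shift;
    return _open($self);
}

package Saver::File::CSV;
our @ISA = ('Saver::File');

# The subclass happens to define its own private _open.
sub _open { return 'Saver::File::CSV::_open' }

package main;

my $csv = Saver::File::CSV->new;
print $csv->save_as_method, "\n";    # prints "Saver::File::CSV::_open"
print $csv->save_as_sub,    "\n";    # prints "Saver::File::_open"
```

Both save_* methods live in Saver::File, yet only the plain subroutine call is guaranteed to reach Saver::File's own _open when the object is really a Saver::File::CSV.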

Credits

This tutorial now directly incorporates helpful feedback from demerphq, gjb, and Abigail-II, and implicitly incorporates feedback from a number of other people as well.

Changelog

Dec 10, 2002 -- changed method calls from getPhrase to get_phrase to hide my Java background and address demerphq's observation that this style is often confusing to non-English speakers. Also added section on OO style and tried to address issue of how objects die().

Dec 11, 2002 -- incorporated feedback from Abigail-II regarding use strict and the better way to call private methods.

That's it for now... I'm sure that I will be updating this again with more feedback from those with greater experience/knowledge/expertise in fairly short order.
