http://qs321.pair.com?node_id=292915

Update: For the record, I've polished this module and finally decided to release it. The name I settled on was Class::Tables.

I've had this pet project for a long time (before I knew about such luxuries as Class::DBI in fact). It's a very simple object-persistence interface. After long years of Perl learning here at the Monastery, I've come back to it and refactored it into a state that I'm very proud of. I know I find it unmeasurably useful, but at the risk of being Yet Another POOP Module, I thought I'd ask for some input here first to see if this idea is release-worthy. Also, naming modules is difficult.. The production name is DBIx::SimpleObj (I'm indifferent to this name so far). I haven't posted the code here, but I certainly can if folks are interested.

You're probably asking, "So if it's not Yet Another POOP Module, what makes it special? I'll just use Class::DBI, dork!" Well, I very rarely find that I need the advanced features of Class::DBI like joins, custom attribute inflation/deflation, etc.. For example, I was cooking up a simple database-backed blog for my homepage recently, with only two tables. In the little apps like this where I use object persistence (mostly for the clean interface and to avoid coding SQL), the object persistence should be 100% automatic because I'm lazy and the schemas are so simple.

So here's the goal of this module: Use the module, give it a $dbh, and without any other intervention on your part (that's two lines total), you should have all the appropriate classes for a persistent-object interface to the data, with foreign key inflation and other niceties. Add another table to your DB schema, and without adding extra stuff to your code, you already have a new class for that table the next time your code runs -- automatically. Having to tell the interface about the structure of your data is boring. In lieu of this, the module will use the metadata in the DB (table, column names, etc) to figure out the structure of things. Obviously I can't match the functionality of the big guns like Class::DBI in this kind of a module, but what I can do make the simple things easier. Here are some of the important details about what it does:

One class per table
Upon scanning the tables, the module auto-generates a class for each table. The package name will be in StudlyCaps to fit with Perl conventions, so the DB table foo_bar would correspond to Perl objects of the FooBar package. All tables must have a column named id that is the primary key and set to auto_increment. All other columns get an accessor/mutator method named after the column (the id gets a read-only accessor).

Classes are also provided with a few class methods: search, new, and fetch, an optimized version of search(id => $id).

You can still override methods if you really need to with SUPER (dumb example):

package Employee; sub ssn { my $self = shift; my $ssn = $self->SUPER::ssn; $ssn =~ s/(\d{3})(\d{2})(\d{4})/$1-$2-$3/; return $ssn; }

Auto-detect foreign key references
If a column has the same name as some other table, it is treated as a foreign key. Example: I have two tables, employee and department. The employee table has a column called department, so it is automatically treated as a foreign key: The method call $emp->department will be inflated to a Department object as an accessor (not just return a department ID), and can accept a Department object (or just an ID) as a mutator.

Conversely, the foreign keys work in reverse. I can call $department->employee (read-only) to get a list of Employee objects that reference this department.3 This is just shorthand for Employee->search(department => $department).

You may be thinking, why impose naming restrictions like this? First, to keep our sanity most of us have similar DB naming conventions anyway.2 Also, to support MySQL, I don't want to just look at the foreign key attribute of the columns, which are merely cosmetic in MySQL (and therefore rarely used).

Lazy loading
*BLOB and *TEXT columns are only loaded only when that data is requested, so the most common queries can be a bit faster.

Sort Order
The first column (other than id) in the table's schema is the sort order of all queries on that table. If the employee table has columns id, name, and department, in that order, then all searches returning a list of Employee objects will be sorted by name.

Stringification
If a table has a column named name, its value will be used for an object's stringification value. Otherwise, the stringification will be "classname:id".

Flyweight implementation
..in both senses of the word. Objects are implemented as lightweight references to a scalar containing the object's ID. The ID is used to index instance data in a big lexical hash in the superclass. This way, updates will be automatically shared for all concurrent objects that represent the same tuple in the DB.

Here are some current drawbacks, shortcomings, to-dos:

  1. So far only supports MySQL. If this causes you to not take me seriously, just forget I mentioned it.
  2. It would be nice to allow foreign key columns to have an optional _id at the end, as that seems another commonly-used naming convention (the employee table has a column called department_id).
  3. It would be cool to use some Lingua pluralizer/stemmer to let me say $department->employees (note the plural) for the reversed foreign key accessor.
  4. So far, no transactioning (yet?) -- everything is autocommit.
  5. Obviously it doesn't do complicated stuff like joins, but that's missing the point. Maybe it could do many-to-many relationships that bypass the middle table, but I haven't thought of a good way to it yet.

As I said, I really like using this module for my own projects because writing the table schemas to get them to work with the module is how I would have written them anyway! So the two extra lines I add by using this module and giving it the $dbh feel like having my mind read. But would these rules I'm imposing on the schemas be a stretch for other users? Is this something best left just for my own personal use, or do you see a potential niche for something like this? Thanks a lot, and all input is appreciated.

blokhead

Replies are listed 'Best First'.
Re: Module RFC: Yet another object-persistence interface
by demerphq (Chancellor) on Sep 21, 2003 at 11:46 UTC

    This sounds pretty interesting, and frankly very reminiscent of the project that originally got me interested in perl. I was working on an order entry system in VB with a Sybase database as the back end. I was designing the OE class model as VB COM objects and didnt want to have to do all the DB work by hand as well. Enter Perl, parsing VB class modules and automatically generating a DB, as well as modifying the class model to add DB interaction. I completed my model, but not long after the project was killed (turned out it was just one of like 10 other similar projects in the company, and eventually they were all put on hold so that a better plan could be developed :-) I was happy as it meant I didnt have to work in VB anymore, and it was the last project where I have had to do more the maintenance work on VB. A later project was in pure perl and involved essentially the reverse, using a DB design tool that produced XML we developed a XSLT stylesheet that generated DB aware classes from the model. That one was live for a year, fulfilled its purpose and was shut down when it is was no longer needed. (It was the only interim solution that ive seen that actually was interim :-)

    The reason I say all this is because my conclusions drawn from them is that tools like this rapidly become unsuitable for serious work. They artificially tie your hands to a bunch of presuppositions that sound good at the time, but dont grow well. So im very suspicious of tools and frameworks that work like this. I tend to view them as wasted efforts that while sounding good in theory dont actually work out as useful as they should. Dont get me wrong, Im not saying this is a wasted effort. At least one respondant has replied very positively, and even if only they and you derive benefit from this tool then you've achieved something positive. Im just trying to point out a few things that I observed in the projects I mention.

    With regard to naming conventions, I think a system like this becomes very dependent on the power and expressivness of naming conventions (this assumes that as you said you dont want to use more robust structural discovery techniques). You touch on the problem yourself with the naming convention you mention. Primary keys are represented by the table name, and foreign keys are represented by the table name refered to. Personally I have never designed or used a DB that conformed to either of these assumptions. :-)

    First off, my table names are often long. Second off, how do you represent a table that has two references to another table this way? For instance a DB that represents a family relationships. A "person" table may have two fields "mother" and "father" and then perhaps we may have a second table "guardians" that contains two references, one to the guardian one to the gardee (hmm, not sure the word for that :-), How do we represent either of these structures using your naming convention? How do you represent composite primary keys? How do you represent complex integrity constriants? Now as one respondant already pointed out, we dont need to _always_ use your code, but if its to be really useful I think you need to resolve some of these problems. The double reference one is very common in my experience.

    Naturally I have some ideas on the matter. :-) One thing that we did was use a more complex naming scheme. In our scheme every table had an associated prefix, normally generated from the first letter in each word in the table name, or lacking multiple words for truly base table names some other easily remembers abbreviation (such as cs or cust for customer). Then we used suffixes to represent foreign key or primary key status. So a table customer_relationship that has an identity primary key would have that column called "cr_id_pk". If there is also a type table called "customer_relationship_type" then the reference to it in customer_relationship would be called "cr_crt_id_fk". The automatic tools can then use some rudimentary rules to figure out whats going on.

    In the case I mentioned earlier of the family trees, we might have a table "people" (I chose this now, as it abbrreviates nicely) whose prefix is ppl. Now we might have fields "ppl_id_pk", "ppl_father_ppl_id_fk", "ppl_mother_ppl_id_fk", etc. When the discovery mechanism runs it identifys the fields ending in fk, pulls off the tables own prefix, and then keeps removing words from the front until it matches a primary key it knows about. This approach covers cases where you have composite primary keys, multiple references to the same table, foreign keys that are part of a composite primary key (its not hard to work out a way to handle this using the suffix pk).

    Does all of the above start sounding nasty? It should. And its the reason why I steer clear of stuff like this nowadays. Anyway, I hope you find some of this useful to your efforts, while I am a cynic, Im not totally negative. :-)


    ---
    demerphq

    <Elian> And I do take a kind of perverse pleasure in having an OO assembly language...
      Thanks for your comments, demerphq ;). I wonder if I didn't make my goals very clear. I'm not so out of touch that I really want to build a complete autodetecting replacement for Class::DBI. I want to make the "simple" things automatic, and try to make the definition of "simple" encompass as much as is reasonable. Integrity constraints, composite primary keys -- these are not in my definition of "simple" at the moment, and you are right, a naming-based system like this will start getting nasty (read: unreasonable) if I tried. I don't want the whole kitchen sink.. I want a self-loading and self-emptying dishwasher ;)

      I suppose my target audience would be anyone who wants to quickly write some glue code where the data is to stored in a "simple" DB. Maybe even this is unreasonable by your definition of "serious work", but in my experience I've written a lot of small projects where the database schema really is this simple. If you start out a project and decide your data needs composite keys, integrity constraints, whatever.. by all means don't use this! In the case where an application starts out within the scope of my module but grows to be too complex, I'd like to think that the OO interface being not entirely dissimilar to Class::DBI, a migration wouldn't be totally unreasonable. You give a really good example about projects growing beyond the scope of this type of thing, but I hope that the possibility of that happening doesn't make it a useless project, or scare people from at least trying it.

      One thing you bring up that I really do want to have a plan for is the mother/father relation example. I've run into this when trying to represent a tree in the database (using a parent column). It's a situation that's common enough that I should have a plan for it. The biggest problem is that MySQL not only ingores, but doesn't even store foreign key metadata on a column (Update: Oops: except on InnoDB tables, thanks perrin ;)). With Postgres or some other "real" RDBMS, I could autodetect these with the foreign key info in the metadata. Maybe this will be the thing to finally convince me to switch (although my webhost still has only MySQL). The only problem might be automating a good name for the reverse relation. The relation is "parent" in one direction and "child" in another, which might not be easy to automate (although the idea of using some Lingua modules to do it intelligently is very exciting).

      blokhead

        MySQL supports foreign keys when you use InnoDB tables, and the information is accessible.
Re: Module RFC: Yet another object-persistence interface
by dragonchild (Archbishop) on Sep 21, 2003 at 01:43 UTC
    First off, I think you should definitely post this on CPAN. As for a class name, why not call it what it is - Class::DBI::Psychic? Another name could be Class::DBI::Lite, but that doesn't really describe the true benefits this class provides. From your description, your class does the following:
    • Reads the schema from a database
    • Auto-generates classes from that schema and provides them for you
    • Auto-generates FK relationships, based on the assumption of a well-normalized and well-named schema
    • Allows you to treat the database as a true OO store.

    That sounds like Class::DBI::Psychic to me, but no-one will use that. So, call it Class::DBI::Simple and have done with it. :-)

    And, yes, I want this code. Because, I use your interface for the easy stuff and I can still pass the $dbh some complex SQL, should I need it. It makes the hard (or annoying) things easy and still lets me do the impossible.

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

      It's pretty well established at this point that calling a module Class::DBI::* means it has something to do with the Class::DBI module. Class::DBI is a specific module, not a generic term. Things like "Persistence" or "Object-Relational" are generic terms, but not Class::DBI.
        So, XML::Parser::Lite has to do with XML::Parser? I don't think so ... this module is one that provides a Class::DBI-like interface, but does it in a simple manner. Hence, Class::DBI::Simple.

        Now, I can understand your hesitation because XML::Parser can be dropped in for XML::Parser::Lite with no changes. This module cannot be replaced directly with Class::DBI without a problem. Interesting problem ...

        ------
        We are the carpenters and bricklayers of the Information Age.

        The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

        Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Module RFC: Yet another object-persistence interface
by BrowserUk (Patriarch) on Sep 20, 2003 at 23:20 UTC

    Would you say that your module had advantages over things like Tie::DBI or Tie::Table ?

    As for a name. Maybe Class::Tables?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
    If I understand your problem, I can solve it! Of course, the same can be said for you.

      I would say that simply having an OO interface is a big advantage. The popularity of Class::DBI seems to suggest that people enjoy accessing their relational data as objects.

      There seem to be a few problems with the tied hash approach. For inflating foreign keys and other functions, Tie::Table is forced to mix OO/hashref metaphors (Tie::DBI doesn't seem to support foreign key inflation which really limits the comparison). From its POD:

      %company_14_users = %{ $company->{14}->user }; ## [blokhead: it's not clear whether %company_14_users is magical or + not...] $company->{14}->{tax_num} = "111-22-3333"; $company->{14}->write;
      Its POD goes on to scare/confuse me with this:
      $company_14 = $company->{14}; $company_14->{tax_num} = "123456"; # Wrong example: # $company->{14}->{tax_num} = "123456" # This doesn't work, because it always create a new Row object,
      I've never extensively used tied objects, but I'm wondering if this strange behavior is somehow related to the difficulty of nesting, returning tied objects? It's also not clear whether I can add other non-persistent attributes and methods to these objects. With OO, I can always add subclass data, helper methods, etc.

      The only thing I can see in favor of having objects as hashes is for quicker glueing with HTML::Template. Of course, I could always add a ->to_hash method to my objects as well ;)

      Also, one of my key reasons for doing this module is so that I never have to tell some interface about how my tables are related. With those modules, you have to set up a tied hash for each table, telling it about your primary key and foreign keys. Obviously you could make a wrapper to scan your database and build the appropriate tied hashes, but if my module can do it out of the box, why not? I know there's nothing earth-shattering about this idea, but it's saved me a lot of time along the way.

      blokhead

        When I first read your description, I really liked the simplicity of what you described. Then I got to thinking about tables, and what you might want to do with them and it struck me that hashes seem like a fairly natural interface to them. Looking under cpan://Tie::*, I found the modules I listed and I scanned the pods and decided I couldn't tell from that whether they had everything your had or not, so I thought I would ask, and get the skinny straight from the horses mouth:)

        I really like the OO-metaphore for accessing external data. I'm less concerned by the purely syntactic differences between the two interfaces than you are, both have a sufficiently OO-style for my tastes, and in many respects I prefer the somewhat simpler, more direct lvalue syntax of accessing the data. I don't see any reason why a tied interface couldn't make connections between tables automatically following a similar hueristic to that your module uses. It ought to also be possible to have the tied interface handle inflation properly, but that is speculation, and in any case, it would require modifying or subclassing those modules, and your already does it.

        I've encountered, the 'problem' with nested ties before, and whilst I'm not positive that it can't be solved, I haven't done so myself, and I've seen it discussed other places, so maybe it cannot be (adequately) solved.

        Your argument about tie requiring me to mix the interfaces for anything that goes beyond simple accesses to the data is a convincing one. That to used the mixed interface requires me to hold two handles to each object strengthens it.

        I was going to ask if your module allowed me to "refresh" the interface mid-stream so that if a view was updated as a result of a triggered, stored procedure, the object you had built for that view would reflect the change--in which case, I think I would be totally sold on it--but MySQL doesn't support these features, so that's an academic argument.

        As it is, I like the sound of your module. Anything that keeps SQL out of applications is a good idea in my book, and for the purposes you devised it, it sounds perfectly simple and useful.


        Examine what is said, not who speaks.
        "Efficiency is intelligent laziness." -David Dunham
        "When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller
        If I understand your problem, I can solve it! Of course, the same can be said for you.

Re: Module RFC: Yet another object-persistence interface
by Anonymous Monk on Sep 21, 2003 at 09:14 UTC

    Could you emphasize the differences between the features of your proposed module and the ones already available in Class::DBI::Loader?

    Thanks

      I knew once upon a time I saw this, but have never been able to relocate it again, so thanks for mentioning it. Now I can finally see what it does! It gets a list of tables from the DB and then calls the set_up_table method from the driver-specific parts of Class::DBI. Browsing the source of these, none of them seem to do much more than make a list of columns and figure out the primary key. They don't setup lazy loading or any foreign keys, but I think I know why. This approach doesn't seem to match the philosophy of Class::DBI -- it wants you to be able to name a foreign key relation however you like, and lazy load columns however you like (and as the all-purpose tool out there, it should). My module has a much lazier usage philosophy. It's willing to reduce your flexibility and give you some (I think) very reasonable defaults in order to get you up and running with the easy stuff in less time.

      blokhead

Re: Module RFC: Yet another object-persistence interface
by particle (Vicar) on Sep 21, 2003 at 14:33 UTC
    Having to tell the interface about the structure of your data is boring.

    i agree. and i like that approach. but unless this works with Oracle, i won't be using it. i would suggest that before you post to CPAN, you refactor to make the user interface seperate from the backend interface -- modularize the MySQL specific stuff, and allow other backends to be written. then i can write my own Oracle interface, and i might take a second look at your implementation.

    ~Particle *accelerates*

Re: Module RFC: Yet another object-persistence interface
by perrin (Chancellor) on Sep 21, 2003 at 18:30 UTC
    It seems to me that as a persistence module Class::DBI has many more features than yours does. The main thing that makes your module interesting is the automatic setup. There are two approaches to that with Class::DBI right now, one being the use of Class::DBI::Loader (and setup_table calls), and the other being code generation with SQL::Translator. I think the most useful thing would be to fold some of what you did into one of these. Perhaps you could add things to Class::DBI::mysql, like a "guess_foreign_keys" method. At the moment, I'm using Class::DBI::mysql and it does a good job of setting up everything else, making the persistence section of my classes only a couple of lines long.

    By the way, how do you clean up the hash that hold all the data in the superclass? Does it just grow forever, or do you have some way of doing reference counts to clean it up when objects go out of scope?

      If I'm reading blokhead's description, his module also sets up the classes for you. Right now, you have a bunch of classes that are like 10 lines long. His removes that need by making assumptions about what those classes will generally look like. To me, that's a huge feature.

      ------
      We are the carpenters and bricklayers of the Information Age.

      The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

      Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.

        Class::DBI::Loader does the same thing, and SQL::Translator does the same but with actual files.