http://qs321.pair.com?node_id=898755

tilly has asked for the wisdom of the Perl Monks concerning the following question:

For work I'm looking at implementing something on top of a NoSQL platform. I'd like to find or create something similar to an ORM (except without the relational part) so that we can easily create our own structure.

If it matters, it looks like we'll use Riak for the NoSQL store and connect to it with Net::Riak. That decision is not final. Our requirements are that it should be easy to set up, be memory efficient, and transparently replicate across multiple nodes without having any primary master. (Those requirements exclude common relational databases.)

Replies are listed 'Best First'.
Re: Any NoSQL equivalents of an ORM?
by danb (Friar) on Apr 11, 2011 at 23:14 UTC
    Here's the ORM that I recommend for typical NoSQL platforms:
    %keys_vals = ($keys, $vals);
    I keed, I keed. I know you're not using just the key value store subset of NoSQL functionality. ;)

    --Dan

      Heh. Very much not. My current "design braindump" includes the following features:
      • Create and maintain schemas for complex objects.
      • Maintain bidirectional object relationships. (Think master-child relationships - from the master you should be able to find the children, and each child needs to know its master. This should be automatically maintained.)
      • Ability to dump networks of related objects.
      • Ability to load them elsewhere.
      • A conflict resolution algorithm in case two different clients updated an object at the same time without seeing what the other was doing.
      In short I'm really tackling the sort of problems that an ORM on top of a relational database makes easy.
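
      To make the bidirectional-relationship item concrete, here is a minimal sketch of keeping both sides of a master-child link in sync on top of a plain key-value store. Everything in it is an assumption for illustration: kv_get, kv_put, add_child and the 'order:1' style keys are hypothetical, and the store is an in-memory hash rather than Net::Riak.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a key-value store; a real
# backend (Riak or otherwise) would replace these two subs.
my %store;
sub kv_get { my ($key) = @_; return $store{$key} }
sub kv_put { my ($key, $obj) = @_; $store{$key} = $obj; return $obj }

# Attach a child to a master, updating both sides in one call so that
# callers never maintain the two directions by hand.
sub add_child {
    my ($master_key, $child_key) = @_;
    my $master = kv_get($master_key) or die "no such object: $master_key";
    my $child  = kv_get($child_key)  or die "no such object: $child_key";

    # Detach the child from any previous master first.
    if (defined(my $old_key = $child->{master})) {
        my $old = kv_get($old_key);
        $old->{children} =
            [ grep { $_ ne $child_key } @{ $old->{children} || [] } ];
        kv_put($old_key, $old);
    }

    # Record the link in both directions.
    $child->{master} = $master_key;
    push @{ $master->{children} }, $child_key
        unless grep { $_ eq $child_key } @{ $master->{children} || [] };

    kv_put($child_key,  $child);
    kv_put($master_key, $master);
    return;
}

kv_put('order:1', { children => [] });
kv_put('order:2', { children => [] });
kv_put('item:7',  { master   => undef });

add_child('order:1', 'item:7');
add_child('order:2', 'item:7');   # re-parenting removes it from order:1
```

      Note that on a distributed store the three writes are not atomic, which is exactly where the conflict resolution item comes in.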

        What order of scale are you hoping for?

        Mechanisms that will work well for say 4 to 16 nodes will often fail hopelessly if you try to scale them to 100 or 1000 nodes. Conversely, algorithms that will scale to 1000 nodes will usually be relatively inefficient if used for only 4 or 8 nodes.

        A conflict resolution algorithm in case two different clients updated an object at the same time without seeing what the other was doing.

        In general, it is far better to avoid this possibility than to design algorithms to handle it. Synchronisation always imposes high overheads on all operations, even read-only ones.
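
        For reference, detecting the situation in the quote (two clients updating without seeing each other) is commonly done by comparing version vectors. The following is a minimal sketch of that comparison only, not of any resolution policy; compare_clocks and the client ids are hypothetical names, and this does not use the Riak API.

```perl
use 5.010;
use strict;
use warnings;

# Compare two version vectors (hashrefs of client id => update counter).
# Returns 'equal', 'before', 'after', or 'concurrent'.
sub compare_clocks {
    my ($x, $y) = @_;
    my %ids = map { ($_ => 1) } keys %$x, keys %$y;
    my ($x_ahead, $y_ahead) = (0, 0);
    for my $id (keys %ids) {
        my $xc = $x->{$id} // 0;   # a missing entry counts as zero
        my $yc = $y->{$id} // 0;
        $x_ahead = 1 if $xc > $yc;
        $y_ahead = 1 if $yc > $xc;
    }
    return 'concurrent' if $x_ahead && $y_ahead;   # neither saw the other
    return 'after'      if $x_ahead;
    return 'before'     if $y_ahead;
    return 'equal';
}
```

        Only the 'concurrent' case needs a merge; in the other cases one version simply supersedes the other.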

        The best approach to distributed data management--assuming your application can be made to fit--is to distribute your objects across the nodes, but only allow the owning node to manipulate the object. That is, route all operations on an object to its owning node (or nodes, for failover; but route to secondaries only if the primary fails).

        A quick browse of the Riak link provided shows that it does this for you at the physical data (disk) level, but you will still need to provide a similar mechanism, perhaps based upon the underlying 160-bit space, at the application logic level.
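
        A rough sketch of what application-level routing over a 160-bit space could look like, using SHA-1 (which produces 160-bit digests) from the core Digest::SHA module. The node names and owner_of are hypothetical, and this is plain consistent hashing with one point per node, not a reproduction of Riak's own partitioning.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical node names; a real deployment would load these from config.
my @nodes = qw(node-a node-b node-c node-d);

# Place each node on the 160-bit ring at the SHA-1 of its name.
# Equal-length lowercase hex strings sort in numeric order under 'cmp'.
my @ring = sort { $a->[0] cmp $b->[0] } map { [ sha1_hex($_), $_ ] } @nodes;

# The owner of a key is the first node at or after the key's position,
# wrapping around to the start of the ring.
sub owner_of {
    my ($key) = @_;
    my $pos = sha1_hex($key);
    for my $entry (@ring) {
        return $entry->[1] if $entry->[0] ge $pos;
    }
    return $ring[0][1];
}

# Every client computes the same owner for the same key, so all
# operations on that object can be sent to one place.
print "user:42 is owned by ", owner_of('user:42'), "\n";
```

        Failover to a secondary could then be the next node along the ring, used only when the primary is unreachable.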


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Any NoSQL equivalents of an ORM?
by InfiniteSilence (Curate) on Apr 12, 2011 at 14:07 UTC

    I'm with BrowserUK on this one. Unless you can specify how big your problem is, space- and processing-wise, nobody can make a definite recommendation as to what type of solution to use. That would be as irresponsible as recommending some kind of flat-file tool for 1K users planning to perform reads and writes in real time.

    Those requirements exclude common relational databases...

    I cannot say that this statement is definitely true. There is multi-master replication available for PostgreSQL clusters up to 128 nodes. I'm fairly certain that other clustering relational databases have multi-master models nowadays as well.

    Also, after reading a bit more about how this Riak tool stores its data, it sounds an awful lot like you might consider Oracle's Berkeley DB, which appears to already have a Perl interface written for it.

    Celebrate Intellectual Diversity

      Those are both good suggestions. However, they are both at the CA corner of the CAP theorem, and for our use we care more about the AP corner.
        Take a look at Kundera: https://github.com/impetus-opensource/Kundera, an ORM solution over Cassandra, HBase and MongoDB.