PerlMonks  

Re^2: Any NoSQL equivalents of an ORM?

by tilly (Archbishop)
on Apr 12, 2011 at 06:09 UTC


in reply to Re: Any NoSQL equivalents of an ORM?
in thread Any NoSQL equivalents of an ORM?

Heh. Very much not. My current "design braindump" includes the following features:
  • Create and maintain schemas for complex objects.
  • Maintain bidirectional object relationships. (Think master-child relationships - from the master you should be able to find the children, and each child needs to know its master. This should be automatically maintained.)
  • Ability to dump networks of related objects.
  • Ability to load them elsewhere.
  • A conflict resolution algorithm in case two different clients updated an object at the same time without seeing what the other was doing.
In short I'm really tackling the sort of problems that an ORM on top of a relational database makes easy.
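
As a minimal sketch of the second feature (bidirectional relationships that maintain themselves), assuming plain in-memory objects rather than anything stored in Riak. The `Node` class and its `set_master` method are hypothetical names used only for illustration:

```python
class Node:
    """An object that may have one master and any number of children."""

    def __init__(self, name):
        self.name = name
        self._master = None
        self._children = []

    @property
    def master(self):
        return self._master

    @property
    def children(self):
        # Return a read-only snapshot so callers cannot bypass set_master().
        return tuple(self._children)

    def set_master(self, new_master):
        """Re-parent this node, keeping both sides of the link in sync."""
        if self._master is new_master:
            return
        if self._master is not None:
            # Detach from the old master first.
            self._master._children.remove(self)
        self._master = new_master
        if new_master is not None:
            new_master._children.append(self)


order = Node("order")
archive = Node("archive")
line_item = Node("line item")

line_item.set_master(order)    # order.children now contains line_item
line_item.set_master(archive)  # moved: order loses it, archive gains it
```

The only way to change either side of the relationship is through `set_master`, which is what keeps the two directions from drifting apart.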

Re^3: Any NoSQL equivalents of an ORM?
by BrowserUk (Patriarch) on Apr 12, 2011 at 13:15 UTC

    What order of scale are you hoping for?

    Mechanisms that will work well for say 4 to 16 nodes will often fail hopelessly if you try to scale them to 100 or 1000 nodes. Conversely, algorithms that will scale to 1000 nodes will usually be relatively inefficient if used for only 4 or 8 nodes.

    A conflict resolution algorithm in case two different clients updated an object at the same time without seeing what the other was doing.

    In general, it is far better to avoid this possibility than to design algorithms to handle it. Synchronisation imposes high overheads on all operations, even read-only ones.

    The best approach to distributed data management--assuming your application can be made to fit--is to distribute your objects across the nodes, but only allow the owning node to manipulate the object. I.e., route all operations on an object to its owning node. (Or nodes, for failover; but route to a secondary only if the primary fails.)

    A quick browse of the Riak link you provided shows that it does this for you at the physical data (disk) level, but you will still need to provide a similar mechanism, perhaps based upon the underlying 160-bit space, at the application logic level.
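
    A rough sketch of what such an application-level routing layer might look like, assuming a 160-bit key space divided into equal partitions. SHA-1 is used here only because it yields 160-bit values; `NUM_PARTITIONS`, the round-robin assignment of partitions to nodes, and the function names are all assumptions for illustration, not Riak's actual claim algorithm:

```python
import hashlib

RING_SIZE = 2 ** 160   # size of the 160-bit hash space
NUM_PARTITIONS = 64    # assumed number of equal slices of the ring


def key_position(key):
    """Map a key to an integer position in the 160-bit space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_of(key):
    """Index of the equal-sized ring partition the key falls into."""
    return key_position(key) * NUM_PARTITIONS // RING_SIZE


def preference_list(key, nodes):
    """Nodes in the order they should be tried: primary first."""
    start = partition_of(key) % len(nodes)
    return nodes[start:] + nodes[:start]


def owner_of(key, nodes, is_up=lambda node: True):
    """Return the first live node in the key's preference list."""
    for node in preference_list(key, nodes):
        if is_up(node):
            return node
    raise RuntimeError("no live node available for key %r" % key)


nodes = ["node-a", "node-b", "node-c", "node-d"]
primary = owner_of("customer/1001", nodes)
# If the primary is down, the operation goes to the next node in the list.
fallback = owner_of("customer/1001", nodes, is_up=lambda n: n != primary)
```

    Every node computes the same owner for a given key without consulting the others, so all operations on one object converge on one node as long as the nodes agree on the membership list.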


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      What order of scale are you hoping for?

      Dozens of peer nodes, with peak performance of at most dozens of writes per second per node. (Usually it will be quieter than that. The nodes will mostly be used for other stuff; Riak should be running in the background.) Performance and throughput are not bottlenecks here - one machine can easily do that. The issue is availability, and the desire to avoid having another specialized machine per cluster.

      The best approach to distributed data management...

      Sorry, there is no best approach. The CAP theorem says that you can have at most two of Consistency, Availability, and Partition Tolerance. Depending on your application, it may be appropriate to wind up at any corner.

      Riak is at the AP corner. That is appropriate for what I am trying to build. We expect conflicts to be very rare. Ones that cannot easily be merged should be much, much rarer still. A low remaining error rate would be acceptable. Writes will come from all nodes we are running at. Internal networking problems or localized hardware problems should not limit the ability of other nodes to function as best they can.
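
      To make "easily merged" concrete, here is a minimal sketch of a field-by-field three-way merge. It assumes the application can get hold of the common ancestor along with the two conflicting versions, and that objects are flat dicts. `merge_siblings` is a hypothetical helper, not part of any Riak client:

```python
_MISSING = object()  # sentinel meaning "field is absent in this version"


def merge_siblings(ancestor, a, b):
    """Three-way merge of two conflicting versions of a flat dict.

    Returns (merged, conflicts). A field lands in `conflicts` only when
    both sides changed it, and changed it to different values.
    """
    merged = {}
    conflicts = []
    for field in sorted(set(ancestor) | set(a) | set(b)):
        base = ancestor.get(field, _MISSING)
        va = a.get(field, _MISSING)
        vb = b.get(field, _MISSING)
        if va == vb:
            winner = va        # both agree (or both deleted the field)
        elif va == base:
            winner = vb        # only b touched the field
        elif vb == base:
            winner = va        # only a touched the field
        else:
            conflicts.append(field)   # genuinely conflicting edits
            continue
        if winner is not _MISSING:
            merged[field] = winner
    return merged, conflicts


ancestor = {"name": "widget", "qty": 1}
from_node_1 = {"name": "gadget", "qty": 1}   # renamed on one node
from_node_2 = {"name": "widget", "qty": 2}   # quantity changed on another
merged, conflicts = merge_siblings(ancestor, from_node_1, from_node_2)
```

      Two clients editing different fields merge cleanly. Only edits to the same field with different results are left over, and those are the cases expected to be much rarer still.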

      Your suggestions would be appropriate if we were trying to wind up at the CA or CP corners. We're not.

      A quick browse of the Riak link you provided shows that it does this for you at the physical data (disk) level, but you will still need to provide a similar mechanism, perhaps based upon the underlying 160-bit space, at the application logic level.

      That is one piece of what it looks like I need to write.

        Riak is at the AP corner. That is appropriate for what I am trying to build.

        Yes. But you added the Consistency requirement when you asked for "A conflict resolution algorithm".

        We expect conflicts to be very rare. Ones that cannot easily be merged should be much, much rarer still. A low remaining error rate would be acceptable.

        If that's all true, you don't need to add conflict resolution. By your own words, conflicts will occur very rarely, and a low error rate is acceptable to you.

        But if you feel the error rate might be too high without some effort to resolve conflicts, then it is just as easy and just as (in)efficient to fix them all as to fix some, especially as you say that "Performance and throughput are not bottlenecks here".

        Writes will come from all nodes we are running at. Internal networking problems or localized hardware problems should not limit the ability of other nodes to function as best they can.

        From what I read of Riak, it already provides for fail-over at the hardware level by redistributing the 160-bit hashes around the ring. But it does require functioning nodes to be able to communicate.

        With that in place, the simplest conflict resolution method you could layer on top is to avoid the conflicts altogether by routing (serialising) all write requests through the appropriate node.

        That leaves two failure modes to be concerned with:

        • A node goes down after a write request has been routed to that node but before it has been acknowledged.

          The requesting node reissues the request after some time limit and it will be re-routed to whichever node has taken over responsibility for that range of the hash space.

        • The network fabric between the requesting node and the serving node goes down.

          If the network fabric connecting the nodes is unreliable, Riak will essentially be stuffed anyway.
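
        The first failure mode could be handled along these lines. This is only a sketch: `resolve_owner` and `send` are hypothetical callables standing in for the ring lookup and the network call, and `send` is assumed to return False when no acknowledgement arrives within the time limit. It also assumes writes are idempotent, since a reissued request may already have been applied before the node went down:

```python
def write_with_retry(key, value, resolve_owner, send, max_attempts=3):
    """Route a write to the key's owner, reissuing it if it is not acknowledged.

    The owner is looked up again on every attempt, so once responsibility
    for the key has moved to another node the retry follows it there.
    Returns the node that acknowledged the write.
    """
    last_node = None
    for _ in range(max_attempts):
        last_node = resolve_owner(key)
        if send(last_node, key, value):
            return last_node
    raise TimeoutError(
        "no acknowledgement for %r after %d attempts (last tried %s)"
        % (key, max_attempts, last_node)
    )


# Simulated cluster: node-a dies before acknowledging, node-b takes over.
owners = iter(["node-a", "node-b"])
stored = {}


def fake_send(node, key, value):
    if node == "node-a":
        return False          # request timed out
    stored[key] = value
    return True


acked_by = write_with_retry("order/7", "paid", lambda key: next(owners), fake_send)
```

        The second failure mode gets no special handling here: if the requester cannot reach any owner, the retries run out and the caller sees the error.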

        Obviously, I only know the little you've told us, and I can envisage (a few) scenarios where conflict resolution might be better than conflict avoidance. But for most of them, I think you would be fooling yourself to believe that Riak will survive when the nodes' ability to communicate with each other does not.

        Anyway, good luck. It sounds like you have your work cut out.


