PerlMonks
Caching DB rows transparently and with SQL search

by bliako (Monsignor)
on Jul 04, 2020 at 11:13 UTC ( [id://11118905] )

bliako has asked for the wisdom of the Perl Monks concerning the following question:

Esteemed Monks, I need some advice on DB caching,

A typical web-app, let's say using Mojolicious and DBIx::Class, has a lot of semi-static data stored in the DB: for example, the list of registered users, the set of allowed tags, or the phone numbers of associates. This data does not change often (perhaps once a week or month), so why fetch it every time a user looks up another user's phone? (Forget for the moment issues of privacy - all users are allowed to see all other users' details.) Another example is a business phone directory with 100 entries: fairly static and very much in use. This data is not big in size, so I want to keep it in the memory of my app and serve it from there instead of fetching it from the DB each time a webpage with a dropdown menu of all phones is loaded.

So I thought: on starting up the app, create a helper and store this semi-static data (fetched once, and only once, from the DB). Then serve that and never bother the DB again. That's fine, but how about providing SQL-style search for this data? Because sometimes I will get requests from the client (e.g. when it loads a user's profile page) for /user/list?id=xyz, in addition to /user/list?id=all and also /user/list?name=*abc*. These are all implemented as SQL searches in DBIx::Class and work fine, but they are not cached.

This is what I would like (and this is the 21st century, so I want to have it): either tell DBIx::Class that a specific table should be cached unconditionally until I say otherwise (this transparency would be ideal for reusing existing code); OR, less preferred, store the static data in Perl and, via a module, run SQL searches on it as if it were residing in a DB.

thanks, bw bliako

EDIT: I have asked a similar question at https://webchat.freenode.net/#mojo (which was part of a more general question)

Re: Caching DB rows transparently and with SQL search
by 1nickt (Canon) on Jul 04, 2020 at 14:14 UTC

    Hello brother bliako,

    At $work we extract static data from the DB at startup and load it into a hash in memcache, and retrieve it using key paths. Being able to query the data using SQL would seem to be just an implementation detail.

    Possibly of interest:

    Hope this helps!


    The way forward always starts with a minimal test.

      Hello again brother 1nickt,

      Thanks for the pointers. DBIx::Class::Cursor::Cached looks the most transparent to me if I don't want to have duplicate code to access my actual db and separate cache/memdb. I will most likely experiment with this tomorrow and adopt it.
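For the record, the setup for DBIx::Class::Cursor::Cached (following its synopsis) looks roughly like the sketch below. MyApp::Schema and the connection details are placeholders, and Cache::FileCache comes from the separate Cache::Cache distribution; any compatible cache object should do:

```perl
use strict; use warnings;
use Cache::FileCache;   # any Cache::Cache-compatible backend works

# Tell DBIx::Class to use the caching cursor class at connect time...
my $schema = MyApp::Schema->connect(
    $dsn, $user, $pass,
    { cursor_class => 'DBIx::Class::Cursor::Cached' },
);

# ...hand it a cache object...
$schema->default_resultset_attributes({
    cache_object => Cache::FileCache->new({ namespace => 'MyApp' }),
});

# ...then any search carrying cache_for is served from the cache
# until the given number of seconds has elapsed.
my $users_rs = $schema->resultset('User')->search(
    { registered => 1 },
    { cache_for => 300 },   # seconds
);
```

The nice part is that only the search attributes change; the rest of the resultset code stays as it is.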

      > Being able to query the data using SQL would seem to be just an implementation detail.

      ... an "implementation detail" the size of Mount Everest, I would say :)! But anyway, since I use DBIx::Class I rarely have to write raw SQL, so the bit in my post about SQL can be ignored.

      However, the need to access cached data by key remains if I follow the non-transparent cache route. LanX suggested ( Re: Caching DB rows transparently and with SQL search ) memoizing the db calls, which for me are typically (edit: added just a minute later):

      my $conditions = { registered => 1 };
      my $attributes = { columns => ['email', 'name'] };
      my $registered_users_rs = $schema->resultset('User')->search($conditions, $attributes);

      or, at a higher level, my $users = Model::User::list($conditions, $attributes);. So, right now I am experimenting with converting arbitrary function parameters (e.g. ($conditions, $attributes), both being hashrefs) to a cache key. In short, an md5 for a (nested) hashtable! But I will ask about this in a separate post. At the moment, studying Memoize shows that this problem has been partly solved, but it still requires, as I understand it, manual tweaking of the normalizer. All the best and thanks for the pointers.
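For what it's worth, one way to get that "md5 for a nested hashtable" is canonical JSON: JSON::PP (core since Perl 5.14) sorts hash keys at every nesting level when canonical(1) is set, so logically equal ($conditions, $attributes) pairs always serialize to the same string. A sketch (the function name cache_key is made up):

```perl
use strict; use warnings;
use JSON::PP ();
use Digest::MD5 qw(md5_hex);

# canonical(1) sorts hash keys at every level, so logically equal
# structures always encode to the same string regardless of the
# order in which the keys were inserted.
my $json = JSON::PP->new->canonical(1);

sub cache_key {
    my ($func_name, @args) = @_;
    return $func_name . ':' . md5_hex($json->encode(\@args));
}

my $k1 = cache_key('Model::User::list',
    { registered => 1 }, { columns => [ 'email', 'name' ] });
my $k2 = cache_key('Model::User::list',
    { registered => 1 }, { columns => [ 'email', 'name' ] });
print $k1 eq $k2 ? "keys match\n" : "keys differ\n";   # prints "keys match"
```

Undef conditions/attributes are fine too, since JSON::PP encodes undef as null.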

Re: Caching DB rows transparently and with SQL search
by LanX (Saint) on Jul 04, 2020 at 11:56 UTC
    I suppose encapsulation of the desired queries in subs with memoizing isn't good enough because of the sheer amount of redundant data for different queries?

    I found this very old SO discussion:

    Does DBIx::Class have transparent caching

    Hopefully it's of help. :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Yes, because at the moment I supply a hash of conditions and attributes for the DB search. You can't effectively memoize with a nested hash as input, can you? I mean, you can compare deeply, but is that more efficient than hitting the DB?

      At a lower level (say, subclassing DBIx::Class's search, as the link you cited suggests), it should be possible to recognize equivalent SQL statements even if they look different (ordering, temp table names).

      But I now see more clearly that it helps to constrain the searches and abstract them before they are turned into SQL. Doing that before they even hit DBIx::Class will be much more efficient, but it creates a parallel universe of code in the app.

        I have no DBIC expertise, sorry.

        Hopefully one of the grandees will answer here; I've sent a PM to one of them.

        > You can't memoize with a nested hash, as input, effectively can you?

        Dunno!

        Well the naive approach is to stringify the nested hash with Data::Dump or similar and to use it as a hash key.

        But this will lead - like I said - to a lot of redundant data, and you might need to free memory from time to time.
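The stringify-and-key approach can be sketched with core modules only, using Memoize's NORMALIZER hook and Data::Dumper with Sortkeys (instead of Data::Dump) so that hash-key order doesn't change the memo key. expensive_lookup here is a made-up stand-in for a real DB call:

```perl
use strict; use warnings;
use Memoize;
use Data::Dumper ();

my $calls = 0;
sub expensive_lookup {
    my ($conditions, $attributes) = @_;
    $calls++;    # stands in for a real DB hit
    return "result-for-" . join(',', sort keys %$conditions);
}

# Normalizer: dump the argument list with sorted keys so that
# equivalent nested hashes map to the same memo-table key.
sub normalize {
    local $Data::Dumper::Sortkeys = 1;
    local $Data::Dumper::Indent   = 0;
    return Data::Dumper::Dumper(\@_);
}

memoize('expensive_lookup', NORMALIZER => 'normalize');

my $r1 = expensive_lookup({ registered => 1 }, { columns => ['email'] });
my $r2 = expensive_lookup({ registered => 1 }, { columns => ['email'] });
print "DB hit count: $calls\n";   # prints "DB hit count: 1"
```

The redundant-data concern stands, though: every distinct query keeps its full result in the memo table until you clear it.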

        > I mean you can compare deeply but is that more efficient than hitting the DB?

        IMHO only if you maintain index tables for key columns in hashes.

        Like an AND condition being a hash slice of the two %indices, or an OR condition being a union of both %indices.

        But I'd guess there are already XS modules available offering in-memory SQL?

        Does SQLite always operate on the filesystem?
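To answer the SQLite question: no, it doesn't have to touch the filesystem. DBD::SQLite (not a core module, so this sketch assumes it is installed) accepts dbname=:memory: for a database that lives entirely in the process's memory:

```perl
use strict; use warnings;
use DBI;

# "dbname=:memory:" creates a purely in-memory SQLite database;
# nothing is written to disk, and it vanishes on disconnect.
# (Assumes DBD::SQLite is installed; it is not a core module.)
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE phones (name TEXT, phone TEXT)');
my $ins = $dbh->prepare('INSERT INTO phones (name, phone) VALUES (?, ?)');
$ins->execute('alice', '555-0101');
$ins->execute('bob',   '555-0102');

# Plain SQL searches run against the in-memory copy:
my ($phone) = $dbh->selectrow_array(
    'SELECT phone FROM phones WHERE name = ?', undef, 'alice');
print "alice: $phone\n";   # prints "alice: 555-0101"
```

So one route for the OP is to load the semi-static tables into an in-memory SQLite database at startup and point the existing SQL-style searches at that.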

        Sorry, I'm guessing here; I'm more of an SQL user lacking deep knowledge.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: Caching DB rows transparently and with SQL search
by Anonymous Monk on Jul 04, 2020 at 20:57 UTC
    In my web applications that are served by Perl, for static data (e.g. a phone directory with 100 entries) I just run a cron task that generates a JavaScript file with JavaScript data structures, and that file is loaded and cached by browsers. I don't even go through XMLHttpRequest for JSON - just a plain JavaScript file that is refreshed lazily. This is quite a useful solution: the data is available as native data structures in the browser, and on the server side you have a simple Perl script that imports data from the DB and saves the JavaScript file included by the web service.
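A minimal sketch of the generator side of this approach, using only core modules; the rows are faked here in place of a DB query, and the variable name PHONE_DIR and the output file name are made up:

```perl
use strict; use warnings;
use JSON::PP ();

# Fake the DB rows; a real cron job would SELECT these instead.
my @rows = (
    { name => 'alice', phone => '555-0101' },
    { name => 'bob',   phone => '555-0102' },
);

# JSON is valid JavaScript, so assigning it to a global variable
# gives the browser a native data structure via <script src=...>.
my $js = 'var PHONE_DIR = '
       . JSON::PP->new->canonical(1)->encode(\@rows) . ';';

open my $fh, '>', 'phone_dir.js' or die "open: $!";
print {$fh} "$js\n";
close $fh or die "close: $!";
```

The page then includes phone_dir.js like any other static asset, and the browser's HTTP cache does the rest.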

      Yes, that's also one solution. Thanks for the suggestion.

Re: Caching DB rows transparently and with SQL search
by perlfan (Vicar) on Jul 06, 2020 at 00:43 UTC
    1nickt recommended memcached, which fits your use case. I can also recommend looking at Sphinx Search, which can be used as a fast indexed/in-memory data store. It becomes more effective as you add more data (talking millions of records, or more). It's a little harder to set up than memcached, but it can be federated/sharded. It also has a query language of its own that can express most SQL-style constraints. The Perl module for it, Sphinx::Search, is also quite mature and robust. Sphinx also has capabilities that allow the index to be updated incrementally. You may also want to check out Redis which, while not billed as a cache (but rather as a shared data-structure server/middleware), can be used quite effectively as a cache or ephemeral data store.

      I'm using Net::Clacks to cache database results, for example for my DNS Server.

      But take note that I am biased, since I'm the author of Net::Clacks...

      perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'

        Thanks for the pointer cavac. I did not know anything about this Clacks protocol. If I understand right, you use Net::Clacks to cache results whenever a db request is made, and that can serve multiple clients: you broadcast the result, the clacks server deals with it and has it available for all clients, in a client/server model. Fine, but what's the key? This is my primary obstacle in using (any) cache solution. Whenever I make a DBIx::Class db search I supply ($conditions, $attributes), which are hashes describing the search and can be undef as well. So, one part of the key is the current function, e.g. Model::User::list(). The other part of the key must include some kind of digest of said function parameters, which is not that easy to calculate efficiently while still beating the db call ...

        bw, bliako

      Thanks for the suggestion. I will keep it in mind, although for my needs right now it is OTP.

Node Type: perlquestion [id://11118905]
Front-paged by LanX