bliako has asked for the wisdom of the Perl Monks concerning the following question:
Esteemed Monks, I need some advice on DB caching,
A typical web-app, let's say using Mojolicious and DBIx::Class, has a lot of semi-static data stored in the DB. For example the list of registered users, the set of allowed tags, phone numbers of associates. This data does not change often (perhaps once a week/month), so why do I fetch it every time a user looks up another user's phone? (Forget for the moment issues of privacy - all users are allowed to see all other users' details.) Another example is a business phone directory with 100 entries: fairly static and very much in use. This data is not big, so I want to keep it in the memory of my app and serve it from there instead of fetching it from the DB each time a webpage with a dropdown menu of all phones is loaded.
So, I thought that on starting up the app I could create a helper and store this semi-static data in it (fetched once, and only once, from the DB), then serve that and never bother the DB again. That's fine, but how about providing SQL-style search on this data? Sometimes I will get requests from the client (e.g. when it loads a user's profile page) for /user/list?id=xyz, in addition to /user/list?id=all and /user/list?name=*abc*. These are all implemented as SQL searches in DBIx::Class and work fine, but they are not cached.
This is what I would like (and this is the 21st century, so I want to have it): either tell DBIx::Class that a specific table should be cached unconditionally until I say otherwise (this transparency would be ideal for reusing existing code). OR, less preferred, store the static data in Perl and, via a module, run SQL searches on it as if it were residing in a DB.
thanks, bw bliako
EDIT: I have asked a similar question at https://webchat.freenode.net/#mojo (which was part of a more general question)
Re: Caching DB rows transparently and with SQL search
by 1nickt (Canon) on Jul 04, 2020 at 14:14 UTC
Hello brother bliako,
At $work we extract static data from the DB at startup and load it into a hash in memcache, and retrieve it using key paths. Being able to query the data using SQL would seem to be just an implementation detail.
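For what it's worth, a minimal sketch of that pattern (the key name, data, and memcached address are all made up for illustration), using Cache::Memcached::Fast:

```perl
use strict;
use warnings;
use Cache::Memcached::Fast;

# Hypothetical setup: one memcached instance on the default port.
my $memd = Cache::Memcached::Fast->new({
    servers => ['127.0.0.1:11211'],
});

# At app startup: fetch the semi-static rows once (DB query not shown)
# and store them under a key path. References are serialized automatically.
my %phone_dir = (
    alice => '555-0100',
    bob   => '555-0101',
);
$memd->set('app:phone_dir', \%phone_dir);

# In a request handler: read from the cache, fall back to the DB
# only on a miss.
my $dir = $memd->get('app:phone_dir');
print $dir->{alice}, "\n" if $dir;
```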
Possibly of interest:
Hope this helps!
The way forward always starts with a minimal test.
Hello again brother 1nickt,
Thanks for the pointers. DBIx::Class::Cursor::Cached looks the most transparent to me if I don't want to have duplicate code to access my actual db and separate cache/memdb. I will most likely experiment with this tomorrow and adopt it.
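For the record, a rough sketch of how DBIx::Class::Cursor::Cached wires in, following its synopsis (the schema class, DSN, and namespace here are placeholders):

```perl
use strict;
use warnings;
use Cache::FileCache;   # any Cache::Cache-compatible store works
use My::App::Schema;    # hypothetical DBIx::Class schema class

# Give resultsets a default cache object, then tell the connection
# to use the caching cursor (per the module's synopsis).
my $schema = My::App::Schema->clone;
$schema->default_resultset_attributes({
    cache_object => Cache::FileCache->new({ namespace => 'my_app' }),
});
$schema->connection(
    'dbi:SQLite:dbname=app.db', '', '',
    { cursor_class => 'DBIx::Class::Cursor::Cached' },
);

# A resultset opts in with cache_for (seconds): the first ->all
# hits the DB, later identical searches within the TTL do not.
my @users = $schema->resultset('User')->search(
    { registered => 1 },
    { columns => [ 'email', 'name' ], cache_for => 300 },
)->all;
```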
Being able to query the data using SQL would seem to be just an implementation detail.
... an "implementation detail" the size of Mount Everest, I would say :) ! But anyway, since I use DBIx::Class I rarely have to write raw SQL, so the bit in my post about SQL can be ignored.
However, the need to access cached data by key remains if I follow the non-transparent cache route. LanX suggested ( Re: Caching DB rows transparently and with SQL search ) memoizing the db calls, which for me are typically (edit: added a minute after):

my $conditions = { registered => 1 };
my $attributes = { columns => ['email', 'name'] };
my $registered_users_rs = $schema->resultset('User')->search($conditions, $attributes);

or, at a higher level, my $users = Model::User::list($conditions, $attributes);. So, right now I am experimenting with converting arbitrary function parameters (e.g. ($conditions, $attributes), both being hashrefs) to a cache key. In short, an md5 for a (nested) hashtable! But I will ask about this in a separate post. At the moment, studying Memoize shows that this problem has partly been solved, but it still requires, as I understand it, manual tweaking of the normalizer. All the best and thanks for the pointers.
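One way to turn ($conditions, $attributes) into a stable cache key is to serialize them canonically (sorted hash keys) and digest the result; a small sketch using JSON::PP and Digest::MD5 (the function name in the key is just an example):

```perl
use strict;
use warnings;
use JSON::PP;
use Digest::MD5 qw(md5_hex);

# canonical(1) sorts hash keys, so two hashrefs with the same
# contents always serialize to the same string.
my $json = JSON::PP->new->canonical(1);

sub cache_key {
    my ($func, @args) = @_;
    return $func . ':' . md5_hex($json->encode(\@args));
}

my $conditions = { registered => 1 };
my $attributes = { columns => [ 'email', 'name' ] };

my $k1 = cache_key('Model::User::list', $conditions, $attributes);
# Same data written with a different key order: same cache key.
my $k2 = cache_key('Model::User::list',
    { registered => 1 }, { columns => [ 'email', 'name' ] });

print $k1 eq $k2 ? "keys match\n" : "keys differ\n";
```

Whether computing this beats the DB round-trip has to be measured, but both modules are pure digesting/serializing and usually cheap for small hashrefs.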
Re: Caching DB rows transparently and with SQL search
by LanX (Saint) on Jul 04, 2020 at 11:56 UTC
I suppose encapsulation of the desired queries in subs with memoizing isn't good enough because of the sheer amount of redundant data for different queries?
I found this very old SO discussion:
Does DBIx::Class have transparent caching
Hopefully it's of help. :)
Yes, because at the moment I supply a hash of conditions and attributes for the DB search. You can't effectively memoize with a nested hash as input, can you? I mean, you can compare deeply, but is that more efficient than hitting the DB?
At a lower level (say, subclassing DBIx::Class's search as the link you cited suggests), it should be possible to recognize equivalent SQL even when the statements look different (clause order, temp table names).
But I now see more clearly that constraining the searches and abstracting them before they are turned into SQL, before they even hit DBIx::Class, would be much more efficient, though it creates a parallel universe of code in the app.
I have no DBIC expertise, sorry.
Hopefully one of the grandees will answer here; I've sent a PM to one of them.
> You can't memoize with a nested hash, as input, effectively can you?
Dunno!
Well, the naive approach is to stringify the nested hash with Data::Dump or similar and to use the result as a hash key.
But this will lead - like I said - to a lot of redundant data, and you might need to free memory from time to time.
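That naive approach can be sketched with core Data::Dumper, as long as Sortkeys is on (otherwise Perl's randomized hash ordering can produce different strings for the same data); the function and data names here are invented:

```perl
use strict;
use warnings;
use Data::Dumper;

my %cache;

sub stringify_args {
    local $Data::Dumper::Sortkeys = 1;  # deterministic key order
    local $Data::Dumper::Indent   = 0;  # one line per dump
    return Dumper(\@_);
}

sub cached_list {
    my $key = stringify_args(@_);
    return $cache{$key} //= expensive_db_call(@_);
}

# Stand-in for the real DB query.
sub expensive_db_call { return [ 'alice', 'bob' ] }

my $first  = cached_list({ registered => 1 });
my $second = cached_list({ registered => 1 });
print "cached\n" if $first == $second;  # same arrayref: second call never ran the query
```

As noted, the cache grows with every distinct argument combination, so in a long-running app you would want an eviction policy on %cache.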
> I mean you can compare deeply but is that more efficient than hitting the DB?
IMHO only if you maintain index tables for key columns in hashes.
Think of an AND condition as a hash slice of the two %indices, and an OR as a union of both %indices.
But I'd guess there are already XS modules available offering in-memory SQL?
Does SQLite always operate on the filesystem?
Sorry, I'm guessing here; I'm more of an SQL user lacking deep knowledge.
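Re SQLite: no, it need not touch the filesystem; DBD::SQLite accepts :memory: as the database name, which gives full SQL over a purely in-memory store. A minimal sketch (table and data invented):

```perl
use strict;
use warnings;
use DBI;

# dbname=:memory: keeps the whole database in RAM; it vanishes
# when $dbh is disconnected.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE phones (name TEXT, phone TEXT)');
$dbh->do('INSERT INTO phones VALUES (?, ?)', undef, 'alice', '555-0100');
$dbh->do('INSERT INTO phones VALUES (?, ?)', undef, 'bob',   '555-0101');

# Ordinary SQL against the in-memory data.
my ($phone) = $dbh->selectrow_array(
    'SELECT phone FROM phones WHERE name = ?', undef, 'alice');
print "$phone\n";
```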
Re: Caching DB rows transparently and with SQL search
by Anonymous Monk on Jul 04, 2020 at 20:57 UTC
In my web applications that are served by perl... with static data (e.g. a phone dir with 100 entries), I just run a cron task that generates a javascript file with javascript data structures, and that file is loaded and cached by browsers. I don't even try to go through XMLHttpRequest for JSON -- just a plain javascript file that is refreshed lazily. This is quite a useful solution -- the data is available as native data structures in the browser... and on the server side you have a simple perl script that imports the data from the DB and saves the javascript file to be included in the web service.
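The server side of that idea might look like this sketch (table, columns, and variable name are made up; an in-memory SQLite table stands in for the live database the cron job would actually query):

```perl
use strict;
use warnings;
use DBI;
use JSON::PP;

# Stand-in for the real DB: an in-memory table of phone entries.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '', { RaiseError => 1 });
$dbh->do('CREATE TABLE phones (name TEXT, phone TEXT)');
$dbh->do('INSERT INTO phones VALUES (?, ?)', undef, 'alice', '555-0100');
$dbh->do('INSERT INTO phones VALUES (?, ?)', undef, 'bob',   '555-0101');

my $rows = $dbh->selectall_arrayref(
    'SELECT name, phone FROM phones ORDER BY name', { Slice => {} });

# Emit plain javascript (an assignment, not JSON-over-XHR) so the
# browser can load and cache it with an ordinary <script> tag.
my $js = 'var PHONE_DIR = ' . JSON::PP->new->canonical(1)->encode($rows) . ";\n";
print $js;
# The cron job would instead write $js to a file in the docroot,
# e.g. a phone_dir.js (path made up), refreshed on each run.
```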
Re: Caching DB rows transparently and with SQL search
by perlfan (Vicar) on Jul 06, 2020 at 00:43 UTC
1nickt recommended memcached, which fits your use case. I can also recommend looking at Sphinx Search, which can be used as a fast indexed/in-memory data store. It becomes more effective as you add more data (talking millions - or more - of records). It's a little harder to set up than memcached, but can be federated/sharded. It also has a query language of its own that supports most SQL-style constraints. The Perl module for it, Sphinx::Search, is quite mature and robust. Sphinx can also update its index incrementally. You may also want to check out Redis which, while billed not as a cache but as a shared data structure server/middleware, can be used quite effectively as a cache or ephemeral data store.
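A minimal sketch of the Redis-as-cache pattern with the Redis CPAN module (server address, key name, and data are placeholders; the DB fallback is not shown):

```perl
use strict;
use warnings;
use Redis;
use JSON::PP;

# Hypothetical local Redis instance.
my $redis = Redis->new(server => '127.0.0.1:6379');
my $json  = JSON::PP->new->canonical(1);

# Redis stores strings, so serialize the Perl structure first;
# setex stores the value with a TTL in seconds.
my $users = [ { name => 'alice' }, { name => 'bob' } ];
$redis->setex('app:users:registered', 300, $json->encode($users));

# Later: decode on a hit, fall back to the DB on a miss.
my $raw    = $redis->get('app:users:registered');
my $cached = $raw ? $json->decode($raw) : undef;
```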
I'm using Net::Clacks to cache database results, for example for my DNS Server.
But take note that I am biased, since I'm the author of Net::Clacks...
perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'
Thanks for the pointer cavac. I did not know anything about this Clacks protocol. If I understand right, you use Net::Clacks to cache results whenever a db request is made, and that can serve multiple clients: you broadcast the result, the clacks server deals with it and has it available for all clients, in a client/server model. Fine, but what's the key? This is my primary obstacle in using (any) cache solution. Whenever I make a DBIx::Class db search I supply ($conditions, $attributes), which are hashes for the search and can also be undef. So, one part of the key is the current function, e.g. Model::User::list(). The other part of the key must include some kind of digest of said function parameters, which is not that easy to calculate efficiently while still beating the db call ...
bw, bliako