Re: Databases and tied hashes.

by DarkBlue (Sexton)
on Feb 11, 2001 at 19:53 UTC

in reply to Databases and tied hashes.

Summarising then, from the various replies I have received (with thanks):

1) Whilst in concept stage I am more than adequately served by GDBM_File for a relatively simple database structure

2) As I do not require a relational database for this product, I can effectively ignore SQL, Oracle, Postgres, etc., as the demands of my database (even with the most optimistic predictions) simply don't justify the capabilities of those systems

3) To "industrialise" my product, BerkeleyDB is easily capable of serving my database ("multi-threaded concurrent read/write, hundreds of terabytes of data, 30,000+ accesses per second"), so I can keep the entire product non-proprietary and open-source. The server should have large file support in order to take full advantage of BerkeleyDB

4) BerkeleyDB will allow me to use BTree as and when required (for reduced disk access during usage)

5) I should probably implement file-locking on the database for any write operation to maintain data integrity (despite the slight latency this will introduce)

6) The database should be backed up regularly (which goes without saying), and a plain-text backup is probably a good idea as well

7) The database should run on its own dedicated server, not on one that also has to serve web pages and other CGI

8) When using a tied-hash/GDBM_File database model, the entire database IS NOT loaded into memory for every single read or write operation (or thread), but there will be disk accesses for every such operation. I'd therefore conclude that the server needed to host this application would probably be okay with between 256 MB and 1 GB of RAM. The processor should be as fast as possible. The disk sub-system should be as fast as possible. RAID (what level?) would be best, as the fault tolerance would benefit the "industrialised" version of this database.
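
To make points 5 and 8 concrete, here is a minimal sketch of a tied GDBM_File hash guarded by an advisory lock. The file names (`products.gdbm`, `products.lock`) and the helper `with_locked_db` are made up for illustration. It assumes GDBM_File is installed and that every process touching the database takes the same lock; it is one possible arrangement, not the only way to do it.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use GDBM_File;

# Hypothetical file names, chosen only for this sketch.
my $db_file   = 'products.gdbm';
my $lock_file = 'products.lock';

# Run $code with a hash tied to the GDBM file while holding an exclusive lock.
sub with_locked_db {
    my ($code) = @_;

    # Lock a separate sentinel file rather than the database file itself.
    open my $lock, '>>', $lock_file or die "Cannot open $lock_file: $!";
    flock $lock, LOCK_EX           or die "Cannot lock $lock_file: $!";

    tie my %db, 'GDBM_File', $db_file, &GDBM_WRCREAT, 0640
        or die "Cannot tie $db_file: $!";

    my $result = $code->(\%db);

    untie %db;     # close the database before releasing the lock
    close $lock;   # closing the handle releases the lock
    return $result;
}

# Each store or fetch is passed through the tie to the dbm library;
# the file is never read into a Perl hash as a whole.
with_locked_db(sub { $_[0]{widget} = 'price=9.99' });
my $value = with_locked_db(sub { $_[0]{widget} });
print "widget => $value\n";
```

Taking the lock on a sentinel file keeps the locking independent of how the dbm library opens and closes its own file, at the cost of serialising readers as well as writers in this simple form.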

Does anyone disagree with any points in this summary?

Thanks again for your speedy replies and the invaluable insight you have given me.

Jonathan M. Hollin

Replies are listed 'Best First'.
Re (tilly) 2: Databases and tied hashes.
by tilly (Archbishop) on Feb 11, 2001 at 22:52 UTC
    Some clarifications on those points.
    1. If your final version won't need an SQL database, then a dbm database is fine for the concept stage.
    2. What separates the need for a relational database from a dbm is your data model. If you are starting to get into relationships and correlations between data (e.g. taking sales figures and getting reports of sales by customer, by product, etc.) then you clearly want a relational database. If you want a simple lookup, then a dbm is just fine.
    3. Berkeley DB is indeed an industrial strength database. It is particularly well suited to situations which need very high performance for simple tasks. (It is also great for embedded use, but I digress.) The bottlenecks that you will hit first have to do with the CGI model.
    4. Yes. Berkeley DB offers BTrees (GDBM is hash-based only). The wins of BTrees here are that they keep data in order (hashes do not) and get better locality of reference for a very organized access pattern. If your data fits in memory then hashes are generally faster. If not, then BTrees are.
    5. Yes. In high performance read-write situations, locking is important and how it is done is going to be your bottleneck. Most web applications are write seldom, read many times.
    6. Yes. Backup. And don't expect that binary data formats will be portable from machine to machine.
    7. If you want a website to scale, definitely. It is much easier to balance a load across 5 webservers than keep 5 databases in sync. However if you are anticipating this need, using a dbm solution will likely involve some custom work. Relational databases all have the data access segregated into its own process so the database can be moved to another machine. dbms traditionally do not.
    8. I think you are dramatically overestimating the needed resources.
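
As a small illustration of the ordering point in item 4, the sketch below ties a hash to a Berkeley DB BTree and lists the keys. It uses DB_File, Perl's older interface to Berkeley DB, as an assumption; the BerkeleyDB module mentioned in the thread exposes the same BTree type through a different API. The file name `fruit.btree` is hypothetical.

```perl
use strict;
use warnings;
use Fcntl;
use DB_File;

my $file = 'fruit.btree';   # hypothetical file name
unlink $file;               # start from an empty database for the demo

tie my %h, 'DB_File', $file, O_RDWR|O_CREAT, 0644, $DB_BTREE
    or die "Cannot tie $file: $!";

# Insert keys in an arbitrary order.
$h{$_} = length $_ for qw(cherry apple banana);

# With a BTree and the default comparison, keys come back in
# lexical order, which an on-disk hash does not promise.
my @keys = keys %h;
print "@keys\n";   # prints "apple banana cherry"

untie %h;
```

Swapping `$DB_BTREE` for `$DB_HASH` in the `tie` call gives the hashed layout instead, with no guarantee about key order.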
