PerlMonks  

Re^2: Solution, it seems.

by suaveant (Parson)
on Oct 01, 2008 at 20:42 UTC ( [id://714896]=note )


in reply to Re: Solution, it seems.
in thread What the flock!? Concurrency issues in file writing.

I am keeping away from database solutions for now for a couple of reasons: space (the current report is over 100 million records); speed (I've had bad luck with db stuff slowing me down because of added overhead that I don't really need); compatibility (eventually I want at least some of this to be in C for increased performance, which makes DBM::Deep a bad choice); and brevity (for the moment I am changing as little as I can to get this out quickly; we have a client waiting, and more changes mean more possible points of failure without extra testing).

In the future I may look into some improvements; for now I want it to work. (Not to mention it's driving me nuts not knowing what is screwed up :)

                - Ant
                - Some of my best work - (1 2 3)

Replies are listed 'Best First'.
Re^3: Solution, it seems.
by Tanktalus (Canon) on Oct 01, 2008 at 23:13 UTC

    Space, speed, and compatibility would be exactly the reasons why I would go with a database solution. OK, maybe not DBM::Deep, but using DB2 or Oracle or PostgreSQL, or even MySQL, would get you most of this. (DB2 even has compressed tables, which can do even more for both space and speed. I assume other DBMSs have something similar, but admit to a lack of experience there.) A db allows you to partition off the space without respect to your application (separately twiddlable configuration), is already written in C/C++ for speed (compressed tables should be even faster, since they shift some of the work from the disk to the CPU, which is faster), and is compatible with Perl, C/C++, Ruby, PHP, .NET, Java, and almost anything else you'd care to think about (I'm presuming that "shell scripting" isn't one you'd care to think about).

    As for brevity: the first time I went to a db was really for this type of scenario (but smaller). I had a bunch of data (~20k records) that I needed to track, and could receive requests/updates from multiple users simultaneously (and when you're talking about a smaller data set, the chance of overlap is increased). So I started to investigate flock and all that was required to ensure no data corruption. After a bit of time playing with it (say, about an hour), I gave up and decided that I was wasting my time. I came to the realisation that I'm simply Not That Smart. Who was, I wondered? Well, I figured the guys who wrote RDBMSs had already figured this detail out. So I switched the entire thing over to DB2, and had something working in a matter of days. "But I don't have days!" you may say. Well, I say that this was my first database. I knew no SQL. By the end of it, I still couldn't have passed your average "Introduction to Relational Database Systems" university course, but I had an app that didn't lose data. And I could produce many simple reports. And others who did know more than me could produce some more extensive reports without necessarily knowing a lick of Perl. I'm assuming here that you know more SQL than I did, in which case it shouldn't take nearly as long to do.
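
    To illustrate the "let the database do the locking" point, here is a minimal sketch. It is not the code from either post: Python's built-in sqlite3 module stands in for Perl/DBI and DB2, and the table name `counters`, the row `'report'`, and the function names are all made up for the example. Several writers bump the same record at once, with no flock calls in the application code.

```python
import os
import sqlite3
import tempfile
import threading


def make_db(path):
    # Hypothetical schema: one counter row standing in for a shared record.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO counters VALUES ('report', 0)")
    conn.commit()
    conn.close()


def writer(path, n):
    # Each writer opens its own connection, as a separate process would.
    # isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(path, timeout=30, isolation_level=None)
    for _ in range(n):
        # BEGIN IMMEDIATE takes the write lock up front; other writers
        # wait (up to `timeout` seconds) instead of clobbering each other.
        conn.execute("BEGIN IMMEDIATE")
        conn.execute("UPDATE counters SET hits = hits + 1 WHERE name = 'report'")
        conn.execute("COMMIT")
    conn.close()


def count_concurrent_hits(workers=4, n=25):
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    os.remove(path)  # let sqlite create the file itself
    try:
        make_db(path)
        threads = [
            threading.Thread(target=writer, args=(path, n)) for _ in range(workers)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        conn = sqlite3.connect(path)
        (hits,) = conn.execute(
            "SELECT hits FROM counters WHERE name = 'report'"
        ).fetchone()
        conn.close()
        return hits
    finally:
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    print(count_concurrent_hits())
```

    If the database is doing its job, the final count equals workers times n, i.e. no update is lost. Whether this is fast enough at 100 million records is a separate question that this sketch does not try to answer.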
