http://qs321.pair.com?node_id=285031

TacoVendor has asked for the wisdom of the Perl Monks concerning the following question:

I have a question about the right database type to use with a script. First though, a bit of background:

I am using ActiveState's perl 5.8 on a win32 platform (Windows 2000 to be exact, but this will be portable between win32 platforms). I have access to and can install nearly any database available for win32 systems.

My script is gathering data from a couple of sources and storing it now in a few flat text files. I need to expand my script to grab more data and store it all. My estimations show that I will start with about 600 rows and about 20 columns of data. None of the data being grabbed will exceed 20 characters. I do expect the database to grow to somewhere around 75,000-100,000 rows of data with everything else remaining the same. Knowing the data that will be stored, I am leaning toward a relational database but my main concern with all of this is getting the data the fastest way possible.

I have not used perl to store or retrieve data from anything other than flat-files to this point. Is there a preference amongst the people here on the database to use? Is there a certain database type or structure that perl works with better than others? Any pointers to any place that this question has already been answered?

Thanks in advance for your help.

Replies are listed 'Best First'.
Re: Selecting the right database for perl
by Abigail-II (Bishop) on Aug 19, 2003 at 21:13 UTC
    100,000 rows, 20 columns with at most 20 chars per column? That's only 40 Mb of data. The only reason you might need a database is for its transactional support. But 40 Mb of data is enough to keep in memory on a modern day low entry box.

    But if you get more data, or if your data is highly volotile, you might want to use a database. But which database is appropriate doesn't depend on the fact you happen to work with Perl. It depends on your data. How much data, what are you going to do with it, how intensive is it accessed/modified, how fast does it need to be, how much support is needed, how much can you spend on a license. Those are important question, that's what determines your database. Perl will happily interface with almost all databases. Don't let Perl determine your database, let your data determine it. Perl will adapt.

    Abigail

      Build it into hashes and arrays in perl and it will pop up to at least 100mb, get sloppy and copy instead of use refs and you can grow that amount very qucikly and cause big slowdowns. modify in memory and delay writes to disk and you can kiss it goodbye on poweroutage/system crash. Even though it may be possible (or even prudent) to do it in all memory understand the risks and problems you may be up against. If you have oracle/sybase/ms-sql licenses use what you have, else there are free databases such as mysql and postgres.

      -Waswas
        modify in memory and delay writes to disk and you can kiss it goodbye on poweroutage/system crash.

        I took an exception on transaction support. He didn't specify what kind of database he needs - if it's being modified heavily, you want to have something that makes sure no data gets lost on a crash. But if all he does is querying, having it all in memory is what you want.

        Abigail

      If you decide to keep it in memory, check out Tie::Persistent.
      Update
      However, I'd suggest using MySQL - it's free, fast, and an excellent choice for simple database applications. Easy to administrate too (another thing to consider).
Re: Selecting the right database for perl
by Corion (Patriarch) on Aug 19, 2003 at 21:10 UTC

    As long as you don't need a "real" server (and as you're dealing currently with flat files, it dosen't sound like it), DBD::SQLite is a very nice solution to store data in files and access them fairly fast through SQL and Perl.

    Of course, as soon as you have more than one process stomping over the data, all sorts of interesting issues appear, so you might want to consider whether you will stay with a single-process solution or file locking, or will want a real solution with concurrent access, which is something that SQLite dosen't do from what I remember.

    perl -MHTTP::Daemon -MHTTP::Response -MLWP::Simple -e ' ; # The $d = new HTTP::Daemon and fork and getprint $d->url and exit;#spider ($c = $d->accept())->get_request(); $c->send_response( new #in the HTTP::Response(200,$_,$_,qq(Just another Perl hacker\n))); ' # web
Re: Selecting the right database for perl
by monktim (Friar) on Aug 19, 2003 at 21:20 UTC
    I also use Microsoft SQL Server with PERL and I have no problems either. I use Win32::ODBC as the interface. My only complaint is the lack of error reporting but that is inherent. Of course you may not want to pay for a license and you may not need all that power. Many monks seem to be using MySQL with no problem. I haven't used DBI but it seems pretty straight forward. Some monk named gmax has a great node about dB and PERL. If you haven't seen it, check it out Before asking a database related question ....

    Tim
      Because I really do enjoy getting downvoted, it's Perl or perl, not PERL. Thanks.
      I also use Win32::ODBC and it works very nicely. Just becareful when an error does occur and capture it to a log file and don't display the error on the screen because that can lead to security issues since you are now displaying table names and columns names to the user who may be an online user and not someone internal to the company.
Re: Selecting the right database for perl
by jdtoronto (Prior) on Aug 20, 2003 at 02:45 UTC
    TacoVendor,

    I saw 10 replies to your queastion and was tempted not to weigh in on this one. But having read them, well I just cannot help myself!

    Firstly, by moving to a DBD/DBI driven framework your application will not only become moreportable across differing platforms, but also between solutions on a single platform.

    I have Perl 5.6/5.8 scripts (some are complete GUI programmes!) that can move from MS-Access on Win32, MS-SQL server, and MySQL on Win32. Then I can run the same thing on Linux with PostGres or MySQL (I prefer the latter).

    Even though you dont 'need' MySQL or something similar, it wont do you any harm either!

    As someone else said, you have chosen a partable programming platform, now you should choose a portable database solution. I develop on WindowsXP using ActiveState 5.8.0 with MySQL installed on the same machine for both GUI applications and CGI scripts. My GUI apps end up running 75% on Windows and 25% Linux, the CGI is almost universally running under Apache on Linux. It is a framework that has worked very well for me.

    Currently I am working on a 'small' GUI product, It is around 1000 lines of code in Perl, with about 2 million records in a MySQL database. Works like a charm on a simple entry level Celeron machine we use for testing. In fact it runs very well on a Pentium III machine which is about 6 years old too!

    Good luck! jdtoronto

Re: Selecting the right database for perl
by tcf22 (Priest) on Aug 19, 2003 at 21:39 UTC
    At work, we use all Windows, so we use SQL Server 2000, with Win32::ODBC. I've also used mySql with DBI on Windows and Linux. With 100,000 rows, either of these will probably work pretty well for you. Using mySql will obviously be more cost effective.

    I've never pushed mySql to the limit, so I'm not sure how big a DB has to be to hinder performance. I've used SQL Server with table that held 1-2 million rows, and it was still relatively fast.

    I'm normally a Linux user, so I would personally use mySql with DBI.
Re: Selecting the right database for perl
by batkins (Chaplain) on Aug 19, 2003 at 21:49 UTC
    I've used MySQL in several projects, and it's been great. It gives extremely fast response times even with large datasets and you can install the mighty fancy phpMyAdmin tool to administrate your databases.

    It's fairly simple to access a MySQL database. Just use the DBI or, better yet, Class::DBI.

Re: Selecting the right database for perl
by WhiteBird (Hermit) on Aug 20, 2003 at 01:20 UTC
    Just my 2 cents...

    I work on a Windows system with Perl 5.6.1. I've used Perl with MS Access, MSSQL Server and mySQL using DBD::ODBC and DBI. As others have said, let the data drive your choice of a database, as well as the availability of said data source and your personal preferences. Perl works equally well with any and all of the ones above.

    I actually started out with an Access database because that was what I had on my PC at work and also what I was most familiar with. Problems on my out-sourced server required that I port everything into MSSQL. After moving my data, it was a simple thing to change the DBI references and the code continued to work perfectly. Later I moved some of the database info to an internal server and ported it into mySQL. Again, a quick change to the DBI references and all was well.

    I say all that to say that even if you decide on a particular database at this time, that won't have to be set in stone. It's fairly easy to convert data and have perl come along with the changes. For simple database work, I prefer the mySQL. It's readily available, portable across systems, isn't Microsoft and is fairly user friendly.

Re: Selecting the right database for perl
by chunlou (Curate) on Aug 19, 2003 at 22:29 UTC

    One other reason you may want to use a database is when you need to combine different data in ways that SQL will be a more natural way to do so. Another reason is if you would like to normalize your data, i.e. minimizing the storage of redundant data.

    If you hadn't got much experience with database yet, MS Access actually would be a good place to start, at least for practicing with database design and basic query, as it doesn't need configuration and much resources.

    If you want a database that can run even on a laptop but still scalable, MySQL is a good choice. It's economical too (and free for personal use).

    Otherwise MS SQL and Oracle are among the most popular (and more expensive) choices for Win32 but they really need dedicated server. They don't seem like to share.

    If you use DBI to access your data and you have proper abstraction layer on your code, what database you use should be more or less transparent to Perl, especially when you're not using any vendor-specific feature.

      MySQL is ... free for personal use.

      Slight correction. MySQL is actually free for nearly any use. The GPL does not allow restrictions to be placed on what a product is used for. The "commercial license" is for folks who want to do stuff the GPL explicitly disallows -- namely, close the source. As long as one doesn't plan on distributing MySQL closed source, there's almost nothing you can't do with it (in terms of allowed usage; I'm not talking about competency as a database engine, which is something I'm not the best judge of).

        Thanks for the explanation.
Re: Selecting the right database for perl
by bear0053 (Hermit) on Aug 19, 2003 at 21:05 UTC
    I use miscrosoft's sql server and perl on a regular basis and have no problems with integrating the two. The sql server runs on windows advcanced server and works fine...Thats my 2-cents
Re: Selecting the right database for perl
by TacoVendor (Pilgrim) on Aug 19, 2003 at 22:00 UTC

    OK. I am just following up after reading the responses so far.

    I don't think I need to go with a sql server type of solution. My data will only be called from and written to by this one script, and should not ever get larger than what I specified in my inital post (yeah, yeah, famous last words, I know).

    I do want to be able to grab information from the database without having the wait that goes along with opening a 40MB database. Remember, this is win32 I will be running on - 40MB does have a wait time when opening and saving a single flat file back out.

    While I should be able to only open the whole file/database once and do all I need from memory, I will be needing to get one piece of information from a row and doing calculations with other data from the same column. These calculations could start involving data from 800-900 rows at a time. Then I will need to get a different piece of information and start all over again. I could see my physical memory getting pushed to the limit in no time with the right calculations and results.

    I see that I wasn't all that clear originally, but I do have some experience with sql databases. Not enough to be able to say I can do that work full time, but enough to be able to maintain and fix things in a crunch. I just have no experience using perl to access databases. I have read up and see easily enough how to do it, I just have no clue if something works better than something else.

    Thank you all that have answered so far, and especially to monktim for the pointers.

Re: Selecting the right database for perl
by LameNerd (Hermit) on Aug 19, 2003 at 23:28 UTC
    You have selected a portable language, Perl. You should now choose a DBMS that is also portable. (i.e. not MS anything)
Re: Selecting the right database for perl
by dws (Chancellor) on Aug 20, 2003 at 05:01 UTC
    Knowing the data that will be stored, I am leaning toward a relational database ...

    Without learning more about what you mean by "knowing the data", it is way premature to start making recommendations.

    Can you say more about the data (e.g., what you plan to do with the data after you've gathered it), and why you're leaning towards relational?

Re: Selecting the right database for perl
by richyboy (Acolyte) on Aug 20, 2003 at 19:36 UTC
    Slightly OT, but for anyone using MySQL, have a look at DBDesigner:

    http://www.fabforce.net/dbdesigner4/

    Open source with binaries for Win32 and Linux.

Re: Selecting the right database for perl
by EdwardG (Vicar) on Aug 20, 2003 at 13:41 UTC

    With your circumstances I'd expect that you already have a license to use (and possibly also to redistribute) the desktop edition of MS SQL Server, MSDE.

    Despite its name, this is not a toy version! It gives you almost everything that the comes with the full version, but is limited to a max of 2GB of data and five concurrent sessions (it queues them, you wouldn't notice on a low transaction system). From memory I think it doesn't support all types of replication either.

    I use DBI to interact with MSDE and so far I've not come across anything that was excessively problematic. I had to think about maximum sizes of binary columns to avoid truncation errors, but I count that as a good thing and didn't resent DBI for bringing the issue to my attention :)

    I notice from your other comments that you are a little concerned about the ease of manipulating your data in memory, or perhaps more specifically about memory consumption while doing this manipulation.

    My comment is that while storing 40MB (Abigail's estimate) in memory should not pose any problem, manipulating data in memory is something else altogether, particularly if you end up with data that is normalised.

    Let SQL Server do this hard work on your behalf - it's not just about speed, and it's not just about transactions, it's also about suitability to task.

    And in the current slowed IT market it doesn't hurt to add another marketable string to your bow.

    And in case anyone wonders; No, I don't work for the evil empire. :)

Re: Selecting the right database for perl
by dragonchild (Archbishop) on Aug 20, 2003 at 13:10 UTC
    If you're leaning towards using a RDBMS, I would use an RDBMS. Requirements change and it sounds like you're at the limits of what you consider acceptable performance from flatfiles. Take the time now and convert it to using DBI. Once you have it using DBI, you can access (nearly) any (R)DBMS known to man. (You can use DBD::Sybase to talk to MS-SQL, as well as Win32::ODBC. I would recommend using DBD::Sybase as it allows portability to non-Win32 systems.)

    ------
    We are the carpenters and bricklayers of the Information Age.

    The idea is a little like C++ templates, except not quite so brain-meltingly complicated. -- TheDamian, Exegesis 6

    Please remember that I'm crufty and crochety. All opinions are purely mine and all code is untested, unless otherwise specified.