PerlMonks
Mysql, CSV, and flatfiles, a benchmark
by illitrit (Friar) on May 07, 2001 at 19:00 UTC ( [id://78517]=perlmeditation )
Recently I became curious whether using Mysql was the best way to store anything that needed storing. I also decided it was time to learn how to use Benchmark, so the following results may be flawed. The program used to do the benchmarking can be found here; due to its size it is not included in the post. Feel free to point out where my method or program may be flawed.
Tests were done on the following machine: Linux 2.2.6, i686, dual PIII 500 processors, 256MB RAM, Mysql server version 3.22.25.

I tested Mysql, Text::CSV_XS, and flatfile (pipe-delimited) storage. The data access methods I tested were insert, select, update, and delete; I'll cover my approach for each below. I am aware that there are ways to optimize flatfile and perhaps CSV access using an index of the row start positions kept in another file (the Perl Cookbook shows a way to create such an index in recipe 8.8, for the curious). I am sure there are other ways to improve access speeds on flatfile and CSV data, but the intent was to test the "basic" methods as I know them.

Setup: I connected to the Mysql database, created a new Text::CSV_XS object, and defined my CSV file and flatfile. Then I created an array of hashes containing 200 different "row" hashes. Each hash looked like this ($_ is the current count, 1..200):
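The original hash snippet did not survive in this copy of the post, so here is a minimal reconstruction of the setup step. The field names other than ID are assumptions; only the shape (an array of 200 row hashes keyed by the loop counter) comes from the text above.

```perl
#!/usr/bin/perl -w
use strict;

# Build 200 "row" hashes, one per counter value (1..200).
# Field names NAME and VALUE are hypothetical stand-ins.
my @rows;
for my $count ( 1 .. 200 ) {
    push @rows, {
        ID    => $count,
        NAME  => "name$count",
        VALUE => "value$count",
    };
}
```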
Each iteration for Insert, Select, and Update does the appropriate action on EACH of these 200 items, so 500 iterations is actually doing 100,000 Inserts, Selects, or Updates. Deletes are only done once per item, due to the data being destroyed in the process.

Insert: The main item of note for the insert tests was my use of $sth->bind_columns to collect the data; I used this method after reading this node by kudra. For CSV and flatfile, inserts were basically just text appended to the end of the file.

Insert Results:
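The flatfile insert path described above can be sketched as a simple append. The filename and the field layout (ID first, pipe-delimited) are assumptions; the CSV case would be analogous, using Text::CSV_XS's combine/string to build the line instead of join.

```perl
#!/usr/bin/perl -w
use strict;

# Append one pipe-delimited row to the end of the flatfile.
# Field names beyond ID are hypothetical.
sub flatfile_insert {
    my ( $file, $row ) = @_;
    open my $fh, '>>', $file or die "append $file: $!";
    print $fh join( '|', $row->{ID}, $row->{NAME}, $row->{VALUE} ), "\n";
    close $fh;
}

flatfile_insert( 'data.txt', { ID => 1, NAME => 'foo', VALUE => 'bar' } );
```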
Select: For select statements in flatfile and CSV, I just looped through the lines of the file and compared the "ID" field with the desired "ID" value. All selects were done on the "ID" value, as opposed to other values.

Select Results:
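A minimal sketch of that linear-scan select for the flatfile case, assuming pipe-delimited rows with the ID in the first column:

```perl
#!/usr/bin/perl -w
use strict;

# Scan the file line by line; return the first row whose ID matches.
sub flatfile_select {
    my ( $file, $id ) = @_;
    open my $fh, '<', $file or die "read $file: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my @fields = split /\|/, $line;
        if ( $fields[0] == $id ) {
            close $fh;
            return \@fields;
        }
    }
    close $fh;
    return undef;    # no matching ID
}
```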
Update: Updates for flatfile and CSV were done by looping through the lines of the file and copying each line to a .new file, altering the "updated" line before the copy. After the whole file had been looped through, the .new file was renamed to the original. Again, our "key" was the "ID" field.

Update Results:
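The copy-then-rename update described above can be sketched like this for the flatfile case. The column being updated (the third field) is an assumption, since the original row layout is not shown.

```perl
#!/usr/bin/perl -w
use strict;

# Rewrite the file: copy every line to file.new, changing the
# matching row on the way through, then rename over the original.
sub flatfile_update {
    my ( $file, $id, $new_value ) = @_;
    open my $in,  '<', $file       or die "read $file: $!";
    open my $out, '>', "$file.new" or die "write $file.new: $!";
    while ( my $line = <$in> ) {
        chomp $line;
        my @f = split /\|/, $line;
        $f[2] = $new_value if $f[0] == $id;    # third column is hypothetical
        print $out join( '|', @f ), "\n";
    }
    close $in;
    close $out;
    rename "$file.new", $file or die "rename $file.new: $!";
}
```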
Delete: Deleting data offered a difficult challenge: how do you delete data for more than one iteration, when you've obviously destroyed your dataset after the first iteration? I was unable to find a way to do this that I didn't think would skew the results, so deleting was only done once; the results are thus likely inaccurate.

Delete Results:
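For the flatfile case, the delete would presumably use the same rewrite-and-rename pattern as the update, skipping the matching row instead of changing it. A minimal sketch, again assuming the ID is the first pipe-delimited field:

```perl
#!/usr/bin/perl -w
use strict;

# Copy every line except the one with the matching ID, then
# rename the new file over the original.
sub flatfile_delete {
    my ( $file, $id ) = @_;
    open my $in,  '<', $file       or die "read $file: $!";
    open my $out, '>', "$file.new" or die "write $file.new: $!";
    while ( my $line = <$in> ) {
        my ($row_id) = split /\|/, $line;
        print $out $line unless $row_id == $id;
    }
    close $in;
    close $out;
    rename "$file.new", $file or die "rename $file.new: $!";
}
```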
My Conclusions: In all tests, flatfile would appear to be the winner, with Mysql coming in a close second. CSV appears to be quite slow, so I think I will only use it if I'm forced to deal with someone else's CSV data.

Thank you for your thoughts and suggestions,
James

Edit: Some suggested information from TheoPetersen, which I very much agree with...