Um, my own 2 cents, but I'm not sure about "highly". It
is cool, but my experience was not so hot. On the other
hand, I have had a wonderful experience with MySQL.
I actually was going to write a comment saying,
"Don't use DBD::CSV. You will go insane. Okay?".
It actually is quite nice, and I was delighted with DBD::CSV
at first. But weird bugs started crawling in, and though
I still have a project using it, I find it a little fragile
and unscalable. True, I adopted it soon after it came
out, so maybe it is much better now. But you also want
to be careful with things that can break CSV, especially
(this sounds dumb, but..) vertical tabs from Excel. In fact, Excel is evil.
View something in Excel if you like, but if you must edit in
Windows, use Access! Excel's CSV generation is broken.
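Since those vertical tabs bit me, here is the kind of scrubbing I mean: strip control characters out of every field before it gets anywhere near the CSV file. (`clean_field` is just a name I made up for this sketch.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Drop control characters from a field, keeping only tab (\x09) and
# newline (\x0A).  \x0B is the vertical tab Excel likes to embed in
# cells; \x0D (CR) and \x7F (DEL) go too.
sub clean_field {
    my ($field) = @_;
    $field =~ s/[\x00-\x08\x0B-\x1F\x7F]//g;
    return $field;
}

# A cell pasted out of Excel with an embedded vertical tab:
my $dirty = "first line\x0Bsecond line";
print clean_field($dirty), "\n";    # "first linesecond line"
```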
I suppose Windows vs. Unix, and doing it all in Japanese, didn't
help, so maybe it was a bit of a stress test. I also hacked
SQL/HTML v1.0 by Jonathan A. Zdziarski to work with DBD::CSV
for a quick internal admin interface to the db, which was
pushing things a bit. Also, if you think you have
a unique key, you will have to strongly enforce its uniqueness yourself, every step of the way. That said, it works
for very basic things. The best use of it, I would think, is if
you periodically receive a CSV file and want to deal with
it in SQL. I really enjoyed how easy it was to write the
search algorithms in SQL. But keep backups, and go look at
(and possibly edit) the raw file periodically, please.
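About that unique-key point above: DBD::CSV (at least the version I used) didn't enforce anything, so the application has to police uniqueness itself. A minimal pure-Perl sketch of the idea — `insert_row` and `%seen` are names I made up, and the real `$dbh->do("INSERT ...")` would go where the comment is:

```perl
use strict;
use warnings;

my %seen;    # every key we have accepted so far

sub insert_row {
    my ($key, $row) = @_;
    if ($seen{$key}++) {              # seen before?  reject it
        warn "duplicate key '$key' -- row rejected\n";
        return 0;
    }
    # ... the real $dbh->do("INSERT INTO ...") would go here ...
    return 1;
}

insert_row('jdoe', { name => 'John Doe' }) or die;   # accepted
insert_row('jdoe', { name => 'Impostor' });          # rejected, returns 0
```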
That said, DBD::CSV does work, and you can see it working in the
event calendar of the
Royal Netherlands Embassy in Japan. (The free-text engine is a
different algorithm, not SQL.)
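For the "periodically received a CSV file" case, a session looks roughly like this — directory, file, and column names are invented, and I'm assuming current DBD::CSV attribute names (`f_dir`, `f_ext`), so check the docs against whatever version you have:

```perl
use strict;
use warnings;
use DBI;

# Every *.csv file under f_dir becomes a table named after the file.
my $dbh = DBI->connect('dbi:CSV:', undef, undef, {
    f_dir      => '/data/incoming',   # invented path
    f_ext      => '.csv',
    RaiseError => 1,
});

# Queries "events.csv" as the table "events" (invented names).
my $sth = $dbh->prepare(
    'SELECT title, event_date FROM events WHERE event_date >= ?'
);
$sth->execute('2001-01-01');
while (my ($title, $date) = $sth->fetchrow_array) {
    print "$date  $title\n";
}
$dbh->disconnect;
```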
There is another option: you could have some flat file or other
which you slurp into memory and then manipulate with DBD::RAM.
I keep trying to find some reason to try working with it..
Hey, I just installed it from CPAN; it's back up again! I wonder
which is faster, DBD::CSV or slurp + DBD::RAM?..
Running test.pl from DBD::CSV and then from DBD::RAM (edited).
Testing empty loop speed ...
CSV 100000 iterations in 0.2 cpu+sys seconds (476190 per sec)
RAM 100000 iterations in 0.2 cpu+sys seconds (416666 per sec)
Testing connect/disconnect speed ...
CSV 2000 connections in 1.6 cpu+sys seconds (1265 per sec)
RAM 2000 connections in 1.3 cpu+sys seconds (1503 per sec)
Testing CREATE/DROP TABLE speed ...
CSV 500 files in 1.8 cpu+sys seconds (273 per sec)
RAM 500 files in 1.0 cpu+sys seconds (480 per sec)
Testing INSERT speed ...
CSV 500 rows in 1.1 cpu+sys seconds (450 per sec)
RAM 500 rows in 0.6 cpu+sys seconds (793 per sec)
Testing SELECT speed ...
CSV 100 single rows in 8.7 cpu+sys seconds (11.5 per sec)
RAM 100 single rows in 6.7 cpu+sys seconds (14.9 per sec)
Testing SELECT speed (multiple rows) ...
CSV 100 times 100 rows in 10.7 cpu+sys seconds (9.4 per sec)
RAM 100 times 100 rows in 8.8 cpu+sys seconds (11.3 per sec)
Updated: Yow! Turns out DBD::RAM works not only with
local CSV files, but also with files accessible via HTTP and FTP,
and it even works on arrays and hashes? Woohoo! I think
I like the idea of using DBD::RAM better than DBD::CSV, because
I can migrate the data to a different format if I decide it
needs to be binary, and I don't fool myself into thinking
my flat file is an SQL database engine. It seems you can
separate the SQL query interface and the data I/O to some
degree. Maybe
a separate Perl process to maintain that in memory, plus IPC?
I'd be interested to hear from anyone with experience with DBD::RAM.
Funny, it says in the man page that you can have it write
to disk on every modification of the db. I wonder if that
would satisfy some of the people who are worried about MySQL
putatively not saving data quickly.