http://qs321.pair.com?node_id=612738


in reply to Perl solution for storage of large number of small files

Sounds like a time to go to a real database. Have you tried this with DBD::Pg?
  • Comment on Re: Perl solution for storage of large number of small files

Replies are listed 'Best First'.
Re^2: Perl solution for storage of large number of small files
by tedv (Pilgrim) on Apr 30, 2007 at 13:49 UTC
    That was my thought as well. File systems aren't optimized to handle 1e6+ files in constant use, and that's exactly what databases ARE optimized for. Why does the problem definition require files, or is that negotiable?


    -Ted
      I had a stint on this project with sqlite, and although I like databases, I was (so far) under the impression storing binary data into a database is not the best way to go. For me the blob columns always felt like an add-on...

      Second, the performance with sqlite (async off) was poor, and it was only munging light text data. This is why I never thought of going completely db with this. And I didn't want to install a full mysql server on the machine, just to find out that it would have the same problems.

      I think using the fs is ok for ~20-30 mio. files. So, do you opt for using a db when this number grows?
        Can you explain more about what's contained in these files? How regular is their size and structure, both internal and external? All of these are factors that influence your decision of file system versus database.

        I'm just saying that "Millions of small fragments of data" is a vote for database over file system. Answers to the other three questions (regularity of size, internal structure, and external structure) might vote in favor of the file system.


        -Ted
        SQLite is a "database of last resort". It's good for prototyping, and for bundling with applications cos it requires no setup, but most of the time I'd prefer to use PostgreSQL, MySQL or a "grown-up" database like Oracle.

        Implementing your own primitive bucketing algorithm by using several DBM files sounds ... perverse. If going down the DBM route, I'd use just one file and let perl handle that. I'd also use GDBM_File in preference to DB_File as it has fewer restrictions on your data. You might want to consider DBM::Deep if you want to fake up a filesystem in a file, or go crazy and create a *real* filesystem in a file, optimised for your circumstances, and mount that.