

in reply to Perl solution for storage of large number of small files

I'm also curious about why you don't want to use a database. Is it because you want to be absolutely sure of when data is safely stored on the disk, or because it is faster this way? IIRC Oracle was originally designed to take advantage of the physical layout of data on the disk, though that is perhaps less important these days. Did you try dumping this into MySQL or PostgreSQL and dislike the solution for some reason? Have you tried Sleepycat's BDB?
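For reference, a minimal sketch of what the tied-hash approach looks like via the core DB_File module; the file name and key are invented for illustration:

    use strict;
    use warnings;
    use Fcntl;      # for O_RDWR, O_CREAT
    use DB_File;

    # 'records.db' and the key below are made up for illustration.
    tie my %db, 'DB_File', 'records.db', O_RDWR|O_CREAT, 0644, $DB_BTREE
        or die "Cannot tie records.db: $!";

    $db{'some_key'} = 'small blob of data';   # write one small record
    print $db{'some_key'}, "\n";              # read it back

    untie %db;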

Incidentally, the InnoDB performance tuning tips page notes:

Wrap several modifications into one transaction. InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. The rotation speed of a disk is typically at most 167 revolutions/second, which constrains the number of commits to the same 167th of a second if the disk does not fool the operating system.

So one disk rotation is 6 ms minimum right there. Are you spreading your tied files across several disks? Do you require every write to be physically on disk the instant it happens, or can you wait a second or so?
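To make the batching point concrete, here is a rough DBI sketch (connection details, table, and data all invented) that pays the log flush once per thousand rows instead of once per row:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical connection, table, and data, for illustration only.
    my $dbh = DBI->connect('dbi:mysql:database=test', 'user', 'password',
                           { RaiseError => 1, AutoCommit => 0 });
    my $sth = $dbh->prepare('INSERT INTO files (name, contents) VALUES (?, ?)');

    my @records = map { { name => "file$_", contents => "data$_" } } 1 .. 1000;

    # Many small writes, one commit: InnoDB flushes its log to disk once
    # per transaction, not once per row.
    for my $rec (@records) {
        $sth->execute($rec->{name}, $rec->{contents});
    }
    $dbh->commit;    # the single log flush happens here
    $dbh->disconnect;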

Oh, the other thing is that if you have disk to burn, you could increase your inode size (on XFS) or mirror your disks for speed. But regardless, it seems that moving to a database implementation now, rather than waiting for things to explode, might be a good idea. I don't suppose your system could do locking to handle multiple writers, could it? Perhaps more info about what you are actually trying to do would be useful.
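On the locking question, a crude but workable sketch, assuming an invented sidecar lock file and flock:

    use strict;
    use warnings;
    use Fcntl qw(:flock);

    # Hypothetical sidecar lock file serializing access to the data files.
    open my $lock, '>', 'store.lock' or die "Cannot open lock file: $!";
    flock $lock, LOCK_EX or die "Cannot take exclusive lock: $!";

    # ... update the tied/data files here, safe from concurrent writers ...

    flock $lock, LOCK_UN;
    close $lock;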

Also, I was thinking about a presentation, at YAPC::Asia I think it was, about how a large service was built on Perl. Livedoor or Mixi. Anyway, they split their indices and tables across different servers (using the first characters of user names, IIRC), and they built a system capable of easily repartitioning this layout as users increase.
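Something like this toy function (host names invented) captures the idea; repartitioning is just a matter of regenerating the map:

    use strict;
    use warnings;

    # Invented shard map: the first character of the user name picks a
    # server; regenerating this map is how the layout gets repartitioned.
    my %shard_for = (
        (map { $_ => 'db1.example.com' } 'a' .. 'm'),
        (map { $_ => 'db2.example.com' } 'n' .. 'z'),
    );

    sub shard_for_user {
        my ($username) = @_;
        my $first = lc substr($username, 0, 1);
        return exists $shard_for{$first} ? $shard_for{$first}
                                         : 'db1.example.com';  # default shard
    }

    print shard_for_user('tilly'), "\n";   # db2.example.com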


Replies are listed 'Best First'.
Re^2: Perl solution for storage of large number of small files
by diotalevi (Canon) on May 01, 2007 at 00:20 UTC

    DB_File is the old API for BerkeleyDB, the Sleepycat database. It's the same thing.
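    For illustration, the same kind of tied hash via the newer BerkeleyDB module (the file name is invented; the interface is essentially DB_File's):

        use strict;
        use warnings;
        use BerkeleyDB;

        # 'data.db' is made up; same tied-hash style as DB_File.
        tie my %h, 'BerkeleyDB::Hash',
            -Filename => 'data.db',
            -Flags    => DB_CREATE
            or die "Cannot open data.db: $BerkeleyDB::Error";

        $h{key} = 'value';
        untie %h;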

    ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

Re^2: Perl solution for storage of large number of small files
by andye (Curate) on May 01, 2007 at 10:29 UTC
    InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. ... constrains the number of commits to the same 167th of a second

    unless you set innodb_flush_log_at_trx_commit to 0, which switches it to flush once a second.
    http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html
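    For instance, a sketch of flipping it at runtime from Perl via DBI (connection details invented; it can also simply be set in my.cnf):

        use strict;
        use warnings;
        use DBI;

        # Hypothetical connection; changing this at runtime needs the SUPER
        # privilege, and my.cnf is the usual place to set it permanently.
        my $dbh = DBI->connect('dbi:mysql:database=test', 'root', 'password',
                               { RaiseError => 1 });

        # 0 = write and flush the log about once per second, not per commit.
        $dbh->do('SET GLOBAL innodb_flush_log_at_trx_commit = 0');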

    HTH, andye

      Thanks, that is the page I was looking at, and why I mentioned "a second or so". It seemed he was unwilling to wait that long, but buffering would, it seems, increase efficiency.