Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change

Re: Storing/Retrieving images as blobs

by dHarry (Abbot)
on Sep 26, 2011 at 06:59 UTC ( #927801=note: print w/replies, xml ) Need Help??

in reply to Storing/Retrieving images as blobs

I once worked on a project moving TIFF images (scanned passports and other id docs) stored as BLOBs in an Oracle database to the file system. We did run into performance problems and moving them out solved things. I don't know the limits of mySQL but I think sooner or later you will run into performance problems as well. In the end the file system gives better performance (scales better). You have many small images, so some clever tree structure to quickly locate an image is needed. Many little files in one directory is normally an OS-killer. There are also dedicated image databases around which do a good job.



Replies are listed 'Best First'.
Re^2: Storing/Retrieving images as blobs
by tweetiepooh (Hermit) on Sep 26, 2011 at 11:03 UTC
    Why a clever tree structure when you could have a table that contains a file name? I'm not expert enough to know whether a binary store in the table or file in file system is the winner though. Having a file would mean the file is available should the database be down and if you use some logical scheme should be locatable and accessible by real people.

      The OS killer issue alluded to above is when you have a large number of entries in a single directory. In order for the OS to resolve the file name it needs to look up each portion of the file name in the directory. If there are a significant number of entries in a directory, multiple reads on disk may be necessary to find the particular entry.

      As an example: assume that you can store 20 directory entries in a block that can be read in a single shot. If you have 100 files, then on average, you will need three disk reads to find the location of a particular file (N/2 comparisons gets you to the third block of five necessary to store 100 entries). This increases to 25 reads for 1000 entries (1000 / 2 => 500, 500 / 20 => 25).

      If you can break the directory structure into N parts so that no directory contains more than 20 entries, you can find the file location, on average, in two reads (100 / 20 => 5 entries in each subdirectory). With 2 levels of no more than 20 entries each, you can store and retrieve 400 files with a guaranteed maximum number of reads of 2. 8000 in three, 160K in 4, 3.2M in 5, and so on.

      The value of 20 depends on the file system and how directory lookups are done in the OS or the application. A full readdir scan at the application level is less able to be optimized by the OS than having the OS open the file for you. Basically, you need to have a depth of X and number of entries of Y where YX is the number of files you can store (N) before you are willing to reorganize your files.

      If the directory entries are actually stored in a database (I think that BeOS had that feature), then you may not need to worry about the read sizes and how many directory entries can fit into a single read.

      As an interesting aside, the classical Unix File System used this technique to minimize the number of lookups it had to do to find the data blocks for given files. The inode pointed to by the directory entry had pointers to data blocks for small files, then some pointers to blocks of pointers to data blocks for larger files, and then some pointers to blocks of pointers to blocks of pointers for even larger files. The larger the number of files data blocks, the deeper the "directory of pointers" became.


      There are several pros/cons to both alternatives and a detailed tradeoff analysis is needed to draw any conclusions. In my experience the database sooner or later grinds to a halt when the tables with BLOBs become too big. After that there is no choice really. The 3GB of the OP doesn't seem too big but I'm thinking Oracle here. I have no idea when mySQL gives up, I suppose it depends on type/version of mySQL database, configuration and hardware etc.

      Assuming you reach the point where you have to move the images to the file system, it pays of the design a directory structure to prevent the OS from choking and/or improve access times. You typically don't want millions of small files in one directory.

      As I mentioned an alternative is a dedicated database product to store the images. These databases are optimized for that purpose.

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://927801]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others chilling in the Monastery: (3)
As of 2020-06-05 00:27 GMT
Find Nodes?
    Voting Booth?
    Do you really want to know if there is extraterrestrial life?

    Results (35 votes). Check out past polls.