PerlMonks  

flat-file vs DB_File

by cynix (Initiate)
on Sep 09, 2001 at 04:23 UTC ( [id://111221]=perlquestion )

cynix has asked for the wisdom of the Perl Monks concerning the following question:

  I want to store some usernames and passwords in a file on my site. Together, a username and password take only 20 chars. With 4000 users in a flat file (one user-password pair per line), the file is about 80 KB. With a DB_File (username as key, password as value), the file is about 180 KB.

My question is:

If data corruption is a non-issue (only the admin can change data) and speed is the only concern, which storage format should I use, given that I only ever look up one username at a time?

Replies are listed 'Best First'.
Re (tilly) 1: flat-file vs DB_File
by tilly (Archbishop) on Sep 09, 2001 at 04:34 UTC
    DB_File is faster.

    There is often a tradeoff in dynamically allocated data structures between how little memory you use and how fast your data access and manipulation can be. DB_File leaves unused space in its file. That doesn't matter for access, because you don't scan through the file sequentially; you jump right to the data. And it is a good thing for writing, because if you want to add a little to one record, you don't have to rewrite the whole file to make room.
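A minimal sketch of the keyed access described above, assuming DB_File is installed and using its hash format. The file location and the sample usernames and passwords are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

# Hypothetical location and sample data, for illustration only.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/users.db";

# Tie a Perl hash to an on-disk hash database.
tie my %users, 'DB_File', $file, O_RDWR|O_CREAT, 0600, $DB_HASH
    or die "Cannot open $file: $!";

# Storing a record updates the database in place.
$users{alice} = 'secret1';
$users{bob}   = 'secret2';

# A lookup is keyed: the library hashes the key and fetches the
# matching record instead of reading records one after another.
my $password = $users{bob};
print defined $password ? "bob => $password\n" : "no such user\n";

untie %users;
```

The tied hash behaves like an ordinary hash, so the calling code does not need to change if the storage format is swapped later.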

      But that doesn't matter for access because you don't sequentially scan through the file, you jump right to the data, ...

      If the file page you're jumping to is already in the kernel's disk cache, this is a win. But if getting to the target page requires moving the disk head, you might be better off with a sequential scan. For an infrequently accessed file in the 7-8 disk page range, I suspect that the linear scan might win.

      I think we're in the toss-up category.

      BTW, this is a tricky one to test by profiling, since you may have to go out of your way to make sure that the file isn't cached.
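For comparison, the sequential scan could look roughly like this. It is a sketch only: the "username:password" line format, the file name, and the `lookup_flat` helper are assumptions, not something given in the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Return the password for $user, or undef if the user is not listed.
# Assumes one "username:password" pair per line (a hypothetical format).
sub lookup_flat {
    my ($file, $user) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $found;
    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $pass) = split /:/, $line, 2;
        next unless defined $pass;          # skip malformed lines
        if ($name eq $user) {
            $found = $pass;
            last;                           # stop at the first match
        }
    }
    close $fh;
    return $found;
}

# Build a 4000-line sample file so the sketch runs on its own.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/users.txt";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "user$_:pass$_\n" for 1 .. 4000;
close $out or die "Cannot close $file: $!";

my $hit  = lookup_flat($file, 'user3999');
my $miss = lookup_flat($file, 'nobody');
print "user3999 => ", (defined $hit  ? $hit  : '(not found)'), "\n";
print "nobody   => ", (defined $miss ? $miss : '(not found)'), "\n";
```

Timing this against a DB_File lookup right after writing the file would mostly measure the disk cache, which is the profiling caveat raised above.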

      Thank you.
Re: flat-file vs DB_File
by pmas (Hermit) on Sep 09, 2001 at 06:44 UTC
    Using a database is preferable in this situation IMHO, because:
    - it is more scalable (if more users or more fields are added, flat-file performance will degrade much faster)
    - it will add SQL/database skills to your toolset. It is better to start learning them on a simple project like this, so next time, when they are needed, you'll be ready...

    I agree with dws that reading a small flat file with file caching can be faster, but I believe we need to ask the business question: with a small number of users, the speed gain is not important enough to justify skipping a scalable solution, so why not do it properly? Next time, when cynix needs to accommodate more users, s/he can start with something better than flat files...

    Just my $0.02

    pmas
    To make errors is human. But to make million errors per second, you need a computer.

Re: flat-file vs DB_File
by perrin (Chancellor) on Sep 09, 2001 at 05:44 UTC
    SDBM_File is actually significantly faster than DB_File. Its only drawback is a limitation on record size, which will be no problem for your data.
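A minimal sketch of the same lookup with SDBM_File, which ships with Perl. The file location and sample record are made up; the record-size limit is on the order of 1 KB per key/value pair in typical builds, so a 20-character pair is well within it.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical location; SDBM adds the .pag and .dir suffixes itself.
my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/users";

tie my %users, 'SDBM_File', $base, O_RDWR|O_CREAT, 0600
    or die "Cannot tie $base: $!";

# Sample record, for illustration only.
$users{alice} = 'secret1';

my $password = $users{alice};
print defined $password ? "alice => $password\n" : "no such user\n";

untie %users;
```

Because both modules use the tied-hash interface, comparing the two on real data only requires changing the `tie` line.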
Re: flat-file vs DB_File
by jepri (Parson) on Sep 09, 2001 at 07:18 UTC
    I recommend a third option: DBI with DBD::RAM. Arrange it so your process keeps running (as a daemon, or under mod_perl) and loads all the data on startup. Restart the daemon every time you change the data.

    It may not be possible to do this, but if it is, it'll be faster than both of the methods you mention.
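The load-once idea can be sketched without DBI at all. The example below swaps in a plain Perl hash for DBD::RAM, so it illustrates the principle rather than the DBI interface; the `load_users` helper, the file name, and the "username:password" line format are assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read every "username:password" line into a hash, once, at startup.
sub load_users {
    my ($file) = @_;
    my %users;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($name, $pass) = split /:/, $line, 2;
        $users{$name} = $pass if defined $pass;
    }
    close $fh;
    return \%users;
}

# Build a sample file so the sketch runs on its own.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/users.txt";
open my $out, '>', $file or die "Cannot write $file: $!";
print {$out} "user$_:pass$_\n" for 1 .. 4000;
close $out or die "Cannot close $file: $!";

# Done once when the long-running process starts.
my $users = load_users($file);

# Each later request is an in-memory lookup with no file access.
for my $name (qw(user42 nobody)) {
    my $pass = $users->{$name};
    print "$name => ", (defined $pass ? $pass : '(not found)'), "\n";
}
```

With this approach a change to the file is only seen after the process reloads it, which is why the restart step matters.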

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.

Re: flat-file vs DB_File
by bikeNomad (Priest) on Sep 09, 2001 at 04:47 UTC
    If you use BerkeleyDB (and perhaps DB_File) the underlying page size can be tuned to minimize wasted space.
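With DB_File, one knob of this kind is the `bsize` field of the hash info object, which has to be set before the file is created. The sketch below builds the same sample data with two bucket sizes and prints the resulting file sizes; the `build_db` helper, file names, and data are hypothetical, and the numbers will depend on the Berkeley DB version underneath.

```perl
use strict;
use warnings;
use Fcntl;                        # O_RDWR, O_CREAT
use DB_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Build a 4000-record hash database with the given bucket size
# and return the size of the resulting file in bytes.
sub build_db {
    my ($file, $bsize) = @_;
    my $info = DB_File::HASHINFO->new;
    $info->{bsize} = $bsize;      # only takes effect when the file is created
    tie my %users, 'DB_File', $file, O_RDWR|O_CREAT, 0600, $info
        or die "Cannot open $file: $!";
    $users{"user$_"} = "pass$_" for 1 .. 4000;
    untie %users;
    return -s $file;
}

# Compare two bucket sizes on identical data.
my %size;
for my $bsize (512, 4096) {
    $size{$bsize} = build_db("$dir/users_$bsize.db", $bsize);
    print "bsize $bsize: $size{$bsize} bytes\n";
}
```

Measuring on the real data is the only reliable way to pick a value, since the best setting depends on record sizes and access pattern.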
