PerlMonks
script stops running .. almost

by radu_marg (Initiate)
on Jul 17, 2009 at 10:03 UTC [id://780996]

radu_marg has asked for the wisdom of the Perl Monks concerning the following question:

I am using a script, run from a Linux shell, which reads information from a text file and saves the data into an embedded database. The text file contains 100 million entries, so the script needs over 10 hours to complete. The problem I have is that after running for 2-3 hours, the script almost stops: the process uses only 1% of the CPU and its progress virtually halts. I tried two different databases, Berkeley DB and Tokyo Cabinet, and the problem occurs with both. I do not think this is a buffering problem, since I made sure the standard output is unbuffered. What am I missing? Thank you, Radu
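
(No code was posted, so purely for concreteness, here is a minimal sketch of the kind of loader described above; the file names, the tab-separated record format, and the use of DB_File are all assumptions:)

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DB_File;
    use Fcntl;

    $| = 1;    # unbuffered STDOUT, as described in the question

    # Tie a hash to an on-disk Berkeley DB file (B-tree variant).
    tie my %db, 'DB_File', 'data.db', O_RDWR|O_CREAT, 0666, $DB_BTREE
        or die "Cannot open data.db: $!";

    open my $in, '<', 'entries.txt' or die "Cannot open entries.txt: $!";
    my $count = 0;
    while (my $line = <$in>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;   # assumed record format
        $db{$key} = $value;
        print "$count records inserted\n" if ++$count % 1_000_000 == 0;
    }
    close $in;
    untie %db;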

Replies are listed 'Best First'.
Re: script stops running .. almost
by mzedeler (Pilgrim) on Jul 17, 2009 at 10:14 UTC

    If you populate a B-tree style file, insert performance quickly degrades with the number of records already inserted. Try measuring the insert performance as a function of the number of records in the file. Using the hash variant that Berkeley DB supports should give you much better performance (see the sketch below).

    Another option is to try profiling your script. That should provide an indication of where the bottleneck is.
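
    With DB_File's tie interface, switching between the two variants is a one-line change (a sketch; the file name is an assumption):

        use DB_File;
        use Fcntl;

        # B-tree variant: keeps keys sorted, rebalances as the tree grows.
        tie my %btree, 'DB_File', 'data.db', O_RDWR|O_CREAT, 0666, $DB_BTREE;

        # Hash variant: no key ordering, but flatter insert cost at scale.
        tie my %hash, 'DB_File', 'data.db', O_RDWR|O_CREAT, 0666, $DB_HASH;

    For the profiling, Devel::NYTProf is one option: run perl -d:NYTProf yourscript.pl, then nytprofhtml to turn the output into a report.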

Re: script stops running .. almost
by targetsmart (Curate) on Jul 17, 2009 at 10:11 UTC
    IMHO your program is eating all the available memory without releasing or reusing it.
    What percentage of memory does the program use after 2-3 hours of running?
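
    One way to watch this from inside the script on Linux is to read /proc (a sketch; Linux-only, and how often you call it is up to you):

        # Print this process's virtual and resident memory usage.
        sub report_mem {
            open my $fh, '<', "/proc/$$/status" or return;
            while (<$fh>) { print if /^Vm(?:Size|RSS):/ }
            close $fh;
        }

        # e.g. call report_mem() every million inserts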

    Vivek
    -- 'I' am not the body, 'I' am the 'soul', which has no beginning or no end, no attachment or no aversion, nothing to attain or lose.
Re: script stops running .. almost
by JavaFan (Canon) on Jul 17, 2009 at 11:48 UTC
    Give it some coffee! I slow down after 2-3 hours of working at full speed without coffee, too.
Re: script stops running .. almost
by tweetiepooh (Hermit) on Jul 17, 2009 at 14:46 UTC
    Some databases don't actually write the data out without an explicit commit; instead they fill up temporary structures. This is to ensure read consistency.

    Maybe there is some form of commit-like statement that will flush this database buffer, release the locks, re-lock, and start again with the next block of data.

      I've heard about this problem happening, and I think tweetiepooh could be on to something important here. I was reading more about Berkeley DB here:
      http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_txn/C/index.html
      There is a lot of bookkeeping involved in tracking a transaction, and if you are in a situation where, say, 2 hours of inserts form one transaction which in theory could be aborted with no change to the DB, that's a lot of overhead! A commit says, "I'm finished with this one." I am not a DB guru, but I'm also wondering whether there are options that circumvent some of the normal transaction rollback and journaling for the case of a single user doing the initial DB create from scratch. I don't know; I'm just wondering if this initial build is somehow handled differently from the "online use" of the thing once it is built.

      Update: I would leave output unbuffered until you get this working, but be aware that there is a significant performance penalty for that; in this case, we could be talking hours of difference! Get it working, then turn buffering back on and see what happens. Right now I suspect that tweetiepooh's idea of committing every 100 or so inserts is going to do something impressive; a sketch of that idea follows.
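
      A sketch of the commit-every-N idea using the BerkeleyDB module's transaction API (note: BerkeleyDB, not DB_File; the batch size, file names, and tab-separated record format are all guesses on my part):

          use strict;
          use warnings;
          use BerkeleyDB;

          # Transactional environment; the 'dbhome' directory must exist.
          my $env = BerkeleyDB::Env->new(
              -Home  => 'dbhome',
              -Flags => DB_CREATE | DB_INIT_MPOOL | DB_INIT_TXN
                      | DB_INIT_LOCK | DB_INIT_LOG,
          ) or die "Env: $BerkeleyDB::Error";

          my $db = BerkeleyDB::Hash->new(
              -Filename => 'data.db',
              -Flags    => DB_CREATE,
              -Env      => $env,
          ) or die "DB: $BerkeleyDB::Error";

          my $txn = $env->txn_begin();
          $db->Txn($txn);

          my $count = 0;
          while (my $line = <STDIN>) {
              chomp $line;
              my ($key, $value) = split /\t/, $line, 2;
              next unless defined $value;       # skip malformed lines
              $db->db_put($key, $value);
              if (++$count % 10_000 == 0) {     # commit in batches
                  $txn->txn_commit();
                  $txn = $env->txn_begin();     # start the next batch
                  $db->Txn($txn);
              }
          }
          $txn->txn_commit();                   # commit the final partial batch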

      The original Berkeley DB didn't support transactions. Even though the newer versions do support transactions, it doesn't seem that DB_File supports them.

      If it works the way I remember, every change is written straight to the file as it is made, but you can't use the file size as a reliable indication that every write has happened.

      A different way to speed up the load is to randomize the order of the keys (or to apply a pseudo-random map to the keys themselves, such as MD5; see the sketch below). I know it sounds odd, but if you are using B-tree storage and the keys arrive in sorted order, you get very long load times because the tree is constantly being rebalanced.

      My suggestion with regard to trying hash storage still stands. Try that first.
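
      A sketch of that key-randomizing trick (illustrative; %db is assumed to be the tied B-tree hash, and lookups must hash the key the same way):

          use Digest::MD5 qw(md5_hex);

          # Mapping each key through MD5 turns sorted input into an
          # effectively random insert order, so inserts stop piling
          # into the same region of the B-tree.
          sub put_randomized {
              my ($db, $key, $value) = @_;
              $db->{ md5_hex($key) } = $value;
          }

          # usage:  put_randomized(\%db, $key, $value);
          # lookup: $db{ md5_hex($key) }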
