Re^4: Catching errors in closing lexical filehandles

Re^5: Catching errors in closing lexical filehandles by tilly (Archbishop) on Sep 27, 2004 at 22:17 UTC
Not to lessen your sense of wonderfulness, but there are problems with both true GC and reference counting.

With true GC, variables that go out of scope will eventually be freed, open filehandles will eventually be closed, and so on. When that will happen is another story. That matters whenever you are using a scarce resource. For instance, the fact that some of your open filehandles will be closed someday does not help you if you have run over the local OS limit for open filehandles right now. Likewise, leaking statement handles might make your database very unhappy. These are the downsides of true GC.

On the other hand, reference counting isn't perfect either. First there is the obvious deficiency: circular references never go away. There is also the hidden problem that it takes a lot of code, distributed throughout your codebase, to keep reference counts correct, and bugs in that code are not obvious. (Well, not if the bug is to add an extra 1 to the reference count...) This means that using reference counting increases the number of bugs in Perl.

The upshot is that with true GC you want to close external resources yourself, because "eventually" may not come soon enough. With reference counting you get reliable timing, but it is really hard (read: next to impossible) to get the implementation right.
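To make the reference-counting half of that trade-off concrete, here is a minimal sketch in Perl. The `Tracked` class and its log are hypothetical, invented only for this illustration; `Scalar::Util::weaken` is the core routine for marking a reference as non-counting. It shows three things: an object destroyed the moment its last reference goes away, a two-object cycle that is not destroyed at scope exit, and the same cycle freed once one link is weakened.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical class that records when its destructor runs.
package Tracked;

my @log;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

sub DESTROY {
    my ($self) = @_;
    # Skip logging for objects only reaped at interpreter shutdown.
    return if ${^GLOBAL_PHASE} eq 'DESTRUCT';
    push @log, "destroyed $self->{name}";
}

sub log_ref { return \@log }

package main;

# 1. Reliable timing: the object dies as soon as the scope ends.
{
    my $obj = Tracked->new('scoped');
}

# 2. Circular reference: each object keeps the other's count above zero,
#    so neither destructor runs when the scope ends.
{
    my $x = Tracked->new('x');
    my $y = Tracked->new('y');
    $x->{peer} = $y;
    $y->{peer} = $x;
}

# 3. Same shape, but one link is weakened so it no longer counts.
{
    my $c = Tracked->new('c');
    my $d = Tracked->new('d');
    $c->{peer} = $d;
    $d->{peer} = $c;
    weaken($d->{peer});
}

print "$_\n" for @{ Tracked::log_ref() };
```

The printed log should mention `scoped`, `c` and `d`, but not `x` or `y`, which linger until the interpreter exits. The same reasoning applies to a lexical filehandle caught in such a cycle: it stays open until you close it yourself.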
Re^6: Catching errors in closing lexical filehandles by BrowserUk (Patriarch) on Sep 28, 2004 at 02:19 UTC
I believe there is a third way (and no, I'm not talking about Tony:). It consists of using reference counting and mark'n'sweep together. The reference counting deals with the majority of GC, ensuring timely destruction and avoiding the "pregnant pauses" that are another, more insidious signature of mark'n'sweep. The mark'n'sweep is used only to break the bonds of circular references. It is invoked either automatically at "resources low" time, or by the programmer when he knows he has just finished with data that may include circular refs.

That leaves the problem of reference counting code being distributed throughout the codebase. That can be addressed by using real references (Handles). Suppose references aren't "addresses with a flag and some magic", but are instances of a Reference class that overloads the "take a ref" operator and assignment, so that whenever a reference to an object is taken, or a reference is assigned, a Reference object method is invoked to do it. Then all the reference counting code can live in one place.

Further, store the actual addresses behind the Reference handles in a single place (say an array that is class data in the Reference class) and use the index into that array as the Handle. When the time comes that a mark'n'sweep is required, the amount of memory that must be scanned to locate live/dead objects is confined to things pointed to by that array, and the scan is therefore very fast.

The penalty is that every data access through a reference has an extra level of indirection. That may sound like a high overhead, but in the scheme of things, where most references are to (fat) objects rather than individual intrinsic types, the overhead can become lost in the greater levels of indirection they already entail. Moore's law means that the costs going forward are ever reducing, and they only become significant when compared to a non-indirected implementation, which will always be faster (until the GC runs).
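The handle scheme described above can be sketched roughly as follows. This is a toy model, not an existing implementation: the `HandleTable` class, its method names, and the convention that each object lists the handles it holds in a `children` array are all assumptions made for the illustration, and Perl's own reference counting is of course still running underneath. A handle is just an index into one array, all counting lives in `retain` and `release`, and the sweep only ever scans that array.

```perl
use strict;
use warnings;

# Hypothetical handle table: slot index = handle.
package HandleTable;

sub new {
    my ($class) = @_;
    return bless { slots => [], counts => [], free => [] }, $class;
}

# Store an object and return its handle, with an initial count of 1.
sub alloc {
    my ($self, $obj) = @_;
    my $h = @{ $self->{free} } ? pop @{ $self->{free} } : scalar @{ $self->{slots} };
    $self->{slots}[$h]  = $obj;
    $self->{counts}[$h] = 1;
    return $h;
}

sub get    { my ($self, $h) = @_; return $self->{slots}[$h] }
sub retain { my ($self, $h) = @_; $self->{counts}[$h]++; return $h }

sub _free {
    my ($self, $h) = @_;
    $self->{slots}[$h]  = undef;
    $self->{counts}[$h] = 0;
    push @{ $self->{free} }, $h;
}

# Drop one reference; at zero, free the slot and release what it held.
sub release {
    my ($self, $h) = @_;
    my $obj = $self->{slots}[$h];
    return unless defined $obj;
    return if --$self->{counts}[$h] > 0;
    $self->_free($h);
    $self->release($_) for @{ $obj->{children} || [] };
}

# Mark from the given root handles, then free every unmarked slot.
sub sweep {
    my ($self, @roots) = @_;
    my %live;
    my @todo = @roots;
    while (defined(my $h = pop @todo)) {
        next if $live{$h}++;
        push @todo, @{ $self->get($h)->{children} || [] };
    }
    my $freed = 0;
    for my $h (0 .. $#{ $self->{slots} }) {
        my $obj = $self->{slots}[$h];
        next if $live{$h} || !defined $obj;
        # Garbage may still point at live objects; give those counts back.
        $self->{counts}[$_]-- for grep { $live{$_} } @{ $obj->{children} || [] };
        $self->_free($h);
        $freed++;
    }
    return $freed;
}

package main;

my $table = HandleTable->new;

# Two objects that hold handles to each other, plus one unrelated object.
my $x    = $table->alloc({ name => 'x',    children => [] });
my $y    = $table->alloc({ name => 'y',    children => [] });
my $keep = $table->alloc({ name => 'keep', children => [] });
push @{ $table->get($x)->{children} }, $table->retain($y);
push @{ $table->get($y)->{children} }, $table->retain($x);

# The program drops its own handles: counts fall from 2 to 1, never to 0.
$table->release($x);
$table->release($y);
my $live_after_release = grep { defined } @{ $table->{slots} };   # 3

# Only the sweep, rooted at what is still in use, reclaims the cycle.
my $freed            = $table->sweep($keep);                      # 2
my $live_after_sweep = grep { defined } @{ $table->{slots} };     # 1
```

Every access goes through `$table->get($h)`, which is the extra level of indirection mentioned above. In exchange, the mark phase never has to look anywhere except the slot array.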
There are other benefits of a "references are handles, not addresses" implementation, to do with reentrancy, which can really come into their own once threading, continuations and co-routines come into play. Overall, I think that the benefits of "one more level of indirection" could more than outweigh the costs, but so far I can find no existing implementation on which to ground this speculation.

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail
"Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
