Re: No garbage collection for my-variables

by Joost (Canon)
on Sep 15, 2008 at 20:13 UTC [id://711533]


in reply to No garbage collection for my-variables

A program has only a limited number of lexical variables, but may process an unlimited amount of data.

In any case, for large strings (the only case we need to consider), it's much more efficient to pass around references. Code that expects to deal with very long strings generally does that, or encapsulates the strings in an object, or deals with file handles directly.
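To illustrate the point about references, here is a minimal sketch. The helper name `count_lines` and the sample data are made up for this example. The sub receives a reference, so the large string itself is not copied into a new lexical inside the sub.

```perl
use strict;
use warnings;

# Hypothetical helper: counts newlines in a large string.
# It takes a reference, so only the reference is copied into the sub.
sub count_lines {
    my ($text_ref) = @_;
    return ($$text_ref =~ tr/\n//);   # tr counts matches without modifying
}

my $big   = "line\n" x 100_000;       # stand-in for a much larger string
my $lines = count_lines(\$big);       # pass a reference, not the string
print "$lines\n";
```

Writing `my ($text) = @_;` instead would copy the whole string into a second lexical, which is the kind of copy the post warns about.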

Copying 500MB strings around would be stupid, and not just for memory reasons, even if all the memory were reclaimed when the variables holding them go out of scope. You really do want to pay attention to what you're doing when dealing with large chunks of memory.

Perl is optimizing here for the cases where you want fast, repeated processing of strings no larger than, say, 10% of your memory. If you need to process larger strings, you'll have to pay attention anyway, and automatically clearing all scalars won't really help much (and it would dramatically slow down the general case).

I don't see the current behaviour changing until someone completes a perl with a garbage collector instead of the current refcounting scheme. That would be perl 6, so it may take a while.

update: I just wanted to mention that although all of this is interesting in a way, it's very unlikely that this behaviour has given you any actual problems. Just don't slurp in giant files, or Encode a whole dictionary in one call. What's wrong with reading and writing stuff line by line? That way, you can run thousands of those programs at once without any problem (or a couple at once, so as to actually use your CPU for something useful, instead of waiting for the drive to catch up).
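A minimal sketch of the line-by-line approach, assuming a per-line transformation is enough for the job. `transform_line` and `process_stream` are hypothetical names, and the temporary files only exist to make the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical per-line transformation; stands in for an encoding step.
sub transform_line {
    my ($line) = @_;
    return uc $line;
}

# Copies $in to $out one line at a time, so only one line is held in
# memory at once. Returns the number of lines processed.
sub process_stream {
    my ($in, $out) = @_;
    my $n = 0;
    while (my $line = <$in>) {
        print {$out} transform_line($line);
        $n++;
    }
    return $n;
}

# Demo input written to a temporary file.
my ($src_fh, $src_name) = tempfile(UNLINK => 1);
print {$src_fh} "alpha\nbeta\ngamma\n";
close $src_fh or die "close: $!";

open my $in, '<', $src_name or die "open $src_name: $!";
my ($out, $out_name) = tempfile(UNLINK => 1);
my $lines = process_stream($in, $out);
close $in;
close $out or die "close: $!";
print "$lines lines processed\n";
```

Memory use then depends on the longest line rather than on the size of the file.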

Replies are listed 'Best First'.
Re^2: No garbage collection for my-variables
by kyle (Abbot) on Sep 15, 2008 at 20:47 UTC

    I don't see the current behaviour changing until someone completes a perl with a garbage collector instead of the current refcounting scheme.

    The OP is saying that you can allocate a large string, let the variable go out of scope, and the memory is not freed and not reused. The memory allocated to the variable "sticks" to it even if you never use it again. (If I have this wrong, betterworld, please correct me.)

    I don't see what garbage collection has to do with this. The strings in question don't have any references to them, so the reference counter shouldn't have any problem knowing that they're not in use.

    I don't know what method perl uses to grow strings. The general method I recall from my CS classes was to double the size of a string when it grows out of its buffer and halve it when it shrinks to less than a quarter of the buffer size. Maybe someone more familiar with the internals can shed some light on why that wouldn't be a good design choice for Perl.
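The textbook policy described above can be sketched as a pure function over buffer sizes. This only illustrates the double/halve rule from the post; it is not a description of what perl actually does, and `next_capacity` is a hypothetical name.

```perl
use strict;
use warnings;

# Returns the new capacity for a buffer that must hold $len bytes,
# given its current capacity $cap:
#   - double the capacity while the contents do not fit,
#   - halve it while the contents fill less than a quarter of it.
sub next_capacity {
    my ($cap, $len) = @_;
    $cap = 1 if $cap < 1;
    $cap *= 2 while $len > $cap;
    $cap = int($cap / 2) while $cap > 1 && $len < $cap / 4;
    return $cap;
}

my $cap = 16;
$cap = next_capacity($cap, 17);   # contents no longer fit: grows to 32
$cap = next_capacity($cap, 7);    # under a quarter full: shrinks to 16
print "$cap\n";
```

Leaving a gap between the grow threshold (full) and the shrink threshold (a quarter full) keeps a string whose length hovers around one size from reallocating on every change.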

      I don't see what garbage collection has to do with this. The strings in question don't have any references to them, so the reference counter shouldn't have any problem knowing that they're not in use.
      Reference counting has everything to do with it, since it means that the only time perl can free the memory is when the last reference to the scalar goes out of scope. All without knowing if that scalar is ever going to be reused.

      That means it either has to keep it there always, or free it always (or do some kind of heuristic, which should usually mean keep it, since allocating memory is expensive, and if you're using a large string now, chances are, you'll be using a large string again some time soon).

      What perl currently cannot do is free "old, unused" scalars when it's running out of memory. It has to decide when the scalar is going out of scope. Allocating and freeing each scalar every time that happens would probably slow down the interpreter a lot.
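Since perl will not make this decision on its own, a program that knows it is done with a large string can make it explicitly. The sketch below is not from the thread: it uses `undef $var`, which releases the scalar's string buffer, and the core `B` module to peek at the allocated size. `buffer_bytes` is a hypothetical helper, and the memory goes back to perl's allocator, not necessarily to the operating system.

```perl
use strict;
use warnings;
use B ();   # core module, used here only to inspect the buffer size

# Returns the number of bytes allocated for a scalar's string buffer.
# Assumes the scalar has held a string at some point.
sub buffer_bytes {
    my ($ref) = @_;
    return B::svref_2object($ref)->LEN;
}

my $n   = 100_000;
my $big = 'x' x $n;
$big .= 'y';                          # make sure $big owns its buffer

my $before = buffer_bytes(\$big);     # more than $n bytes
undef $big;                           # releases the buffer, not just the value
my $after  = buffer_bytes(\$big);

print "before: $before, after: $after\n";
```

Note that `undef $big` is the form used here on purpose; this sketch makes no claim about what `$big = undef` or `$big = ''` do to the buffer.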

        That means it either has to keep it there always, or free it always (or do some kind of heuristic, which should usually mean keep it, since allocating memory is expensive, and if you're using a large string now, chances are, you'll be using a large string again some time soon).

        Especially for a large string, I wonder how significant the gain from avoiding the deallocation really is, compared with the cost of filling the buffer with the string and working with it.

        Of course you're right: I shouldn't copy 500MB strings around too much, however I chose such a drastic length to make the effect clear. Even if the strings were smaller, I'd say I could use 1MB of memory for better things than storing 32 long-forgotten scalars of 32kB each (or even smaller ones). I am not really good at making up realistic scenarios, but I'm interested to know: Would you have anticipated perl's behaviour if you had just seen my code samples above?

        I'm glad that kyle seems to agree that it would be nice if perl dealt better with these unused scalars. Besides, it obviously doesn't reuse the buffer if a subroutine calls itself recursively... but I haven't tested the memory consumption for this case yet.

        Oh, I think I see what you're saying now. When the last reference goes out of scope is the only time it gets to make a decision about whether to deallocate the memory used for the variable. We don't necessarily know then whether the variable will be used again or not, or what for, so it's not a very good time to make that decision.

        I'm not sure I'm convinced that deallocating would be a bad thing. Obviously, it depends on what the program is doing. I'd be tempted to take some kind of heuristic approach, but even then I'd want to do some testing to find where the cost/benefits are.

      The strings in question don't have any references to them

      Not true. The pad that refers to them when the function is being executed still refers to them when the function isn't being executed.

      It could be changed to be true, so this nitpick is not relevant to the conversation.
