PerlMonks  

Re: WHY copying does happen (fork)

by BrowserUk (Patriarch)
on May 05, 2008 at 19:43 UTC ( [id://684737]=note )


in reply to WHY copying does happen (fork)

One common cause of this in Perl is reading your shared numeric data in from a file.

When you assign that data to the scalars in the array, the data is still stored as strings: '123'.

But then you come to use those values as a part of a numeric expression, and perl converts the string stored in the PV slot of the scalar and assigns the binary numeric value to the IV or NV slot.

Bang! What looks like a non-mutating reference to a single shared value has just modified the scalar and caused a 4k page to be copied. Iterate your entire array summing the numbers and you'll cause the whole array, plus everything else on each page that contains any of your array's scalars, to be copied also. A prime example of halo slippage on the "threads are spelt f-o-r-k" holy COW.
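If you want to see the slot change for yourself, here is a minimal sketch using the core B module. The helper name slot_flags is made up for illustration, and the exact flags can differ a little between perl versions, so treat it as a probe rather than a specification.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: list which value slots of a scalar currently hold data.
# $_[0] is an alias to the caller's variable, so we inspect the real SV.
sub slot_flags {
    my $flags = B::svref_2object( \$_[0] )->FLAGS();
    my @set;
    push @set, 'PV' if $flags & B::SVp_POK();   # string slot
    push @set, 'IV' if $flags & B::SVp_IOK();   # integer slot
    push @set, 'NV' if $flags & B::SVp_NOK();   # floating point slot
    return "@set";
}

my $value  = '123';                # as if it had just been read from a file
my $before = slot_flags($value);

my $sum = $value + 1;              # only "reads" $value as far as your code can tell

my $after = slot_flags($value);

print "before numeric use: $before\n";
print "after numeric use:  $after\n";
```

The second line of output should list a numeric slot that the first one lacks. That write into the SV is what dirties the shared page after a fork.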

A practical tip: If your shared arrays are numeric and read from files or a DB, add zero to them as you assign them:

my @sharedArray = map 0+$_, split ' ', $lineOfData;

Not only will you not cause COW when doing math with them after forking, the array will be smaller to boot. Adding zero forces the conversion of the string you read into a binary numeric before it is assigned to the SV, which means no PV will be allocated and you save space. And as they are already numeric, using them in a numeric context won't have to convert them, so there are no mutations of the SV and no COW.
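A rough way to check the "no PV, no mutation" claim, again with the core B module. The sample line and the names has_string_slot, @asStrings and @asNumbers are invented for the example; only the map 0+$_ idiom comes from the tip above.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: true if the scalar has a string (PV) value stored.
sub has_string_slot {
    my $flags = B::svref_2object( \$_[0] )->FLAGS();
    return ( $flags & B::SVp_POK() ) ? 1 : 0;
}

my $lineOfData = '10 20 30 40';                      # stand-in for a line from a file

my @asStrings = split ' ', $lineOfData;              # stored as strings
my @asNumbers = map 0+$_, split ' ', $lineOfData;    # converted before storing

my $stringsWithPV = grep { has_string_slot($_) } @asStrings;
my $numbersWithPV = grep { has_string_slot($_) } @asNumbers;

# Snapshot the flags, do some math, then snapshot again.
my $flagsBefore = join ',', map { B::svref_2object( \$_ )->FLAGS() } @asNumbers;
my $total = 0;
$total += $_ for @asNumbers;
my $flagsAfter  = join ',', map { B::svref_2object( \$_ )->FLAGS() } @asNumbers;

print "plain split:  $stringsWithPV of ", scalar(@asStrings), " elements carry a PV\n";
print "with 0+\$_:    $numbersWithPV of ", scalar(@asNumbers), " elements carry a PV\n";
print "flags unchanged by summing: ", ( $flagsBefore eq $flagsAfter ? 'yes' : 'no' ), "\n";
```

If the flags are the same before and after the sum, the math never wrote to the scalars, so a forked child doing the same sum has no reason to copy their pages.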

Of course, that only holds true until you use them in a string context. If you need to print them out to another file, or the terminal, use printf instead of print and interpolation.

my @a = 1 .. 1e6;                         ## takes 62 MB.
printf "the number is: %d\n", $_ for @a;  ## causes no memory growth
print "the number is: $_\n" for @a;       ## causes the memory to grow to 110MB.
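The growth comes from interpolation caching a string inside each numeric scalar, while the %d format just reads the integer. Here is a small sketch of that difference, using sprintf so nothing is written to the terminal. The helper count_string_slots and the array names are hypothetical, and the arrays are tiny on purpose.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: count the elements of an array that hold a string (PV) value.
sub count_string_slots {
    my ($aref) = @_;
    my $n = 0;
    for my $elem (@$aref) {          # $elem aliases the real element
        my $flags = B::svref_2object( \$elem )->FLAGS();
        $n++ if $flags & B::SVp_POK();
    }
    return $n;
}

my @viaFormat = 1 .. 5;
my @viaInterp = 1 .. 5;
my @lines;

# The %d format reads the integer value directly.
push @lines, sprintf( "the number is: %d\n", $_ ) for @viaFormat;

# Interpolation asks each scalar for its string form, which perl stores in the scalar.
push @lines, "the number is: $_\n" for @viaInterp;

print "PV slots after sprintf %d:     ", count_string_slots( \@viaFormat ), "\n";
print "PV slots after interpolation: ", count_string_slots( \@viaInterp ), "\n";
```

Every element that picks up a PV here is a scalar that got written to, which in a forked child means its page gets copied.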

Use interpolation on shared data and the growth would be far higher, because unless you are very careful in how you populate the original array, the scalars it consists of will occupy space in 4k pages shared with other data, and those pages will be copied in full.

Do it in 2 or more forked children and they'll all get their own copies. 100MB of shared numeric data and 5 forked children and you can see the total memory requirement blossom to well over 1GB just cos you interpolated the numbers.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re^2: WHY copying does happen (fork)
by bibliophile (Prior) on May 06, 2008 at 13:58 UTC
    Thanks, BrowserUk. An extremely clear explanation of something I had no idea I needed to know... until the explanation.

    :-)

    -Bib
