PerlMonks  

Perl forked processes and variable sharing

by fireblood (Scribe)
on Feb 01, 2022 at 21:39 UTC

fireblood has asked for the wisdom of the Perl Monks concerning the following question:

Dear wise ones,

I am working on a program that will execute a forked child that will return information back to the parent process. I know that when a fork is executed, the parent and child processes initially have identical copies of the pre-fork environment, but that these copies are independent of each other, such that changes made by either process to its own environment are not reflected in the other.

I decided to see if I could come up with a method whereby a child could in fact modify the value of a variable in the parent. It involves encapsulating the child code into a subroutine to which a reference to a variable in the parent process is passed as an argument. The child modifies the value of the parent variable by accessing that variable via its address, not via its name.

My code is as follows:

use strict;
use warnings;

my $string = "First value";
my $ref_to_string = \$string;

my $pid = fork;
die "fork failed: $!" unless defined $pid;

unless ($pid)

    {
        do_subroutine ($ref_to_string);
        exit;
    }

print "\nI am the invoking process and my PID is $$\n";

print "invoker $$: The value of \$ref_to_string is $ref_to_string\n";
print "invoker $$: The value of \$\$ref_to_string is $$ref_to_string\n\n";

my $waited_upon_child = wait;

print "invoker $$: The PID of the child upon whom I waited was $waited_upon_child\n\n";

print "invoker $$: The value of \$ref_to_string is $ref_to_string\n";
print "invoker $$: After the call to the subroutine, the value of \$\$ref_to_string is finally $$ref_to_string\n\n";

print "invoker $$: I will now invoke the subroutine directly.\n\n";

do_subroutine ($ref_to_string);

print "invoker $$: The value of \$ref_to_string is $ref_to_string\n";
print "invoker $$: After the call to the subroutine, the value of \$\$ref_to_string is finally $$ref_to_string\n\n";

sub do_subroutine

    {
        print "I am the subroutine and my PID is $$\n";

        my $sub_ref_to_string = shift;

        print "Sub $$: The value of \$sub_ref_to_string is $sub_ref_to_string\n";
        print "Sub $$: The value of \$\$sub_ref_to_string is $$sub_ref_to_string\n\n";
        print "Sub $$: The preceding two messages confirm that I have access to the same\n";
        print "Sub $$: memory location as the calling process even though I may be running\n";
        print "Sub $$: in a different process from that of the calling process.\n\n";

        #   The following sleep instruction simulates time expended if this subroutine
        #   were doing real work

        sleep (4);

        $$sub_ref_to_string = "Second value";

        print "Sub $$: The value of \$sub_ref_to_string is still $sub_ref_to_string\n";
        print "Sub $$: The value of \$\$sub_ref_to_string is now $$sub_ref_to_string\n";

        print "Sub $$: I have changed the value of the scalar referenced by \$\$sub_ref_to_string (Heh-heh)\n\n";
    }



The output from this program is the following:

I am the invoking process and my PID is 95864
invoker 95864: The value of $ref_to_string is SCALAR(0xe96dc0)
invoker 95864: The value of $$ref_to_string is First value

I am the subroutine and my PID is 95876
Sub 95876: The value of $sub_ref_to_string is SCALAR(0xe96dc0)
Sub 95876: The value of $$sub_ref_to_string is First value

Sub 95876: The preceding two messages confirm that I have access to the same
Sub 95876: memory location as the calling process even though I may be running
Sub 95876: in a different process from that of the calling process.

Sub 95876: The value of $sub_ref_to_string is still SCALAR(0xe96dc0)
Sub 95876: The value of $$sub_ref_to_string is now Second value
Sub 95876: I have changed the value of the scalar referenced by $$sub_ref_to_string (Heh-heh)

invoker 95864: The PID of the child upon whom I waited was 95876

invoker 95864: The value of $ref_to_string is SCALAR(0xe96dc0)
invoker 95864: After the call to the subroutine, the value of $$ref_to_string is finally First value

invoker 95864: I will now invoke the subroutine directly.

I am the subroutine and my PID is 95864
Sub 95864: The value of $sub_ref_to_string is SCALAR(0xe96dc0)
Sub 95864: The value of $$sub_ref_to_string is First value

Sub 95864: The preceding two messages confirm that I have access to the same
Sub 95864: memory location as the calling process even though I may be running
Sub 95864: in a different process from that of the calling process.

Sub 95864: The value of $sub_ref_to_string is still SCALAR(0xe96dc0)
Sub 95864: The value of $$sub_ref_to_string is now Second value
Sub 95864: I have changed the value of the scalar referenced by $$sub_ref_to_string (Heh-heh)

invoker 95864: The value of $ref_to_string is SCALAR(0xe96dc0)
invoker 95864: After the call to the subroutine, the value of $$ref_to_string is finally Second value


The dilemma that I have is that it is clear that the child process knows the address of the variable in the parent space that it is to try to change, because upon invocation the child correctly reads and prints the value of the variable at that address. The child then modifies the value of the variable at that address and confirms within itself that the change has correctly taken place, again accessing the value at that address via the reference to that variable.

When the child has finished its work, and the parent has finished waiting on the child, the parent then examines the value of the variable the address of which it had passed to the child. It turns out that the value is unchanged, as if the child had never run. Yet repeatedly across the code for the parent and the child, confirmation is printed that the value of the reference to the variable in question never changes.

My program then goes on to have the parent invoke the child subroutine directly, not by delegating it to a child process, and then it is clear that the subroutine works as intended.

Why would the value at the referenced address be modifiable by the subroutine when running in the same process as the invoker, but not when the subroutine is running in a different process but still clearly has full access to the address of the variable to be changed, as is evident by the fact that at its outset it can read the value at that address placed there by the parent?

Thank you.
Re: Perl forked processes and variable sharing
by dave_the_m (Monsignor) on Feb 01, 2022 at 21:58 UTC
    I didn't look at your code and description in depth, but are you aware that in a virtual memory system (every major UNIXy OS in the last 25 years), a forked child and its parent have separate memories? I.e., if the parent writes to a byte at address 0x10000, that change will not appear at the child's address 0x10000, since those two (identical) virtual addresses map to different physical addresses.

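    The slight exception is Windows, where perl emulates fork by using two threads sharing the same virtual address space (though each pseudo-process gets its own cloned copy of the Perl interpreter's data, so ordinary Perl variables are still not shared).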

    Dave.

      Hi Dave,

      Thanks for your reply. I had considered that perhaps the same address in two processes might be in two entirely different address spaces and therefore not physically the same addresses, so that is why in my code I had included the test where the child process reads the value of the scalar at the address passed to it as an argument by the parent process. What the child process finds at that passed address is exactly the same value that the parent wrote to that address. So it seems that in my program the same two virtual addresses map to the same two physical addresses.

      Perhaps when the child reads from that address the physical addresses are the same, but after the child writes to that address what happens behind the scenes is that that physical address is not overwritten in place, lest the new value from the child be longer than the original value from the parent and neighboring variable values be clobbered, so instead the new value is written by the child to a new physical address which remains unknown to the parent.

      Thanks for your thoughts on this.

      Richard
        I still don't really follow what point you're trying to make, but you don't seem to be getting how forking and virtual memory work.

        Consider a process which uses two int/pointer sized memory locations. One contains a value, and the second contains a pointer to the first:

        virtual address   value
        0x10000           0x12345678
        0x10008           0x10000

        Those are virtual addresses - those are the addresses the process sees, and which are printed out in things like SCALAR(0x10000). The CPU's hardware behind the scenes maps that page of address space for that process to some physical page in RAM. This is all behind the scenes - you never get to see physical addresses. After the fork, the child process gets a *copy* of the parent's address space - same virtual address locations, but mapping to a different physical page of memory which contains a copy of the values:

                  physical    virtual
                  address     address    value
        parent    0x2220000   0x10000    0x12345678
                  0x2220008   0x10008    0x10000
        child     0x4440000   0x10000    0x12345678
                  0x4440008   0x10008    0x10000

        After the child modifies the value, you get this:

        parent    0x2220000   0x10000    0x12345678
                  0x2220008   0x10008    0x10000
        child     0x4440000   0x10000    0xdeadbeef
                  0x4440008   0x10008    0x10000

        There is absolutely no connection between the parent and child aside from the fact that initially the child's value in its memory is a copy of the parent's. If you think your code shows something else, please say which exact line of output shows this, and what you think its output should be.

        Dave.
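To make the point of the tables above concrete, here is a small sketch using only core Perl and assuming a Unix-like fork (not the Windows emulation). The child writes through its copy of the reference and reports back over a pipe. On such a system the two `refaddr` values should match, because the child's address space is a copy of the parent's, yet the parent's scalar keeps its old value.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Same setup as in the question: a scalar and a reference to it.
my $string = "First value";
my $ref    = \$string;

# The pipe is the only channel from child back to parent here.
pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: write through its copy of the reference, then report the
    # (virtual) address and the value as seen from inside the child.
    close $reader;
    ${$ref} = "Second value";
    print {$writer} refaddr($ref), " ", ${$ref}, "\n";
    close $writer;
    exit 0;
}

# Parent: read the child's one-line report, then reap the child.
close $writer;
my $report = <$reader>;
close $reader;
waitpid($pid, 0);

chomp $report;
my ($child_addr, $child_value) = split / /, $report, 2;

print "child:  address $child_addr, value '$child_value'\n";
print "parent: address ", refaddr($ref), ", value '${$ref}'\n";
```

Same number, different contents: the address is only meaningful inside each process's own address space.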

        Your code is hard for me to read; part of the problem is that you used <pre> instead of <code> tags.

        > What the child process finds at that passed address is exactly the same value that the parent wrote to that address

        When did you write to that address?

        If it's prior to forking, the value was copied.

        If the effect happens after forking, it could be a bug in COW (though I wouldn't expect one).

        > but after the child writes to that address what happens behind the scenes is that that physical address is not overwritten in place,

        I'd say that's pretty much a description of COW (copy-on-write).

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

Re: Perl forked processes and variable sharing
by LanX (Sage) on Feb 01, 2022 at 21:52 UTC
    Pardon me for not looking into your code ...

    ... but AFAIK there shouldn't be any way that a process can change data inside another process.

    And that is what fork is doing, creating different processes!

    (though in a very efficient way: memory pages are shared copy-on-write and only duplicated when one side writes to them, IIRC)

    The "identical reference" you are seeing is IMHO just an identical copy of the reference, NOT the same memory (NB: refs are not physical memory addresses!)

    Why don't you use proper perlipc or threads with shared data instead?

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
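As a minimal sketch of the "proper perlipc" route, assuming a Unix-like system and only core modules: the child serializes a hypothetical result hash with Storable's `freeze`, writes it to a pipe, and the parent reads until EOF and rebuilds it with `thaw`. The hash keys (`pid`, `message`, `squares`) are illustrative only.

```perl
use strict;
use warnings;
use Storable qw(freeze thaw);

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: build a (hypothetical) result, serialize it, send the bytes.
    close $reader;
    binmode $writer;
    my %result = (
        pid     => $$,
        message => "Second value",
        squares => [ map { $_ * $_ } 1 .. 5 ],
    );
    print {$writer} freeze(\%result);
    close $writer or die "close failed: $!";
    exit 0;
}

# Parent: close its copy of the write end first, otherwise EOF never
# arrives; then slurp everything the child wrote and rebuild it.
close $writer;
binmode $reader;
my $frozen = do { local $/; <$reader> };
close $reader;
waitpid($pid, 0);

my $result = thaw($frozen);
print "child $result->{pid} says: $result->{message}\n";
print "squares: @{ $result->{squares} }\n";
```

The parent reads before calling `waitpid`, so a payload larger than the pipe buffer cannot deadlock the two processes.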

      Hi Rolf,

      Thanks for the feedback, and especially for your last line suggesting using proper perlipc or threads with shared data. I was aware of threads with shared data, but had not read Tom Christiansen's amazing treatise on Perl IPC until reading your reply. It's definitely got me going in a new direction that I think will work properly for me.

      Cheers, Richard
Re: Perl forked processes and variable sharing
by marioroy (Parson) on Feb 07, 2022 at 02:38 UTC

    Hi fireblood,

    First, a kind correction to perlfan's comment about MCE. All the methods in MCE and MCE::Shared that pertain to communication are IPC-based. That has been the case since version 1.0. There are examples out there where workers write data to /dev/shm or /tmp and pass the path via IPC to the manager process, e.g. when the data is quite large.
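The file-plus-path pattern mentioned above can be sketched in core Perl alone; this illustrates the idea and is not how MCE implements it. The child writes a payload to a temporary file from File::Temp (placed in `File::Spec->tmpdir`, typically /tmp; on Linux one could pass `DIR => '/dev/shm'` if that tmpfs exists), sends only the short path through a pipe, and the parent reads the file and deletes it. The 10,000-line payload is arbitrary.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

pipe(my $reader, my $writer) or die "pipe failed: $!";

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    close $reader;
    # Child: write a (potentially large) payload to a temp file.
    # UNLINK => 0 because the parent is responsible for removing it.
    my ($fh, $tmp) = tempfile(UNLINK => 0);
    print {$fh} "line $_\n" for 1 .. 10_000;
    close $fh or die "close failed: $!";
    # Only the short path travels through the pipe.
    print {$writer} "$tmp\n";
    close $writer;
    exit 0;
}

close $writer;
chomp(my $path = <$reader>);
close $reader;
waitpid($pid, 0);

# Parent: read the payload from the file, then remove the file.
open(my $in, '<', $path) or die "open failed: $!";
my @lines = <$in>;
close $in;
unlink $path or warn "unlink failed: $!";

print "received ", scalar(@lines), " lines via $path\n";
```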

    I made a tiny change to your demonstration. MCE::Shared::Server handles requests transparently to and from the shared-manager process, where the data resides. This works as one would expect it to.

    $ diff ex1.pl ex2.pl
    2a3
    > use MCE::Shared;
    4c5
    < my $string = "First value";
    ---
    > tie my $string, "MCE::Shared", "First value";

    That's a tiny glimpse of what MCE::Shared is capable of.

    OOP Usage

    If you're having to fetch or store several thousands or millions of times, then OOP usage involves less overhead.

    use strict;
    use warnings;
    use MCE::Shared;

    my $string = MCE::Shared->scalar("First value");

    sub do_subroutine {
        print "I am the subroutine and my PID is $$\n";
        $string->set("Second value");
    }

    print "I am the invoking process and my PID is $$\n";
    print "The value of \$string is ", $string->get(), "\n";

    unless (fork) {
        do_subroutine ();
        exit;
    }

    my $waited_pid = wait;
    print "The PID of the child upon whom I waited was $waited_pid\n";
    print "The value of \$string is ", $string->get(), "\n";

    __END__
    I am the invoking process and my PID is 22725
    The value of $string is First value
    I am the subroutine and my PID is 22727
    The PID of the child upon whom I waited was 22727
    The value of $string is Second value

    String Class demo 1

    One is not forced to use the various data modules that ship with MCE::Shared. Next is a demonstration using a custom string class; a shared object is created afterwards.

    use strict;
    use warnings;
    use MCE::Shared;

    package My::String;

    sub new {
        my ($class, $string) = @_;
        return bless \$string, $class;
    }

    sub set {
        my ($self) = @_;
        $$self = $_[1];
        return 1;
    }

    sub get {
        my ($self) = @_;
        return $$self;
    }

    package main;

    my $string = MCE::Shared->share( { module => "My::String" }, "First value" );

    sub do_subroutine {
        print "I am the subroutine and my PID is $$\n";
        $string->set("Second value");
    }

    print "I am the invoking process and my PID is $$\n";
    print "The value of \$string is ", $string->get(), "\n";

    unless (fork) {
        do_subroutine ();
        exit;
    }

    my $waited_pid = wait;
    print "The PID of the child upon whom I waited was $waited_pid\n";
    print "The value of \$string is ", $string->get(), "\n";

    __END__
    I am the invoking process and my PID is 22978
    The value of $string is First value
    I am the subroutine and my PID is 22980
    The PID of the child upon whom I waited was 22980
    The value of $string is Second value

    String Class demo 2

    Here sharing is instantiated inside the string class.

    use strict;
    use warnings;
    use MCE::Shared;

    package My::String;

    sub new {
        my ($class) = @_;
        my $string = MCE::Shared->scalar($_[1]);
        return bless [ $string ], $class;
    }

    sub set {
        my ($self) = @_;
        $self->[0]->set($_[1]);
        return 1;
    }

    sub get {
        my ($self) = @_;
        return $self->[0]->get();
    }

    package main;

    my $string = My::String->new("First value");

    sub do_subroutine {
        print "I am the subroutine and my PID is $$\n";
        $string->set("Second value");
    }

    print "I am the invoking process and my PID is $$\n";
    print "The value of \$string is ", $string->get(), "\n";

    unless (fork) {
        do_subroutine ();
        exit;
    }

    my $waited_pid = wait;
    print "The PID of the child upon whom I waited was $waited_pid\n";
    print "The value of \$string is ", $string->get(), "\n";

    __END__
    I am the invoking process and my PID is 23089
    The value of $string is First value
    I am the subroutine and my PID is 23091
    The PID of the child upon whom I waited was 23091
    The value of $string is Second value

      I meant to add that MCE::Hobo and MCE::Shared provide capabilities similar to threads and threads::shared, respectively. One isn't required to use MCE workers. In fact, MCE::Shared also works with threads, Parallel::ForkManager, and workers spawned with fork.

      If using non-MCE workers, then call MCE::Shared->init inside the child. That will spread IPC communication across 10 data channels. The channel assignment is done automatically for threads and MCE workers (i.e. MCE, MCE::Child, and MCE::Hobo).

      unless (fork) {
          # This assigns the worker 1 of 10 data channels in a
          # round-robin fashion for the life of the worker.
          MCE::Shared->init();
          do_subroutine ();
          exit;
      }

      Update: Added demonstrations.

      Below are two demonstrations which one may run for comparison. Parallel data channels are helpful especially when a reply is not needed, e.g. incrementing a value or writing to a shared output object. Each of the 10 workers increments the value 100,000 times.

      use strict;
      use warnings;
      use MCE::Shared;
      use Time::HiRes 'time';

      my $number = MCE::Shared->scalar(0);

      sub do_subroutine {
          $number->incr for 1..1e5;
      }

      my @pids;
      my $start = time;

      for (1..10) {
          my $pid;
          unless ($pid = fork) {
              do_subroutine ();
              exit;
          }
          push @pids, $pid if defined $pid;
      }

      waitpid $_, 0 for @pids;

      printf "number %d\n", $number->get;
      printf "seconds %.03f\n", time - $start;

      __END__
      number 1000000
      seconds 5.234

      That's quite fast considering that incrementing the value involves IPC. Faster is possible: non-MCE workers can reach the levels of threads and MCE workers, which call MCE::Shared->init automatically, by calling MCE::Shared->init themselves.

      MCE::Shared->init

      Calling MCE::Shared->init is beneficial for non-MCE workers.
      use strict;
      use warnings;
      use MCE::Shared;
      use Time::HiRes 'time';

      my $number = MCE::Shared->scalar(0);

      sub do_subroutine {
          $number->incr for 1..1e5;
      }

      my @pids;
      my $start = time;

      for (1..10) {
          my $pid;
          unless ($pid = fork) {
              MCE::Shared->init;  # enables multi-channel
              do_subroutine ();
              exit;
          }
          push @pids, $pid if defined $pid;
      }

      waitpid $_, 0 for @pids;

      printf "number %d\n", $number->get;
      printf "seconds %.03f\n", time - $start;

      __END__
      number 1000000
      seconds 2.086

      That is shy of 500k per second on Clear Linux.

      threads demonstration

      For completeness, here are threads and MCE::Hobo demonstrations. They benefit from multi-channel IPC automatically, so it is not necessary to call MCE::Shared->init.

      use strict;
      use warnings;
      use threads;
      use MCE::Shared;
      use Time::HiRes 'time';

      my $number = MCE::Shared->scalar(0);
      my $start = time;

      sub do_subroutine {
          $number->incr for 1..1e5;
      }

      threads->create('do_subroutine') for 1..10;
      $_->join for threads->list;

      printf "number %d\n", $number->get;
      printf "seconds %.03f\n", time - $start;

      __END__
      number 1000000
      seconds 2.142

      MCE::Hobo demonstration

      use strict;
      use warnings;
      use MCE::Hobo;
      use MCE::Shared;
      use Time::HiRes 'time';

      my $number = MCE::Shared->scalar(0);
      my $start = time;

      sub do_subroutine {
          $number->incr for 1..1e5;
      }

      MCE::Hobo->create('do_subroutine') for 1..10;
      $_->join for MCE::Hobo->list;  # same as MCE::Hobo->wait_all

      printf "number %d\n", $number->get;
      printf "seconds %.03f\n", time - $start;

      __END__
      number 1000000
      seconds 2.087
Re: Perl forked processes and variable sharing
by BillKSmith (Monsignor) on Feb 02, 2022 at 15:47 UTC
    Your 'problem' has nothing to do with perl. Your OS (and every other one that I know of) is designed to make it impossible for a child to make changes to its parent's environment. If your 'workaround' is truly able to do this, it is exploiting a bug. Do not use it. This design choice is intended to prevent unexpected behavior of a program due to the history of other programs.
    Bill
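A small sketch of the point above, assuming core Perl and a Unix-like fork: the child changes its own `%ENV` and working directory, which the parent never sees, and the only value handed back to the parent automatically is the child's 8-bit exit status, read from `$?` after `waitpid`. `DEMO_SETTING` is a hypothetical variable name used only for this illustration.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical environment variable, used only for this demonstration.
$ENV{DEMO_SETTING} = "parent";
my $cwd_before = getcwd();

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: both changes apply only to the child's own copy of its
    # environment and process state.
    $ENV{DEMO_SETTING} = "child";
    chdir "/" or die "chdir failed: $!";
    # The exit status (0..255) is what the parent gets back via $?.
    exit 42;
}

waitpid($pid, 0);
my $exit_code = $? >> 8;    # the high byte of $? holds the exit status

print "parent sees DEMO_SETTING=$ENV{DEMO_SETTING}\n";
print "parent cwd unchanged: ", (getcwd() eq $cwd_before ? "yes" : "no"), "\n";
print "child exited with status $exit_code\n";
```

Anything richer than a small integer has to travel through an explicit channel such as a pipe, a file, or shared memory.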
Re: Perl forked processes and variable sharing
by perlfan (Vicar) on Feb 04, 2022 at 06:40 UTC
    You're looking for a shared-memory programming model; Perl doesn't support that. It's ruthlessly single-process, save for some nice child-process management and an interface to fork. If you want true SMP, you're going to have to deal with esoteric IPC, made a bit easier by modules like IPC::Shareable and related ones. You've also got file-based communication, like what's described in Parallel::ForkManager. And if you want to get fancy, take a look at MCE, but even that just gives you a fancy file-based communication fabric. There are also interfaces to actual shared memory and shared C libraries. Another interesting idea is to use Redis, which is fundamentally data-structure middleware that provides atomicity guarantees among competing processes; but it runs as a daemon, so it would have to be treated as a client/server architecture in your program, which means you have to manage this "helper" process in the environment where you're running your code. You may also want to take a look at mkfifo.
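For the mkfifo suggestion, a minimal sketch using the core POSIX `mkfifo` function, assuming a Unix-like system. The FIFO lives in a private temporary directory from File::Temp; the child opens it for writing and the parent for reading, and each `open` blocks until the other side arrives. The file name `demo.fifo` is arbitrary.

```perl
use strict;
use warnings;
use POSIX qw(mkfifo);
use File::Temp qw(tempdir);
use File::Spec;

# Create the named pipe inside a private temporary directory.
my $dir  = tempdir(CLEANUP => 1);
my $fifo = File::Spec->catfile($dir, "demo.fifo");
mkfifo($fifo, 0600) or die "mkfifo failed: $!";

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: opening a FIFO for writing blocks until a reader opens it.
    open(my $out, '>', $fifo) or die "open for write failed: $!";
    print {$out} "Second value\n";
    close $out;
    # _exit skips END blocks in the child, so File::Temp's cleanup
    # runs only in the parent.
    POSIX::_exit(0);
}

# Parent: opening for reading blocks until the child opens for writing.
open(my $in, '<', $fifo) or die "open for read failed: $!";
chomp(my $line = <$in>);
close $in;
waitpid($pid, 0);

print "parent received: $line\n";
```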
Re: Perl forked processes and variable sharing
by perlfan (Vicar) on Feb 04, 2022 at 17:47 UTC
    Also, XAN has a couple of interesting modules:
    • Filesys::POSIX - this comes with a complete "in memory" file system capability
    • IPC::Pipeline - Create a shell-like pipeline of many running commands
    There's also a workshop video on Filesys::POSIX.

Node Type: perlquestion [id://11141024]
Approved by LanX