Re^2: Looking for alternative for IPC::Shareable (or increase size)

by DomX (Novice)
on Aug 06, 2020 at 11:41 UTC


in reply to Re: Looking for alternative for IPC::Shareable (or increase size)
in thread Looking for alternative for IPC::Shareable (or increase size)

So, I tested Storable and my result is unsatisfactory! :'-(
It also stops at 2^16 characters. And worse: it doesn't even tell you about the lost data the way IPC::Shareable does...
If I have to split the data anyway, I don't need another module. Thank you very much all the same! I'm going to use a special solution for this special case: a database.

Replies are listed 'Best First'.
Re^3: Looking for alternative for IPC::Shareable (or increase size)
by jcb (Parson) on Aug 07, 2020 at 02:07 UTC

    That is very strange and I have not had that problem.

    This sounds like you are not properly handling the stored objects. I use a single status pipe, where a child reports an identifier and an indication that a binary object is available, and a "return" pipe for each child, where the Storable data is actually written. The status pipe is non-blocking and monitored with Tk's fileevent mechanism, while the "return" pipes are used in blocking mode. I switched to this approach after trying to pass the Storable data back on the status pipe and finding that non-blocking mode caused problems. The status pipe is also shared, so the messages returned on it must be kept short enough to fit in the OS buffers, to avoid messages from different children being interleaved.
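
    For concreteness, a minimal sketch of that arrangement might look like the following; the message format, payload, and child count are made up for illustration, and error handling is elided:

        use strict;
        use warnings;
        use IO::Handle;
        use Storable qw(store_fd fd_retrieve);
        use Tk;

        pipe(my $status_r, my $status_w) or die "pipe: $!";

        my %return_r;           # child id => read end of that child's return pipe
        for my $id (1, 2) {
            pipe(my $r, my $w) or die "pipe: $!";
            my $pid = fork // die "fork: $!";
            if ($pid == 0) {    # child
                close $r; close $status_r;
                $status_w->autoflush(1);
                # Short, atomic message on the shared status pipe (it fits in
                # PIPE_BUF, so messages from different children cannot interleave)
                print {$status_w} "$id ready\n";
                # ... then the full object on this child's private return pipe.
                store_fd({ id => $id, payload => 'x' x 100_000 }, $w);
                close $w;
                exit 0;
            }
            close $w;
            $return_r{$id} = $r;
        }
        close $status_w;

        my $mw   = MainWindow->new;
        my $seen = 0;
        $status_r->blocking(0);   # the shared status pipe is non-blocking
        # fileevent watches the status pipe; each return pipe is then read
        # in blocking mode, as described above.
        $mw->fileevent($status_r, 'readable', sub {
            while (defined(my $line = <$status_r>)) {
                next unless $line =~ /^(\d+) ready/;
                my $result = fd_retrieve($return_r{$1});   # blocking read
                print "child $1 sent ", length($result->{payload}), " bytes\n";
                if (++$seen == 2) { $mw->destroy; return }
            }
        });
        MainLoop;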

      Hey jcb!

      A bit more detail about my structure, and where the problem is:

      - Perl/Tk with Tk::HList showing an SQLite database view
      - IPC::Shareable object "transport" for communication with the "worker" child
      - forked "worker" (created before any Tk objects) communicating with Tk via "transport"
      - "worker" creates its own IPC::Shareable object "ac_trans" for its agents
      -> On Tk's request (Button), the "worker" forks its agents, which start doing the actual work.
      - Agents download small bits of data (usually far below 2^16 characters) and transfer them back to the "worker" via "ac_trans"
      - "worker" empties the array within "ac_trans" and saves the data to a regular array (sketched below)
      - "worker" does this until all agents have exited
      - "worker" fills the database with all received data at once (faster than every agent accessing the database for one small bit of data)
      - "worker" informs Tk through "transport" that the work is finished


      However, for one source the amount of data to be downloaded can't be predicted, and there it exceeds the 2^16-character limit that IPC::Shareable can't get past, and neither can Storable (at least the way I tested it).
      Since I already have a database in use, I'll just create a table and use it for this one case. I think the extra wait is worth it on this part. (SQLite supports cells with more than 2 billion characters.) Furthermore, the changes are an acceptable amount of effort here.
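
      A minimal sketch of that workaround, assuming DBI with DBD::SQLite (the database file and table name are made up):

          use strict;
          use warnings;
          use DBI;

          my $dbh = DBI->connect('dbi:SQLite:dbname=transport.db', '', '',
                                 { RaiseError => 1, AutoCommit => 1 });

          $dbh->do('CREATE TABLE IF NOT EXISTS oversized '
                 . '(id INTEGER PRIMARY KEY, payload BLOB)');

          my $payload = 'x' x (2**20);   # 1 MiB, well past the 2^16 limit

          # Agent side: park the oversized result in the table ...
          $dbh->do('INSERT INTO oversized (payload) VALUES (?)', undef, $payload);
          my $id = $dbh->last_insert_id(undef, undef, 'oversized', 'id');

          # ... worker side: fetch it back by id (in the real program this
          # happens in another process, with its own $dbh).
          my ($back) = $dbh->selectrow_array(
              'SELECT payload FROM oversized WHERE id = ?', undef, $id);
          print 'round-tripped ', length($back), " characters\n";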

        Greetings, DomX

        I tried Storable and Sereal, followed by a parallel demonstration. Testing was done on macOS. To observe the memory consumption (e.g. top -o CPU on macOS), uncomment the busy-loop line. Sereal is not only faster but also consumes less memory.

        Storable (updated):

        use strict;
        use warnings;
        use feature qw(say);
        use Storable qw(freeze thaw);

        my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';

        # $data .= $data for 1..10;   # 2^16      65536
        # $data .= $data for 1..11;   # 2^17     131072
        # $data .= $data for 1..12;   # 2^18     262144
        # $data .= $data for 1..13;   # 2^19     524288
        # $data .= $data for 1..14;   # 2^20    1048576
        # $data .= $data for 1..15;   # 2^21    2097152
        # $data .= $data for 1..16;   # 2^22    4194304
        # $data .= $data for 1..17;   # 2^23    8388608
        # $data .= $data for 1..18;   # 2^24   16777216
        # $data .= $data for 1..19;   # 2^25   33554432
        # $data .= $data for 1..20;   # 2^26   67108864
        # $data .= $data for 1..21;   # 2^27  134217728
        # $data .= $data for 1..22;   # 2^28  268435456
        # $data .= $data for 1..23;   # 2^29  536870912
        $data .= $data for 1..24;     # 2^30 1073741824

        say 'data   : '.length($data);

        my $frozen = freeze(\$data);
        my $thawed = thaw($frozen);

        say 'frozen : '.length($frozen);
        say 'thawed : '.length($$thawed);

        # simulate busy loop: 4102 megabytes in top; 2.106 seconds
        # 1 for 1..400_000_000;

        __END__

        data   : 1073741824
        frozen : 1073741844
        thawed : 1073741824

        Sereal (updated):

        use strict;
        use warnings;
        use feature qw(say);
        use Sereal::Encoder qw(encode_sereal);
        use Sereal::Decoder qw(decode_sereal);

        my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';

        # $data .= $data for 1..10;   # 2^16      65536
        # $data .= $data for 1..11;   # 2^17     131072
        # $data .= $data for 1..12;   # 2^18     262144
        # $data .= $data for 1..13;   # 2^19     524288
        # $data .= $data for 1..14;   # 2^20    1048576
        # $data .= $data for 1..15;   # 2^21    2097152
        # $data .= $data for 1..16;   # 2^22    4194304
        # $data .= $data for 1..17;   # 2^23    8388608
        # $data .= $data for 1..18;   # 2^24   16777216
        # $data .= $data for 1..19;   # 2^25   33554432
        # $data .= $data for 1..20;   # 2^26   67108864
        # $data .= $data for 1..21;   # 2^27  134217728
        # $data .= $data for 1..22;   # 2^28  268435456
        # $data .= $data for 1..23;   # 2^29  536870912
        $data .= $data for 1..24;     # 2^30 1073741824

        say 'data   : '.length($data);

        my $frozen = encode_sereal(\$data);
        my $thawed = decode_sereal($frozen);

        say 'frozen : '.length($frozen);
        say 'thawed : '.length($$thawed);

        # simulate busy loop: 3078 megabytes in top; 1.549 seconds
        # 1 for 1..400_000_000;

        __END__

        data   : 1073741824
        frozen : 1073741837
        thawed : 1073741824

        Sereal with compression enabled:

        use strict;
        use warnings;
        use feature qw(say);
        use Sereal::Encoder qw(encode_sereal);
        use Sereal::Decoder qw(decode_sereal);

        my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';

        # $data .= $data for 1..10;   # 2^16      65536
        # $data .= $data for 1..11;   # 2^17     131072
        # $data .= $data for 1..12;   # 2^18     262144
        # $data .= $data for 1..13;   # 2^19     524288
        # $data .= $data for 1..14;   # 2^20    1048576
        # $data .= $data for 1..15;   # 2^21    2097152
        # $data .= $data for 1..16;   # 2^22    4194304
        # $data .= $data for 1..17;   # 2^23    8388608
        # $data .= $data for 1..18;   # 2^24   16777216
        # $data .= $data for 1..19;   # 2^25   33554432
        # $data .= $data for 1..20;   # 2^26   67108864
        # $data .= $data for 1..21;   # 2^27  134217728
        # $data .= $data for 1..22;   # 2^28  268435456
        # $data .= $data for 1..23;   # 2^29  536870912
        $data .= $data for 1..24;     # 2^30 1073741824

        say 'data   : '.length($data);

        my $frozen = encode_sereal(\$data, { compress => 1 });
        my $thawed = decode_sereal($frozen);

        say 'frozen : '.length($frozen);
        say 'thawed : '.length($$thawed);

        # simulate busy loop: 2104 megabytes in top; 2.170 seconds
        # 1 for 1..400_000_000;

        __END__

        data   : 1073741824
        frozen : 52428830
        thawed : 1073741824

        MCE::Channel Demonstration:

        MCE::Channel provides two-way communication and uses Sereal if available, otherwise defaults to Storable. For this demonstration, agents send data to the parent process via send2. Likewise, the parent receives data via recv2.

        use strict;
        use warnings;

        use MCE::Child;
        use MCE::Channel;

        my $chnl = MCE::Channel->new();

        sub agent_task {
            my ($id, @args) = @_;
            my $data = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_';
            $data .= $data for 1..24;   # 2^30 1073741824

            # agent >> parent (via send2)
            $chnl->send2({ id => $id, data => $data });
        }

        my %procs;

        MCE::Child->init( void_context => 1, posix_exit => 1 );
        $procs{$_} = MCE::Child->create('agent_task', $_, 'arg1', 'argN') for 1..2;

        while (keys %procs) {
            # parent << agent (via recv2)
            my $ret = $chnl->recv2;
            ( delete $procs{ $ret->{id} } )->join;
            printf "Agent %d: %d\n", $ret->{id}, length $ret->{data};
        }

        __END__

        Agent 1: 1073741824
        Agent 2: 1073741824

        I have not encountered any limitations with regard to serialization (> 1 billion chars).

        Regards, Mario

        This is a multiprocess application with an SQLite database. Have you considered moving up to PostgreSQL? The agent processes could insert their records into the import table and issue NOTIFY to report completion to the main worker process, which has issued an appropriate LISTEN and uses the pg_notifies method to receive the notifications. (See DBD::Pg for details.)

        As an extra benefit, PostgreSQL LISTEN/NOTIFY is integrated with the transaction system: a notification sent during a transaction is held in the database server until the transaction commits.

        The catch is that you will need to be careful with your DBI handles: each process needs its own database connection and the "agent" processes must not use the "worker"'s DBI handle by mistake.
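
        A minimal sketch of that flow, assuming DBD::Pg's pg_notifies method and pg_socket attribute (the database, table, and channel names are made up):

            use strict;
            use warnings;
            use DBI;
            use IO::Select;

            # Worker: its own connection, LISTENing for agent notifications.
            # AutoInactiveDestroy keeps a forked child from closing it on exit.
            my $dbh = DBI->connect('dbi:Pg:dbname=mydb', undef, undef,
                { AutoCommit => 1, RaiseError => 1, AutoInactiveDestroy => 1 });
            $dbh->do('LISTEN import_done');

            my $pid = fork // die "fork: $!";
            if ($pid == 0) {   # agent: opens its OWN connection, never the worker's
                my $agent = DBI->connect('dbi:Pg:dbname=mydb', undef, undef,
                    { AutoCommit => 0, RaiseError => 1 });
                $agent->do('INSERT INTO import (payload) VALUES (?)',
                           undef, 'x' x 100_000);
                $agent->do(q{NOTIFY import_done, 'agent 1'});
                $agent->commit;   # the NOTIFY is delivered only on commit
                $agent->disconnect;
                exit 0;
            }

            # Wait for activity on the connection's socket, then drain the
            # notifications with pg_notifies.
            my $sel = IO::Select->new($dbh->{pg_socket});
            while ($sel->can_read(10)) {
                while (my $note = $dbh->pg_notifies) {
                    my ($name, $be_pid, $payload) = @$note;
                    print "got $name from backend $be_pid: $payload\n";
                    exit 0;   # one agent in this sketch, so we are done
                }
            }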

Re^3: Looking for alternative for IPC::Shareable (or increase size)
by bliako (Monsignor) on Aug 06, 2020 at 12:10 UTC

    Perhaps have a look at Sys::Mmap and File::Map before using a disk-based DB just because it has a locking mechanism.
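
    A minimal sketch of the File::Map route, which shares a large buffer through a memory-mapped file (the file name and sizes are made up); note that a map cannot grow past the length of its backing file, so size it generously up front:

        use strict;
        use warnings;
        use File::Map qw(map_file sync);

        my $file = '/tmp/shared_buffer';

        # Pre-size the backing file; the map is fixed at this length.
        open my $fh, '>', $file or die "open: $!";
        truncate $fh, 1024 * 1024 or die "truncate: $!";   # 1 MiB, far above 2^16
        close $fh;

        my $pid = fork // die "fork: $!";
        if ($pid == 0) {                      # child writes into the mapping
            map_file my $map, $file, '+<';
            substr($map, 0, 11) = 'hello world';   # same-length replacement only
            sync($map);                            # flush to the backing file
            exit 0;
        }
        wait;

        map_file my $map, $file, '<';         # parent maps the same file read-only
        print substr($map, 0, 11), "\n";      # prints "hello world"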
