http://qs321.pair.com?node_id=1182803


in reply to Re^8: shared scalar freed early (just queues)
in thread shared scalar freed early

Probably because of how inefficient threads::shared is at copying a lot of separate items between threads. When your threads do almost no work and you pass a lot of data, your threads make things slower rather than faster. One uses threads because one has significant work to do (and not much data to move around -- unless you can use shared memory structures, which aren't easy to do in Perl). You'd probably get a big speed boost in this case if you used forks (but that won't run on Windows). But it will probably still be slower than if you just used no concurrency at all for this (unrealistic, I presume) work load.
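A minimal sketch (mine, not from the thread) of the pattern being criticized: every item pushed through a Thread::Queue is cloned into shared storage, so passing many small items with trivial per-item work spends most of its time copying. The item count and the trivial "work" here are illustrative assumptions.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $q = Thread::Queue->new;

my $worker = threads->create(sub {
    my $sum = 0;
    while (defined(my $item = $q->dequeue)) {
        $sum += $item;            # trivial work per copied item
    }
    return $sum;
});

$q->enqueue($_) for 1 .. 10_000;  # 10_000 separate shared-clone operations
$q->enqueue(undef);               # signal end of input

my $total = $worker->join;
print $total, "\n";
```

When the per-item work is this small, the queue's cloning dominates, which is the "threads making things slower" effect described above.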

- tye

Replies are listed 'Best First'.
Re^10: shared scalar freed early (work)
by chris212 (Scribe) on Feb 28, 2017 at 20:19 UTC
    FYI, the worker threads in my script do a lot of work, but they also require a lot of data. It most definitely is faster using concurrency. I'd rather have a dozen cores being utilized than one. I don't think forks would allow me to write output to the same file in sequence without some kind of IPC, which would probably slow things down, and sorting afterward would take too long.
      I don't think forks would allow me to write output to the same file in sequence without some kind of IPC

      I guess you didn't bother to follow the link or didn't bother to understand the material presented at that link.

      without some kind of IPC which would probably slow things down

      So you ignored or don't believe the point I just stated. If you bothered to read and understand what I linked to, then you would not have clung to this assumption.

      the worker threads in my script do a lot of work, but they also require a lot of data

      Then you will get better performance if you do unusual work to arrange for that data to be made available more efficiently than it can be by the easy things like threads::shared. Copying data from a parent process to child processes will be significantly faster using vanilla fork than using forks (which is significantly faster than threads::shared).

      Since you said "IPC [...] would probably slow things down", you haven't even tried that. Frankly, that is what I would try first (probably using an approach similar to MCE, though I have my own, simpler implementation of that type of approach).
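A minimal sketch of the "just try IPC first" suggestion, assuming a single worker process (real code like MCE manages a pool). The parent sends tasks down one pipe and reads results from another in submission order, so output lands in the file in sequence with no sorting step; `uc` stands in for the real per-record work.

```perl
use strict;
use warnings;

pipe(my $task_r, my $task_w) or die "pipe: $!";
pipe(my $res_r,  my $res_w)  or die "pipe: $!";

my $pid = fork // die "fork: $!";
if ($pid == 0) {                        # child: the worker
    close $task_w;
    close $res_r;
    while (my $line = <$task_r>) {
        chomp $line;
        print {$res_w} uc($line), "\n"; # stand-in for real work
    }
    exit 0;                             # normal exit flushes $res_w
}

close $task_r;
close $res_w;                           # so <$res_r> sees EOF when child exits

print {$task_w} "$_\n" for qw(alpha beta gamma);
close $task_w;                          # worker sees EOF and finishes

my @results;
while (my $r = <$res_r>) {
    chomp $r;
    push @results, $r;                  # arrives in submission order
}
waitpid $pid, 0;
print "@results\n";
```

With one pipe per worker in a pool, the parent can round-robin tasks and interleave reads to keep output ordered, which is essentially what the linked material describes.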

      If that method of communicating data is too slow, then you probably want to do more work to communicate the data using shared memory. That can be done from parent to child by just storing the data in a contiguous block of memory where the children can read it without having to copy-on-write pages of memory (as happens with read-only access to Perl data structures). Going the other direction is similar but harder.
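A sketch of the parent-to-child direction described above, under the assumption of fixed-width records (the record length and contents are made up for illustration). The data is packed into one contiguous scalar before the fork; children index into it with substr and never write to it, so the kernel's copy-on-write leaves the pages shared rather than duplicating them per child, as would happen if the children walked a Perl data structure.

```perl
use strict;
use warnings;

use constant REC_LEN => 8;                            # hypothetical record width
my @records = map { sprintf "%-*s", REC_LEN, "rec$_" } 0 .. 9;
my $block   = join '', @records;                      # one contiguous buffer, built pre-fork

pipe(my $r, my $w) or die "pipe: $!";

my $pid = fork // die "fork: $!";
if ($pid == 0) {
    close $r;
    # child: read-only, fixed-offset access into the inherited block
    my $rec = substr $block, 3 * REC_LEN, REC_LEN;
    print {$w} $rec, "\n";
    exit 0;
}

close $w;
chomp(my $got = <$r>);
waitpid $pid, 0;
$got =~ s/\s+\z//;                                    # strip record padding
print "$got\n";
```

The pipe here only carries the small result back; the bulk data never moves. Sending results child-to-parent through genuinely shared memory is the "similar but harder" direction.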

      I'd rather have a dozen cores being utilized than one.

      And I'd think you'd rather have those cores getting more real work done than having them spend time doing expensive operations like making tons of new Perl "threads".

      the worker threads in my script do a lot of work

      Then decisions about optimizing the real code should probably not be based on performance comparisons of approaches benchmarked with threads that do trivial amounts of work.

      - tye