eraskin has asked for the wisdom of the Perl Monks concerning the following question:
Hello all... brand new seeker here.
I am new to using MCE::Hobo and looking for some wisdom. I have a program that makes a web service call to a SAP DataServices system. The code supplied by SAP is written in Java, so I have written a module using Inline::Java to access it. It works fine single-threaded.
Performance on large files is terrible, and I know the bottleneck is the call to the service. I am attempting to use MCE::Hobo, with workers and a queue, to send requests to the server in parallel. When I execute the task, only some records return with valid data. It appears that only some of the workers are calling the service properly.
I have a sneaking suspicion that the problem is with Inline::Java and the JVM it creates. I thought (mistakenly?) that using MCE::Hobo would create individual processes and each would get its own JVM. Perhaps not? I wrote the software this way because I am not sure that the Java code provided by SAP works in a multi-threaded environment - hence the use of multiple processes.
Does anybody have experience running Inline::Java inside MCE::Hobo tasks?
Please let me know what other information you might need. I will post what I can. The code itself is rather long.
Replies are listed 'Best First'.
Re: Inline::Java with MCE::Hobo
by marto (Cardinal) on May 27, 2021 at 09:26 UTC
Sorry, this is perhaps a bit of a tangent from your question, but IIRC SAP DataServices has a REST API, which should remove the complication of using a Java program to achieve this.
by eraskin (Novice) on May 27, 2021 at 14:01 UTC
Thanks for the reply. A Data Services real-time service is accessed by sending messages to an Access Server, which has Job Servers running behind it. Each Job Server runs the same data flow, which takes an XML message in and replies with a resulting XML message. The Access Server automatically load balances the requests across as many Job Servers as you have configured, starting and stopping them as the load requires. IMO, this is much better than just a REST call, which is why I use it. Of course, if I can't get it to work, I can try REST calls as a fallback. Thanks for the suggestion!
Re: Inline::Java with MCE::Hobo
by cavac (Parson) on May 27, 2021 at 08:11 UTC
While I'm not familiar with MCE::Hobo (or Inline::Java for that matter), it seems it uses fork() to create new workers. While this works great in general, there are a few caveats that you have to keep in mind and check for. For one, open file handles are shared with child processes. That means you might need to close all file handles and open new ones, including any temporary files Inline::Java might have created. For example, Inline::Java might create some external temporary files that it can execute with the Java engine in the parent process while loading your module. When the first child process exits, it might clean up (delete) those files, thus preventing other child processes from working correctly. And/or it might share a network socket to the JVM. I'm not sure how to solve this in Inline::Java, or even if it's solvable in its current form. There is a thread on stackoverflow discussing a similar problem.
One way to work around this would be to have a master process that forks and then calls up a completely new instance of the worker script. Something along the lines of (untested, written from memory)
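A minimal sketch of that master-and-fresh-worker idea, not cavac's original snippet. It uses only core Perl (fork, exec, waitpid); the helper name `run_fresh_workers` and the worker script name mentioned in the comments are assumptions made for illustration, and a trivial `perl -e` one-liner stands in for the real worker so the sketch runs on its own.

```perl
use strict;
use warnings;
use POSIX ();

# Fork $count children, exec @cmd in each (with the worker id appended),
# and return the children's exit codes in start order.
sub run_fresh_workers {
    my ($count, @cmd) = @_;
    my @pids;
    for my $id (1 .. $count) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # exec replaces the forked copy with a brand-new program, so
            # nothing loaded in the parent carries over into the worker.
            exec(@cmd, $id) or do {
                warn "exec failed: $!\n";
                POSIX::_exit(127);
            };
        }
        push @pids, $pid;
    }
    my @exit_codes;
    for my $pid (@pids) {
        waitpid($pid, 0);
        push @exit_codes, $? >> 8;
    }
    return @exit_codes;
}

# Stand-in worker: a real one would be something like ($^X, 'worker.pl'),
# where worker.pl is a hypothetical script that loads Inline::Java itself.
my @exit = run_fresh_workers(3, $^X, '-e', 'exit 0');
print "worker exit codes: @exit\n";
```

Because each worker is a new interpreter, it would have to receive its input some other way (arguments, a pipe, a file), which is the cost of this approach.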
There is also the funny matter of things like the pseudo-random number generator. If the generator has already been seeded when you fork(), all child processes inherit its state and will generate the same sequence of random numbers. Some protocols (and other stuff) use randomly generated unique IDs to identify, say, a transaction or a data block. If you launch multiple child processes that now generate the same "unique" IDs, this could lead to all sorts of strange problems. Calling srand() immediately after fork in the child (in whatever function you call from MCE::Hobo) might be a good idea.
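The effect can be checked with a small core-Perl sketch. The helper `in_child` is a name made up for this example; it forks one child, runs a code ref there, and hands the result back through a pipe. The parent calls rand() once first so that its generator is already seeded before any fork.

```perl
use strict;
use warnings;
use POSIX ();

# Fork one child, run $code there, and return what it produced.
sub in_child {
    my ($code) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        close $reader;
        print {$writer} $code->();
        close $writer;
        POSIX::_exit(0);    # skip the parent's END blocks and buffers
    }
    close $writer;
    my $out = do { local $/; <$reader> };
    close $reader;
    waitpid($pid, 0);
    return $out;
}

rand();    # the parent has used rand(), so the generator is now seeded

# Each child starts from the parent's generator state.
my @inherited = map { in_child(sub { rand() }) } 1 .. 2;

# Each child reseeds before drawing a number.
my @reseeded = map { in_child(sub { srand(); rand() }) } 1 .. 2;

print "inherited state: @inherited\n";    # the two values should match
print "after srand():   @reseeded\n";     # the two values should differ
```

If the parent never touches rand() or srand() before forking, each child seeds itself on first use, so the problem only shows up when something in the parent (possibly a loaded module) has already used the generator.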
perl -e 'use Crypt::Digest::SHA256 qw[sha256_hex]; print substr(sha256_hex("the Answer To Life, The Universe And Everything"), 6, 2), "\n";'
by eraskin (Novice) on May 27, 2021 at 14:33 UTC
Thanks. I wish the children were independent enough to do this, but they are really just an internal part of a much bigger program.
We are in a business that receives hundreds of files a day, all in different formats. To process them, we convert them to a standard format. I had a tool that I purchased 20 years ago written in COBOL(!). Rather than keeping that up to date, I wrote a Perl script to take on the functions we needed.
The code is basically just a read-process-write loop. Read the input, parse the data, generate the output format, write the output file, track counts and statistics about the data. Part of "parse the data" is a call to Data Services to parse the input name and generate titles and gender codes. That's where my bottleneck is.
I have taken the internals of the "parse the data" step and put them into a sub. This sub reads from a queue and processes the data until the queue is empty. Each worker sub writes to an output queue. An output loop reads each record from the queue, generates the output format and writes the file. Here are some relevant parts:
The code that actually calls the service is deep inside the mapdata() function. The relevant piece of code is:
Finally, here is the complete code from the SexCoder object:
Sorry for the long post.
Re: Inline::Java with MCE::Hobo
by Fletch (Bishop) on May 27, 2021 at 14:31 UTC
Untried hunch: I haven't diddled with MCE as much as I'd like to have (possibly should have), but if the problem is that your worker children are inheriting too much context, then rather than pulling in Inline::Java in the parent you might try pushing that out into a separate module which you then require in your on_start handler for your MCE::Hobo workers. That way you're going to be sure that a separate JVM instance and all its associated plumbing is coming up only on the child process side of the fork.
Edit: Derp, I do need to play with MCE more, because apparently (re-?)reading the MCE::Hobo docs the on_start callback is called in the parent after a worker is started. Possibly do the require in the sub you pass to the create method instead, and that'll still keep it done only in the child.
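A rough way to see the difference between loading in the parent and loading in the child. This sketch uses a plain fork() rather than MCE::Hobo so that it runs with core Perl only, and the core module Text::ParseWords is just a stand-in for the module that wraps Inline::Java; both choices are assumptions for illustration. The require sits in the code that runs after the fork, which is the same place the sub passed to create would put it.

```perl
use strict;
use warnings;
use POSIX ();

# Text::ParseWords is only a stand-in for the module that wraps Inline::Java.
my $file = 'Text/ParseWords.pm';

pipe(my $reader, my $writer) or die "pipe failed: $!";
my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child side of the fork: load the module here, at run time.
    close $reader;
    require Text::ParseWords;
    print {$writer} exists $INC{$file} ? 'loaded' : 'not loaded';
    close $writer;
    POSIX::_exit(0);
}

# Parent side: collect the child's report, then check our own %INC.
close $writer;
my $child_report = do { local $/; <$reader> };
close $reader;
waitpid($pid, 0);

my $parent_report = exists $INC{$file} ? 'loaded' : 'not loaded';
print "child: $child_report, parent: $parent_report\n";
```

Note that `use` would not behave this way: it runs at compile time, so a `use SexCoder` written inside the worker sub still loads the module in the parent before any worker is forked. A run-time `require` (followed by an explicit import call if the module exports anything) is what keeps the load on the child side.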
The cake is a lie.
by eraskin (Novice) on May 27, 2021 at 15:06 UTC
I could not make the require work. First, I removed the "use SexCoder" from the top of my script. Then I added this:
The error message I got was:
So, the SexCoder did not load. I then tried removing the Hobo->init and placing "use SexCoder" inside my worker sub {}. That was no different than my original code. I had exactly the same issue, where some workers were able to get results from the service and others were not. That really confuses me.
I think you are on the right track, though. There must be something odd in my SexCoder package that I'm not seeing. See my reply to cavac above for the complete code to that package. Maybe you can see something?
I have one more tidbit for everyone helping me. The code works if $num_workers = 1 (simulating single-threaded code). That's why I suspect an issue with Inline::Java or with the rtsClient itself. I just can't fathom why, if each is running in its own process.
by Anonymous Monk on Jul 24, 2021 at 05:32 UTC
Greetings eraskin. Have you tried importing SexCoder inside the input_task Hobo routine? This is where to load the class uniquely per worker child.
by Anonymous Monk on Jul 24, 2021 at 05:38 UTC
Re: Inline::Java with MCE::Hobo
by karlgoethebier (Abbot) on May 27, 2021 at 08:13 UTC
«Does anybody have experience running Inline::Java inside MCE::Hobo tasks?»
No, I don’t. Post some code. In the meantime you may take a look at Discipulus's library, which contains many links about the basic theme. Some more or less useful stuff there is from me, with a little help from my friends here 🤪😎
Regards, Karl
«The Crux of the Biscuit is the Apostrophe»
by eraskin (Novice) on May 27, 2021 at 14:34 UTC
Thanks. Please see my reply to cavac above for code.
Re: Inline::Java with MCE::Hobo
by karlgoethebier (Abbot) on May 29, 2021 at 12:35 UTC
One more thing: cavac mentioned this 9-year-old thread on stackoverflow. From ibidem:
"...Another approach is to forget about forking from Perl and to leverage Java's superior threading model to do parallelization. Design your Java code to perform its tasks in new threads, and start new threads from Perl..."
Probably not a bad idea. And something I dreamed of and never tried.
Regards, Karl
«The Crux of the Biscuit is the Apostrophe»