How can I optimize my script?

danielbenny has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, as part of my bachelor thesis I'm trying to benchmark some logging solutions (e.g. Graylog, Elastic Stack, Splunk). To make sure that the logging environment can handle a massive syslog load, I needed to write a script that simulates enough syslog traffic. Here is my script (benchmark.pl):

#! /usr/bin/perl
use strict;
use warnings;
use Sys::Syslog qw(:standard :macros setlogsock);

die "Usage: $0 <host> <port> <count>\n" unless @ARGV == 3;

my ($host, $port, $count) = @ARGV;
my ($sender, $program) = ("localhost","loggenerator");

setlogsock({ type => "tcp", host => "$host", port => "$port" });
openlog("$sender $program", 'pid,noeol,ndelay');
syslog('info', "This is my $_ test message!" ) for (1 .. $count);
closelog();

With the following call I'm able to write one million messages to my logging infrastructure: `./benchmark.pl 127.0.0.1 514 1000000`. This takes about 50 seconds (depending on the logging software), which works out to a throughput of about 20,000 messages per second. Other benchmark tools achieve a higher throughput with lower CPU consumption. How can I optimize my script?


Replies are listed 'Best First'.
Re: How can I optimize my script? by kennethk (Abbot) on May 04, 2017 at 00:03 UTC
Others have suggested going parallel. I'll address the actual question of optimization. You are running into the classic challenge of relying on a library. If I run `time perl -E'say "This is my $_ test message!" for 1 .. 1000000' > junk.txt` on my command line, it takes about half a second. Why does your subroutine call to `syslog` take orders of magnitude longer than a simple print to disk? If you review the module source code, you can see all sorts of computation that is repeated unnecessarily a million times. I'd highly recommend using a profiler (I use Devel::NYTProf) to see where the time is actually being spent.
I would expect that if you copied the `syslog` subroutine out of the module, it would run just fine with a small amount of rehab, such as changing `local $facility = $facility; # may need to change temporarily.` to `local $facility = $Sys::Syslog::facility; # may need to change temporarily.` Once it's running, you can pull as much as possible out of that big loop, guided by the profiler.
What you get with a library is ease of use, but you are limited because libraries are written for the general case. And that's why open source is great. Alternatively, you can decide your time is more valuable than the computer's, and just leave it running overnight/all week/until the heat death of the universe.
#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
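As a rough illustration of pulling work out of the loop, here is a minimal sketch that skips Sys::Syslog entirely rather than patching its `syslog` subroutine. It writes hand-built RFC 3164-style lines straight to a TCP socket. Several things here are assumptions, not something from the thread: the newline framing, the `<14>` priority (facility user, severity info), the batch size of 1000, the `%e` strftime conversion (not available on every platform), and the helper names `build_batch` and `run`. Check that your receiver accepts this format before trusting any numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX qw(strftime);

# Build messages $from..$to as one string. $head holds everything that is
# identical for the whole batch (priority, timestamp, tag).
sub build_batch {
    my ($head, $from, $to) = @_;
    return join '', map { "${head}This is my $_ test message!\n" } $from .. $to;
}

sub run {
    my ($host, $port, $count) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "Cannot connect to $host:$port: $!\n";

    my $pri   = '<14>';                          # assumed: user (1) * 8 + info (6)
    my $tag   = "localhost loggenerator[$$]";    # ident plus pid, computed once
    my $batch = 1000;                            # hypothetical batch size

    for (my $from = 1; $from <= $count; $from += $batch) {
        my $to = $from + $batch - 1;
        $to = $count if $to > $count;
        # The timestamp is refreshed once per batch, not once per message.
        my $stamp = strftime('%b %e %H:%M:%S', localtime);
        print {$sock} build_batch("$pri$stamp $tag: ", $from, $to)
            or die "Write failed: $!\n";
    }
    close $sock or die "Close failed: $!\n";
}

# Only send when called as: script.pl <host> <port> <count>
run(@ARGV) if @ARGV == 3;
```

The trade-off is the one described above: the general-purpose checks the library performs on every call are gone, so the script is only as correct as the hand-written format.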
Re: How can I optimize my script? by Discipulus (Canon) on May 03, 2017 at 20:07 UTC
Hello danielbenny, and welcome to the monastery and to the wonderful world of Perl! As Anonymous Monk said, you can try to load your destination syslog using a parallel approach, if by `optimize` you mean generating even more entries per second. You can give MCE or one of the other Perl parallel implementations a try, so that more calls to syslog are made at the same time.
L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re^2: How can I optimize my script? by Anonymous Monk on May 03, 2017 at 20:09 UTC
Give this'n a gander https://github.com/bluefeet/Parallel-Foreman
Re: How can I optimize my script? by Anonymous Monk on May 03, 2017 at 19:50 UTC
Use multiple sockets?
Re^2: How can I optimize my script? by danielbenny (Initiate) on May 03, 2017 at 20:02 UTC
How can I use multiple sockets with the syslog module?
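One way to get multiple sockets out of Sys::Syslog is to fork several worker processes and let each one open its own connection. The following is a minimal sketch using only core `fork`, not MCE or Parallel-Foreman. The extra `<workers>` argument and the helper names `split_ranges` and `run` are hypothetical, and whether this actually raises throughput depends on the receiver and the number of cores.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Sys::Syslog qw(:standard :macros setlogsock);

# Split message numbers 1..$count into up to $workers contiguous
# [from, to] ranges of nearly equal size.
sub split_ranges {
    my ($count, $workers) = @_;
    my @ranges;
    my $from = 1;
    for my $w (1 .. $workers) {
        my $size = int($count / $workers) + ($w <= $count % $workers ? 1 : 0);
        next unless $size;    # more workers than messages
        push @ranges, [$from, $from + $size - 1];
        $from += $size;
    }
    return @ranges;
}

sub run {
    my ($host, $port, $count, $workers) = @_;
    my @pids;
    for my $range (split_ranges($count, $workers)) {
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            # Child: opens its own TCP connection, so each worker has its own socket.
            setlogsock({ type => 'tcp', host => $host, port => $port });
            openlog('localhost loggenerator', 'pid,noeol,ndelay');
            syslog('info', "This is my $_ test message!")
                for $range->[0] .. $range->[1];
            closelog();
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;    # parent waits for all workers to finish
}

# Only send when called as: script.pl <host> <port> <count> <workers>
run(@ARGV) if @ARGV == 4;
```

For example, `./benchmark_parallel.pl 127.0.0.1 514 1000000 4` (a hypothetical file name) would start four children that send 250,000 messages each.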
Re: How can I optimize my script? by marioroy (Prior) on Jul 10, 2017 at 05:58 UTC
Hi danielbenny. See this post for a fast logger demonstration. I wrote the demonstration after reading your post but held off from posting the solution until after releasing MCE::Shared 1.827. In the sample code, localtime is called once per second, not once per write, because calling localtime or gmtime repeatedly is expensive. MCE::Shared 1.827 will be available later this month. I'll come back and ping after releasing 1.827. Regards, Mario.
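The once-per-second idea can be sketched independently of MCE::Shared. This is not the code from the linked demonstration, only an illustration of the caching technique; the helper name `make_stamper` and the optional epoch argument (there to make the behaviour easy to check) are assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Return a closure that formats the local time at most once per distinct
# epoch second and hands back the cached string otherwise.
sub make_stamper {
    my ($format) = @_;
    my ($last_epoch, $cached) = (-1, '');
    return sub {
        my $now = @_ ? $_[0] : time;    # optional explicit epoch, else current time
        if ($now != $last_epoch) {
            $cached     = strftime($format, localtime($now));
            $last_epoch = $now;
        }
        return $cached;
    };
}

my $stamp = make_stamper('%H:%M:%S');
print $stamp->(), " message $_\n" for 1 .. 3;
```

In a tight logging loop, every call within the same second costs one `time` call and one integer comparison instead of a full `localtime` plus `strftime`.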

