How can I optimize my script?

by danielbenny (Initiate)
on May 03, 2017 at 19:45 UTC ( [id://1189438]=perlquestion )

danielbenny has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks, as part of my bachelor thesis I'm trying to benchmark some logging solutions (e.g. graylog, elastic stack, splunk). To make sure that the logging environment can handle a massive syslog load, I need a script that can simulate enough syslog traffic. Here is my script (benchmark.pl):
#!/usr/bin/perl
use strict;
use warnings;
use Sys::Syslog qw(:standard :macros setlogsock);

die "Usage: $0 <host> <port> <count>\n" unless @ARGV == 3;
my ($host, $port, $count) = @ARGV;
my ($sender, $program) = ("localhost", "loggenerator");

setlogsock({ type => "tcp", host => "$host", port => "$port" });
openlog("$sender $program", 'pid,noeol,ndelay');
syslog('info', "This is my $_ test message!") for (1 .. $count);
closelog();
With the following call I'm able to write one million messages to my logging infrastructure:
./benchmark.pl 127.0.0.1 514 1000000
This takes about 50 seconds (depending on the logging software), which works out to a throughput of about 20,000 messages per second. Other benchmark tools achieve a higher throughput with lower CPU consumption. How can I optimize my script?

Replies are listed 'Best First'.
Re: How can I optimize my script?
by kennethk (Abbot) on May 04, 2017 at 00:03 UTC
    Others have suggested going parallel. I'll address the actual question of optimization.

    You are running into the classic challenges of taking advantage of a library. If I run the code

    time perl -E'say "This is my $_ test message!" for 1 .. 1000000' > junk.txt
    on my command line, it takes about half a second. Why does your subroutine call to syslog take orders of magnitude more time than a simple print to disk? If you review the module source code (here), you can see all sorts of computation that is repeated unnecessarily a million times. I'd highly recommend you use a profiler (I use Devel::NYTProf) to see where time is actually being spent. I would expect that if you copied the syslog subroutine out of the module, it would run just fine with a small amount of rehab, such as changing
    local $facility = $facility; # may need to change temporarily.
    to
    local $facility = $Sys::Syslog::facility; # may need to change temporarily.
    Once it's running, you'll be able to pull as much as possible out of that big loop, guided by the profiler. What you get with a library is ease of use, but you are also limited, because libraries are written for the general case. And that's why open source is great.
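
A minimal sketch of what pulling work out of the loop can look like (not code from this thread): it skips Sys::Syslog altogether, rebuilds the unchanging part of each line only once per batch, and writes whole batches to a plain TCP socket. The sub names (priority, header, send_messages), the batch size of 1000, and the RFC 3164-style, newline-terminated line layout are all assumptions for illustration. Check what your receiver actually expects, and profile before trusting any speedup.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use POSIX qw(strftime);

# PRI value of a syslog line: facility * 8 + severity
# (facility "user" = 1, severity "info" = 6).
sub priority {
    my ($facility, $severity) = @_;
    return $facility * 8 + $severity;
}

# Everything that precedes the message text. It is built once per batch,
# not once per message.
sub header {
    my ($pri, $stamp, $sender, $program, $pid) = @_;
    return sprintf '<%d>%s %s %s[%d]: ', $pri, $stamp, $sender, $program, $pid;
}

# Send $count messages over one TCP connection, $batch lines per write.
sub send_messages {
    my ($host, $port, $count, $batch) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
    ) or die "connect to $host:$port failed: $@\n";

    my $pri = priority(1, 6);
    for (my $first = 1; $first <= $count; $first += $batch) {
        my $last = $first + $batch - 1;
        $last = $count if $last > $count;

        # One timestamp and one header per batch (assumed precise enough
        # for a load generator).
        my $stamp = strftime('%b %e %H:%M:%S', localtime);
        my $head  = header($pri, $stamp, 'localhost', 'loggenerator', $$);

        my $buf = join '', map { "${head}This is my $_ test message!\n" } $first .. $last;
        print {$sock} $buf or die "write failed: $!\n";
    }
    close $sock or die "close failed: $!\n";
}

# Same arguments as the original script: <host> <port> <count>.
send_messages(@ARGV, 1000) if @ARGV == 3;
```

It takes the same arguments as benchmark.pl. Whether this beats the library, and by how much, is something only a measurement against your own receiver can tell you.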

    Alternatively, you can decide your time is more valuable than the computer's, and just leave it running overnight/all week/until the heat death of the universe.


    #11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re: How can I optimize my script?
by Discipulus (Canon) on May 03, 2017 at 20:07 UTC
    Hello danielbenny and welcome to the monastery and to the wonderful world of Perl!

    As Anonymous Monk said, you can try to load your destination syslog using a parallel approach. That applies if by "optimize" you mean generating even more entries per second.

    You can try MCE or another Perl parallel implementation to make several calls to syslog at the same time.
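
A minimal sketch of the parallel approach, using plain fork from core Perl rather than MCE (a deliberate swap, to keep the example dependency-free). Each child process opens its own connection and sends its share of the messages. This assumes Sys::Syslog holds a single connection per process, so "multiple sockets" here means multiple processes. The sub names (split_ranges, run_workers) and the fourth argument, the worker count, are hypothetical additions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Sys::Syslog qw(:standard :macros setlogsock);

# Split 1 .. $count into at most $workers contiguous ranges of near-equal size.
sub split_ranges {
    my ($count, $workers) = @_;
    my $base  = int($count / $workers);
    my $extra = $count % $workers;
    my $start = 1;
    my @ranges;
    for my $i (1 .. $workers) {
        my $size = $base + ($i <= $extra ? 1 : 0);
        next unless $size;               # fewer messages than workers
        push @ranges, [$start, $start + $size - 1];
        $start += $size;
    }
    return @ranges;
}

# Fork one child per range; each child logs through its own TCP connection.
sub run_workers {
    my ($host, $port, $count, $workers) = @_;
    my @pids;
    for my $range (split_ranges($count, $workers)) {
        my $pid = fork();
        die "fork failed: $!\n" unless defined $pid;
        if ($pid == 0) {
            my ($first, $last) = @$range;
            setlogsock({ type => 'tcp', host => $host, port => $port });
            openlog('localhost loggenerator', 'pid,noeol,ndelay');
            syslog('info', "This is my $_ test message!") for $first .. $last;
            closelog();
            exit 0;
        }
        push @pids, $pid;
    }
    # Parent: wait for every child before exiting.
    waitpid($_, 0) for @pids;
}

# Arguments: <host> <port> <count> <workers>
run_workers(@ARGV) if @ARGV == 4;
```

The worker count is worth tuning by measurement. Beyond the number of CPU cores, extra processes are unlikely to help, and at some point the receiver becomes the bottleneck.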

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: How can I optimize my script?
by Anonymous Monk on May 03, 2017 at 19:50 UTC

    Use multiple sockets?

      How can I use multiple sockets with the syslog module?
Re: How can I optimize my script?
by marioroy (Prior) on Jul 10, 2017 at 05:58 UTC

    Hi danielbenny.

    See this post for a fast logger demonstration. I wrote the demonstration after reading your post but held off from posting the solution until after releasing MCE::Shared 1.827. In the sample code, localtime is called once per second, not on every write. The reason is that calling localtime or gmtime repeatedly is expensive.
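
A minimal sketch of the once-per-second idea, independent of the linked demonstration and of MCE::Shared: a closure caches the formatted timestamp and calls localtime again only when the epoch second changes. The name make_stamper, the optional epoch argument, and the timestamp format are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(strftime);

# Returns a closure that formats a timestamp at most once per distinct second.
sub make_stamper {
    my ($format) = @_;
    my ($cached_second, $cached_stamp) = (-1, '');
    return sub {
        # Optional epoch argument overrides the clock (handy for testing).
        my $now = @_ ? $_[0] : time;
        if ($now != $cached_second) {
            $cached_second = $now;
            $cached_stamp  = strftime($format, localtime($now));
        }
        return $cached_stamp;
    };
}

my $stamp = make_stamper('%Y-%m-%d %H:%M:%S');

# All writes within the same second reuse the cached string.
for my $n (1 .. 3) {
    printf "%s This is my %d test message!\n", $stamp->(), $n;
}
```

The call to time still happens on every write, but it is far cheaper than breaking the time down and formatting it each time.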

    MCE::Shared 1.827 will be available later this month. I'll come back and ping after releasing 1.827.

    Regards, Mario.
