PerlMonks  

Crash report collection package

by jagan_1234 (Sexton)
on Oct 07, 2013 at 17:50 UTC ( [id://1057288]=perlquestion )

jagan_1234 has asked for the wisdom of the Perl Monks concerning the following question:

I am developing a large-scale distributed system written in Perl, with multiple machines performing different data processing tasks. Modules crash from time to time, and when they do, I write their stack trace to a log file. I later scan the log for crash patterns.

Is there a more reliable way (or even a standard practice) of collecting, filing, and reporting bugs? I am looking for Perl packages or software that can do the bug collection and processing tasks.

In particular, if I were to write one myself, what are some of the features a crash collection system should consider? My homegrown bug collector records the buggy input, the stack trace, the machine, etc. Upon hitting a bug, the module restarts on the next input and resumes processing.

Any advice would be much appreciated.

Replies are listed 'Best First'.
Re: Crash report collection package
by ig (Vicar) on Oct 07, 2013 at 18:38 UTC

    There are many systems for monitoring, alerting, and event analysis and consolidation, both commercial packages and free software. For a large-scale distributed system, I would use such a package rather than writing my own.

    You need to consider how to send a reasonable number of alerts that come as close as practical to identifying the root cause of the fault. For example, thousands of alerts about every component in your large-scale distributed system are not what you want when the problem is one core router or switch that has failed. Letting through the significant alerts while suppressing alerts about consequential faults is a complex problem, made easier by the better tools.
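
    To make the consolidation idea concrete, here is a minimal sketch that groups crash reports by a normalized signature so that one alert goes out per distinct failure rather than one per crash. The helper names (crash_signature, consolidate) and the report keys (host, trace) are assumptions made for illustration, not part of any existing package, and this only collapses repeated symptoms; it does not do the root-cause correlation that the better tools attempt.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a stack trace to a signature by keeping the
# first line and masking details that vary between occurrences.
sub crash_signature {
    my ($trace) = @_;
    my ($first) = split /\n/, $trace;
    $first = '' unless defined $first;
    $first =~ s/0x[0-9a-fA-F]+/ADDR/g;   # memory addresses
    $first =~ s/\d+/N/g;                 # line numbers, ids, counts
    return $first;
}

# Group reports (hash refs with hypothetical 'host' and 'trace' keys)
# by signature, counting occurrences and remembering affected hosts.
sub consolidate {
    my (@reports) = @_;
    my %groups;
    for my $r (@reports) {
        my $sig = crash_signature($r->{trace});
        $groups{$sig}{count}++;
        $groups{$sig}{hosts}{ $r->{host} } = 1;
    }
    return \%groups;
}

my $groups = consolidate(
    { host => 'node1', trace => "Can't connect to db at Worker.pm line 42.\n" },
    { host => 'node2', trace => "Can't connect to db at Worker.pm line 57.\n" },
    { host => 'node1', trace => "Division by zero at Stats.pm line 9.\n" },
);

# One summary line per distinct signature.
for my $sig (sort keys %$groups) {
    my $g = $groups->{$sig};
    printf "%d x %s (hosts: %s)\n",
        $g->{count}, $sig, join(', ', sort keys %{ $g->{hosts} });
}
```

    Masking every number is deliberately aggressive: it merges the same message raised from different line numbers. Whether that is too coarse depends on how your modules report errors.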

    You also need to record enough information to be able to find the root cause from your logs.
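
    As a sketch of what "enough information" might look like, the wrapper below runs one unit of work and, if it dies, returns a record holding the input, error, stack trace, host, process id, and time. The function name run_with_crash_report and the record keys are hypothetical; Carp, Sys::Hostname, and JSON::PP are core modules in recent Perls. Writing one JSON object per line is just one possible log layout, assumed here because it is easy to scan later.

```perl
use strict;
use warnings;
use Carp qw(longmess);
use Sys::Hostname qw(hostname);
use JSON::PP ();

# Hypothetical wrapper: run $task on $input. Returns undef on success,
# or a hash ref describing the crash if $task died.
sub run_with_crash_report {
    my ($task, $input) = @_;
    my $trace;
    # Record a stack trace at the point of the die. Note that this handler
    # also fires for dies that $task catches itself; the last one wins.
    local $SIG{__DIE__} = sub { $trace = longmess('Stack trace') };
    my $ok = eval { $task->($input); 1 };
    return if $ok;
    return {
        time  => time,
        host  => hostname(),
        pid   => $$,
        input => $input,
        error => "$@",      # stringify in case an exception object was thrown
        trace => $trace,
    };
}

# Usage: process inputs one by one, logging a record and moving on
# to the next input whenever one of them crashes.
my $json = JSON::PP->new->canonical;
my @crashes;
for my $input ('good row', 'bad row') {
    my $crash = run_with_crash_report(
        sub { die "cannot parse input\n" if $_[0] =~ /bad/ },
        $input,
    );
    next unless $crash;
    push @crashes, $crash;
    print $json->encode($crash), "\n";
}
```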

    The better tools allow you to automate responses to selected events, such as restarting servers or services.

    If you are concerned about security, you might also want indelible logs mirrored to one or more secure systems.

Re: Crash report collection package
by kensaigm (Hermit) on Oct 07, 2013 at 18:43 UTC
    In a number of places I have worked, we have used Syslog. Like any system, there can be issues depending on your needs (reporting can be an issue, but it has kept me employed). However, it is old enough that a lot of tools are available, including free implementations for Windows and other platforms. There are even Perl modules to help with the implementation. Many of the available tools can trend behaviors and facilitate other analysis.
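
    For what it is worth, one of those Perl modules is Sys::Syslog, which ships with Perl. A minimal sketch of sending a crash line through it might look like the following. The program name 'crash-collector', the local0 facility, and the key=value message layout are assumptions for illustration; the nofatal option asks the module to warn rather than die if no syslog daemon can be reached, so the reporter does not itself crash.

```perl
use strict;
use warnings;
use Sys::Syslog qw(:standard :macros);

# Hypothetical formatter: flatten a crash into a single key=value line,
# since syslog messages are line-oriented.
sub format_crash {
    my ($module, $message) = @_;
    $message =~ s/\s+/ /g;     # collapse newlines and runs of whitespace
    $message =~ s/\s+\z//;     # drop trailing whitespace
    return "module=$module error=\"$message\"";
}

# 'crash-collector' and LOG_LOCAL0 are assumed names; pick whatever
# your syslog configuration routes to the right place.
openlog('crash-collector', 'pid,nofatal', LOG_LOCAL0);

# Pass the line through a '%s' format so that any '%' in the message
# is not interpreted as a format directive.
syslog(LOG_ERR, '%s', format_crash('Parser', "cannot parse input\n at Parser.pm line 12\n"));

closelog();
```

    From there, the syslog daemon can forward the messages to a central host, where the existing analysis tools can pick them up.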

Node Type: perlquestion [id://1057288]
Approved by ww
Front-paged by Corion