comment on

If the requirement is continuous, you'll need some kind of daemon (perhaps invoked at system startup) to pick up new appendages to the logfiles shortly after they arrive. Lets also assume that messages have a timestamp, otherwise duplicate events separated only in time would be indistinguishable.

To allow for reboot of the system, the daemon will need to keep track of the timestamp of the last message it collated for each machine writing messages to the logfiles (in case their clocks are out of synch.)

There also needs to be a structure of regular expressions that enables not just identification of the originating process but of the timestamp which needs to be converted into a delta time for comparison. In a dynamic environment this might best be achieved using a csv configuration file e.g.:

PROCESS,HOST,LOGFILE,FORMAT,$1,$2
foo,host99,/var/adm/foo.log,\s+\S+\s+(\S+)\s+(\d+\-\d+\-\d+\s\d+:\d+:\
+d+:\s\w{2}),PROC,TIMESTAMP
[download]

Once all that is sorted out there still remains the routine work for the daemon of reading in the config file, reading in the timestamp tracker file (one line per host), for each file (only one filehandle needed!) matching lines of logfiles against the configured regexps and ignoring entries prior to the timestamp for the host, updating the per-process file and the journal file with the latest timestamp (plus originating host) of a message just transferred to the per-process file.

It also needs to sleep perhaps five minutes between cycles through all the log files to free system resources for other processes.

Update: a common practice is also to routinely archive and delete logfiles (yet another logfile management daemon!) so that such reprocessing doesn't have to start from the beginning of a very large logfile, and then have to read but ignore millions of entries occurring before the last recorded timestamp. One system I work with regularly archives logfiles when they hit 5 MB instead of by time or line count. It might be convenient for your requirement if the message-collating daemon could also (per cycle) check the size and conditionally do or invoke that archiving itself.

-M

Free your mind

In reply to Re: Untangling Log Files by Moron
in thread Untangling Log Files by loris

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Just another Perl shrine
	PerlMonks