
fizbin has asked for the wisdom of the Perl Monks concerning the following question:

I'm seeking design advice, both general and specific, for the following system. If you really want to say "Oh, Lord, no!" to the whole idea, I suppose that's okay too, but I'd like to know the reasons.

Here's the deal: my employer is in the business of shipping financial data (e.g. stock prices) from point A to point B and doing some mangling along the way. We have dozens of different data products that clients subscribe to. Now, this mangling/data-processing system produces log reports: sometimes just two or three a day, sometimes hundreds per day, depending on the data product in question.

Here's the thing - Data Integrity needs to go through these log reports. Now, the DI people are nice and relatively bright about the data - and have tons of domain-specific knowledge - but most of them find ssh'ing to a production system and running "less" on the log reports a strange and alien idea. Therefore it has been decreed that there shall be built a web-accessible system for DI to use to look at the logs.

An aspect of this system - the web-based log access mechanism - is what I'm seeking design advice for.

Some further details: this new system is not (at least initially) for the whole of our product line, but only for a few products that we're just now starting up on a new architecture.

We already have a module that summarizes each job's log into a report for DI that is usually about 10-20 lines long, but occasionally can be over 100 lines (depending on how much stuff went wrong during processing). This module is fairly fast, but still takes as long as a second or two on some of the larger log files. Also, occasionally DI will need to be able to view the raw log file, such as when they need to email a problem to the operations staff or the developers. So the report is good, but not always enough, depending on what went wrong.
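
Since the summarizer can take a second or two on the bigger logs, I'm thinking the CGI layer should cache its output rather than re-run it on every page view. Here's a minimal sketch of the kind of helper I mean - Our::LogSummary::summarize() is a made-up name standing in for the existing summary module, and the cache directory is just a placeholder:

  # Sketch only: cache the DI summary so the one-to-two-second summarizer
  # run happens at most once per revision of a given log file.
  use strict;
  use warnings;
  use File::Spec;
  use Our::LogSummary;    # hypothetical name for the existing summary module

  my $cache_dir = '/var/cache/di-reports';    # assumed location

  sub cached_report {
      my ($logfile) = @_;
      my $mtime = (stat $logfile)[9]
          or die "can't stat $logfile: $!";

      # Key the cache file on the log's path and mtime, so a re-run of
      # the job (which rewrites the log) naturally invalidates the cache.
      (my $key = $logfile) =~ s{[^\w.-]}{_}g;
      my $cache = File::Spec->catfile($cache_dir, "$key.$mtime.txt");

      if (-e $cache) {
          open my $fh, '<', $cache or die "open $cache: $!";
          my $report = do { local $/; <$fh> };
          return $report;
      }

      my $report = Our::LogSummary::summarize($logfile);   # the slow part

      open my $out, '>', $cache or die "write $cache: $!";
      print {$out} $report;
      close $out or die "close $cache: $!";
      return $report;
  }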

Here's basically what I'm thinking:

  • We'll have two CGI scripts on each of the machines doing these data processing jobs:
    • one will take a product name and produce a listing of job names - something like what an ls -lt on the logs directory would produce, but HTML-ized with links to the other script
    • the other will take a product name, a job name, and a "type" parameter and, depending on the type, display the log report, display the raw log, or initiate a download of the raw log (there's a rough sketch of this one just after the list)
  • We'll have one central web page that sets up a frameset with a list of data products in a column at the left; each of those data products, when selected, will change the right-hand side of the frameset to the current list of logs for that product.
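
Here's roughly what I picture for that second script. The directory layout (/data/logs/&lt;product&gt;/&lt;job&gt;.log), the parameter checks, and the cached_report() helper from the sketch above are all assumptions at this point, not existing code:

  #!/usr/bin/perl
  # Rough sketch of the report/raw/download viewer CGI.
  use strict;
  use warnings;
  use CGI qw(escapeHTML);
  use File::Spec;

  my $log_root = '/data/logs';    # assumed layout: /data/logs/<product>/<job>.log
  my $q        = CGI->new;

  my $product = $q->param('product') || '';
  my $job     = $q->param('job')     || '';
  my $type    = $q->param('type')    || 'report';

  # Refuse anything that could wander out of the logs directory.
  for ($product, $job) {
      die "bad parameter\n" if $_ eq '' || m{[/\\]} || /\A\./;
  }

  my $logfile = File::Spec->catfile($log_root, $product, "$job.log");
  unless (-r $logfile) {
      print $q->header(-status => '404 Not Found', -type => 'text/plain'),
            "No such log: $product/$job\n";
      exit;
  }

  if ($type eq 'report') {
      print $q->header(-type => 'text/html'),
            '<html><body><pre>',
            escapeHTML(cached_report($logfile)),    # from the caching sketch above
            '</pre></body></html>';
  }
  elsif ($type eq 'raw' or $type eq 'download') {
      # -attachment makes the browser offer a download instead of displaying inline.
      my @extra = $type eq 'download' ? (-attachment => "$product-$job.log") : ();
      print $q->header(-type => 'text/plain', @extra);
      open my $fh, '<', $logfile or die "open $logfile: $!";
      print while <$fh>;
  }
  else {
      print $q->header(-status => '400 Bad Request', -type => 'text/plain'),
            "Unknown type '$type'\n";
  }

The listing script would be much the same shape: read the product's log directory, sort by mtime, and emit a table of links into this viewer.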

So far, that describes a system something like the DI setup for other existing products, except for the on-the-fly report generation. (Traditionally, reports are batch-generated in the middle of the night.) One thing I'm thinking of adding, though, is an RSS or Atom syndication feed of new reports for each of the products - I've found an RSS reader a wonderful tool for wasting time, and it occurs to me that it might be possible to harness it for good too.

So this gets tricky - what do I put in the syndication feed? Just the job name? The log report? How do I keep from overloading the system with requests to regenerate the feed - some kind of cache directory? How far back should the feed go? And what about format - RSS vs. Atom? I'm inclined more towards Atom, because of the multiple timestamps (and Atom just seems saner to me).
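
One option I've been toying with for the overload question is to not serve the feed from a CGI at all, but to have a small script (run from cron, or kicked off at the end of each processing job) write a static .atom file that Apache hands out directly, so readers polling the feed never trigger any report generation. Something like the sketch below - the URLs, the tag: IDs, and the 20-entry cutoff are all placeholders:

  #!/usr/bin/perl
  # Sketch: regenerate one product's Atom feed as a static file.
  use strict;
  use warnings;
  use POSIX qw(strftime);

  my ($product) = @ARGV or die "usage: $0 product\n";
  my $log_dir   = "/data/logs/$product";
  my $feed_file = "/var/www/feeds/$product.atom";
  my $view_cgi  = "http://di.example.com/cgi-bin/viewlog.cgi";    # made up

  sub rfc3339    { strftime('%Y-%m-%dT%H:%M:%SZ', gmtime $_[0]) }
  sub xml_escape { my $s = shift; $s =~ s/&/&amp;/g; $s =~ s/</&lt;/g; $s =~ s/>/&gt;/g; $s }

  # Newest twenty logs only; DI can fall back to the listing CGI for history.
  opendir my $dh, $log_dir or die "opendir $log_dir: $!";
  my @logs = sort { $b->[1] <=> $a->[1] }
             map  { [ $_, (stat "$log_dir/$_")[9] ] }
             grep { /\.log\z/ } readdir $dh;
  closedir $dh;
  splice @logs, 20 if @logs > 20;

  my $updated = rfc3339(@logs ? $logs[0][1] : time);
  my $xml = qq{<?xml version="1.0" encoding="utf-8"?>\n}
          . qq{<feed xmlns="http://www.w3.org/2005/Atom">\n}
          . qq{  <title>Logs for } . xml_escape($product) . qq{</title>\n}
          . qq{  <id>tag:example.com,2005:logs/$product</id>\n}
          . qq{  <author><name>log-monitor</name></author>\n}
          . qq{  <updated>$updated</updated>\n};

  for my $log (@logs) {
      my ($name, $mtime) = @$log;
      (my $job = $name) =~ s/\.log\z//;
      my $link = "$view_cgi?product=$product;job=$job;type=report";
      $xml .= qq{  <entry>\n}
            . qq{    <title>} . xml_escape($job) . qq{</title>\n}
            . qq{    <id>tag:example.com,2005:logs/$product/$job</id>\n}
            . qq{    <updated>} . rfc3339($mtime) . qq{</updated>\n}
            . qq{    <link href="} . xml_escape($link) . qq{"/>\n}
            . qq{  </entry>\n};
  }
  $xml .= qq{</feed>\n};

  # Write to a temp name and rename, so a reader never sees a half-written feed.
  open my $out, '>', "$feed_file.new" or die "write $feed_file.new: $!";
  print {$out} $xml;
  close $out or die "close: $!";
  rename "$feed_file.new", $feed_file or die "rename: $!";

That sidesteps the overload question (the feed is just a static file) and punts on the content question by putting only the job name and a link in each entry; inlining the report summary as each entry's content would be nicer for the reader, but it means running the summarizer for every new log whether or not anyone ever looks at it.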

Most of the data products in this new setup will generate fewer than five logs per day, at reasonably fixed time intervals. One will generate hundreds, throughout the day, at unpredictable intervals.

--
@/=map{[/./g]}qw/.h_nJ Xapou cets krht ele_ r_ra/; map{y/X_/\n /;print}map{pop@$_}@/for@/