Okay, "How can I create an array of filehandles?" was a lot of help, since it shows how I can keep a bunch of files open at once (combined with a hash using filenames as keys, very cool).

Too bad it's not what I need. Let's back up, shall we?

I have a log file. The data goes from UDP port 162 (SNMP traps) to snmptrapd to syslog-ng and into a file. The file looks roughly like this:

Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip1>: Trap msga.
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip3>: Trap msgg.
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip4>: Trap msgd.
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip1>: Trap msge.
Sep 28 19:45:10 logsrvr snmptrapd[<pid>]: <ip2>: Trap msga.

I have seven input files, some gzipped, some not. Since they're log files, I can use (stat "$filename")[9] to get the last modified time. Sort those to keep the log entries in order without having to mess with the timestamps in the log. Match /\]:\s(\S+):/ to get the IP address of the original trap sender.
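
To make that concrete, here is a minimal sketch of the two pieces just described: ordering the files by last-modified time and pulling out the sender. The sub names sort_by_mtime and sender_of are made up for illustration, and treating a file that cannot be stat'ed as time 0 is my own assumption.

```perl
use strict;
use warnings;

# Order log files oldest-first by last-modified time (element 9 of stat).
# A file that cannot be stat'ed sorts first (time 0); that is an assumption.
sub sort_by_mtime {
    my @files = @_;
    return map  { $_->[1] }
           sort { $a->[0] <=> $b->[0] }
           map  { [ (stat $_)[9] // 0, $_ ] } @files;
}

# Return the original trap sender from one log line, or undef if none.
sub sender_of {
    my ($line) = @_;
    return $line =~ /\]:\s(\S+):/ ? $1 : undef;
}
```

The map/sort/map arrangement just means stat is called once per file rather than once per comparison, which hardly matters for seven files but costs nothing.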

Sounds easy, right? Here's the hard part: For each trap sender, I want to write an HTML file with only the traps for that sender. If there were only a few senders, I could just open the file, write the HTML 'top', add <pre>, then put the filehandle into a hash, and just write to the appropriate filehandle as the lines are parsed.
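
For the few-senders case, the hash-of-filehandles idea might look roughly like this. It is only a sketch: write_trap, finish_all, html_escape, the output directory argument and the "<ip>.html" naming are all hypothetical, and the HTML 'top' is cut down to the bare minimum.

```perl
use strict;
use warnings;

my %fh_for;    # sender IP => open output filehandle

# Escape the characters that would otherwise be read as markup inside <pre>.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Write one trap line to its sender's file, opening the file and writing
# the HTML 'top' the first time that sender appears.
sub write_trap {
    my ($dir, $sender, $line) = @_;
    my $fh = $fh_for{$sender};
    if (!$fh) {
        (my $name = $sender) =~ s/[^\w.-]/_/g;    # hypothetical file naming
        open $fh, '>', "$dir/$name.html" or die "open $dir/$name.html: $!";
        print {$fh} "<html><body>\n<pre>\n";
        $fh_for{$sender} = $fh;
    }
    print {$fh} html_escape($line);
}

# Finish every page and release the handles once all input is parsed.
sub finish_all {
    for my $fh (values %fh_for) {
        print {$fh} "</pre>\n</body></html>\n";
        close $fh or die "close: $!";
    }
    %fh_for = ();
}
```

Every distinct sender costs one open handle for the whole run, which is exactly why this stops being comfortable once the sender count climbs.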

The problem is that there can be hundreds of original senders. Having that many filehandles open is certain to be problematic. The input data is about 100MB, so I'd rather not parse the data more than once if I can get away without it (although I wouldn't mind going through them twice if a first pass would generate some useful meta-information).

SO... What's a good way to deal with this? As it is, I may be faced with just opening the correct output file based on the sender IP, perhaps writing the HTML 'top', writing a line, closing it, and on to the next line. All that opening and closing files seems bad somehow, so I'm seeking the wisdom of the Monastery.
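
The open-write-close fallback would be something like the sketch below. The sub name append_trap, the directory argument and the file naming are hypothetical. It also assumes the closing </pre> gets added in a separate pass once all the input is parsed, and that the lines need no HTML escaping.

```perl
use strict;
use warnings;

# Append one trap line to the sender's file, writing the HTML 'top' first
# if the file does not exist yet. Only one output handle is open at a time.
sub append_trap {
    my ($dir, $sender, $line) = @_;
    (my $name = $sender) =~ s/[^\w.-]/_/g;    # hypothetical file naming
    my $path   = "$dir/$name.html";
    my $is_new = !-e $path;
    open my $fh, '>>', $path or die "open $path: $!";
    print {$fh} "<html><body>\n<pre>\n" if $is_new;
    print {$fh} $line;
    close $fh or die "close $path: $!";
    return $path;
}
```

That is one open and one close per log line, which is the part that seems bad somehow.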

A second possibility - if they won't be used often - is to pull a list of IPs from the log files and dynamically write CGI scripts as the links instead of HTML files. The CGIs, when accessed, would `zcat logs.gz | grep <ip>`, basically generating the list of traps for a given IP at runtime. Quick to make, slow (and expensive) to use very often.
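
The runtime filter behind such a CGI could be sketched in Perl instead of a shell pipeline. The sub name traps_for is hypothetical, and reading gzipped files through an external gzip process is an assumption about the host. It compares the captured sender exactly, so it is a bit stricter than a plain grep for the IP.

```perl
use strict;
use warnings;

# Return the log lines whose original sender is exactly $ip.
# Files ending in .gz are read through "gzip -dc"; others are read directly.
sub traps_for {
    my ($ip, @files) = @_;
    my @hits;
    for my $file (@files) {
        my $fh;
        if ($file =~ /\.gz\z/) {
            open $fh, '-|', 'gzip', '-dc', $file or die "gzip -dc $file: $!";
        } else {
            open $fh, '<', $file or die "open $file: $!";
        }
        while (my $line = <$fh>) {
            push @hits, $line if $line =~ /\]:\s(\S+):/ && $1 eq $ip;
        }
        close $fh;
    }
    return @hits;
}
```

Each request still reads every log file from start to finish, so the cost noted above applies to this version too.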

So what do you think? Easy way out of this? Should I just risk opening a zillion filehandles? Should I just open them and close them one at a time? Suggestions are welcome.


In reply to Parse data into large number of output files. by Rhys
