PerlMonks  

Re^10: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

by wrog (Friar)
on Jun 18, 2015 at 21:39 UTC


in reply to Re^9: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
in thread Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

  1. That's the HTTP spec, which is all very nice but is not the Apache Log spec.
  2. Just assuming for the sake of argument that the HTTP command line is indeed being copied verbatim into that field, there's also the question of whether all of the clients out there will actually be following the spec; we live in a world with script kiddies and DDoS hobbyists, after all (hint: I'm guessing there's a reason the Apache folks saw fit to double-quote that field).

In other news, in my real-live Apache 2.4 webserver using the default-format log, I see lines like this:

10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0

Replies are listed 'Best First'.
Re^11: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
by BrowserUk (Patriarch) on Jun 18, 2015 at 21:53 UTC
    In other news, in my real-live Apache 2.4 webserver using the default-format log, I see lines like this: ...

    Then you've got a problem. (Hint: That would break Apache::Log::Parser amongst most other 'proper parsers')


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
    I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
      If so, that would be remarkably b0rken behavior for Apache::Log::Parser -- oddly enough, a quick glance of the source reveals that it indeed knows how to handle double-quoted fields; it's not doing a simple split. I'll bet an entire pitcher of beer that it handles this just fine.

      408, BTW, is Request Timeout; connection is opened but no actual command ever gets sent before the server closes it down. So yeah, perhaps a problem there, but not with the server.

        quick glance of the source

        You didn't look closely enough.

        This regex:

        q{([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+\[(([^: ]+):([^ ]+) ([-+0-9]+))\]\s+"(([^\s]+) ([^\s]+)( ([^\s"]*))?)"\s+([^\s]*)\s+([^\s]*)};

        Won't match "-", because it requires at least two space-delimited fields within the quotes, and allows for a third.
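The claim can be checked directly. The following is a minimal sketch, not the module's own code path: it assumes the ` ++` in the quoted pattern is a line-wrap artifact for `\s+"`, and the "normal" sample line is an illustrative common-format entry, not one taken from this thread.

```perl
use strict;
use warnings;

# Pattern as quoted in the post, assuming `\s+"` where the page shows `\s ++"`.
my $common = q{([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+\[(([^: ]+):([^ ]+) ([-+0-9]+))\]\s+"(([^\s]+) ([^\s]+)( ([^\s"]*))?)"\s+([^\s]*)\s+([^\s]*)};
my $re = qr/$common/;

# Illustrative sample line (an assumption) and the 408 line from the thread.
my $normal  = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
my $timeout = '10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0';

for my $line ($normal, $timeout) {
    if (my @f = $line =~ $re) {
        # Capture 9 is the method, capture 13 the status (zero-based 8 and 12).
        print "matched:  method=$f[8] status=$f[12]\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

The second line fails because nothing inside the quotes satisfies the mandatory `([^\s]+) ([^\s]+)` pair.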

        Note also that both ID fields are expected to match [^\s]* (I guess he's not aware of \S, and it should at least be + not *, which could be an indication of his Perl experience).

        So, a "proper parser" would break. Maybe it has a backup plan for when the regex fails; but equally, it's simple to code a backup plan for the whitespace split.
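As a rough illustration of such a special case for the split version, here is a sketch under stated assumptions: `parse_line` and the hash keys are hypothetical names, the field layout is the common log format, and no field is assumed to contain embedded whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one common-format line with a whitespace split.
sub parse_line {
    my ($line) = @_;
    my @f = split ' ', $line;

    my %rec;
    if (@f == 10) {
        # host logname user [date tz] "method path proto" status bytes
        @rec{qw(host logname user date tz method path proto status bytes)} = @f;
    }
    elsif (@f == 8 && $f[5] eq '"-"') {
        # Special case: no request line was logged (e.g. a 408 timeout).
        @rec{qw(host logname user date tz request status bytes)} = @f;
    }
    else {
        return;    # unexpected shape; the caller decides what to do
    }

    # Strip the bracket and quote delimiters left behind by the split.
    $rec{date} =~ s/^\[//;
    $rec{tz}   =~ s/\]$//;
    for my $k (qw(method proto request)) {
        next unless exists $rec{$k};
        $rec{$k} =~ tr/"//d;
    }
    return \%rec;
}

my $rec = parse_line('10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0');
print "status=$rec->{status} request=$rec->{request}\n" if $rec;
```

Lines with any other field count return nothing, which leaves room for further special cases as they turn up.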

        So let's review:

        1. The OP asked about using pack & unpack, and a couple of early responders posted positive-sounding confirmations.
        2. I countered by informing him that pack & unpack were completely inappropriate for the task; and suggested split as a starting point in his "personal learning experience".
        3. You pop up and, rather than trying to help the OP, you attempt to pick holes in my post, despite its purpose being to save the OP from wasting time with pack & unpack.
        4. So, I reminded you: "He did ask for a learning exercise; not a pre-solved solution.".
        5. So you come back with this guess: "(or if Apache really does go to some pains to make sure spaces never show up in the various log fields -- say by always representing them as + or %20 -- then yay, but I'm not sure this is actually true.)".

          Which is demonstrably wrong!

        6. You retort with: "which says nothing about logname and user,".

          Look at the regex above! Wrong again.

        7. And "nor does it guarantee that the HTTP command field always consists of exactly 3 space-separated components ".

          Also wrong!

        8. So then you throw "10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0" into the mix.

          And, as I've shown above, that would (without special handling) break most pre-solved solutions; which I'll remind you: the OP explicitly didn't want.

          And which could just as easily be handled by a special case with the split version.

          You know, as a part of the personal learning experience!

          A big part of which might be that, having tried it for himself, he decides to opt for a pre-solved solution.

          Or he might decide to write his own CPAN module that does it better than any of the existing ones.

          That's his choice.

          All I did was short-circuit his learning, by informing him that pack & unpack were definitely the wrong tools to start with.

        So, here we are 13 levels deep; and you've become boring. No attempt to help the OP; just banging on about stuff it seems you barely understand.

        So, I'm bored and done. 'Twas fun.

        Update: I forgot this little gem. You offered this wishy-washy suggestion, "or using Text::CSV or somesuch", but then later suggested that split will break because "which says nothing about logname and user,", completely oblivious to the fact that if either ID contained spaces, it would break that module also!

