In other news, in my real live Apache 2.4 webserver using the default-format log, I see lines like this: ...
Then you've got a problem. (Hint: that would break Apache::Log::Parser, amongst most other 'proper parsers'.)
If so, that would be remarkably b0rken behavior for Apache::Log::Parser. Oddly enough, a quick glance at the source reveals that it does know how to handle double-quoted fields; it's not doing a simple split. I'll bet an entire pitcher of beer that it handles this just fine.
408, BTW, is Request Timeout: the connection is opened, but no request is ever sent before the server closes it. So yeah, perhaps a problem there, but not with the server.
q{([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+\[(([^: ]+):([^ ]+) ([-+0-9]+))\]\s+"(([^\s]+) ([^\s]+)( ([^\s"]*))?)"\s+([^\s]*)\s+([^\s]*)};
It won't match "-", because it requires at least two space-delimited fields within the quotes, and allows for an optional third.
Note also that both ID fields are expected to match [^\s]* (I guess he's not aware of \S; and it should at least be + rather than *, which could be an indication of his Perl experience).
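A quick way to check that claim is to run the pattern against both kinds of line. This is a minimal sketch: it uses the regex exactly as quoted above (not re-checked against the current Apache::Log::Parser source), anchors it with ^ (an assumption about how the module applies it), and pairs the 408 line from this thread with a made-up ordinary request line.

```perl
use strict;
use warnings;

# Pattern as quoted above from Apache::Log::Parser (copied from the post,
# not re-checked against the module); the ^ anchor is an assumption.
my $re = qr{^([^\s]*)\s+([^\s]*)\s+([^\s]*)\s+\[(([^: ]+):([^ ]+) ([-+0-9]+))\]\s+"(([^\s]+) ([^\s]+)( ([^\s"]*))?)"\s+([^\s]*)\s+([^\s]*)};

# A made-up "ordinary" request line, and the 408 line from this thread.
my @lines = (
    '10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "GET /index.html HTTP/1.1" 200 1234',
    '10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0',
);

for my $line (@lines) {
    # The quoted request must contain at least two space-separated parts,
    # so a bare "-" has nothing to satisfy the second ([^\s]+).
    my $verdict = $line =~ $re ? 'matches' : 'NO MATCH';
    print "$verdict: $line\n";
}
```

The first line matches; the second does not, because nothing after the lone `-` can close the quoted group.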
So, a "proper parser" would break. Maybe it has a back-up plan for when the regex fails; but equally, it's simple to code a back-up plan for the whitespace split.
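For comparison, here is roughly what the split approach with a special case looks like. This is a minimal sketch, not code from any module: `parse_clf` and the hash keys are hypothetical, and it assumes the Common Log Format shown in the 408 line, where only the quoted request may contain spaces (an ID containing spaces would still break it).

```perl
use strict;
use warnings;

# Hypothetical helper (not from any CPAN module): parse one Common Log
# Format line with split, assuming only the quoted request contains spaces.
sub parse_clf {
    my ($line) = @_;
    my @f = split ' ', $line;            # awk-style split on runs of whitespace
    return if @f < 8;                    # too short to be CLF at all

    my %rec;
    @rec{qw(host ident user)} = @f[0 .. 2];
    ($rec{time} = "$f[3] $f[4]") =~ tr/[]//d;   # drop the [ ] brackets
    @rec{qw(status bytes)} = @f[-2, -1];

    # Everything between the timestamp and the status is the quoted request.
    my $request = join ' ', @f[5 .. $#f - 2];
    $request =~ s/\A"|"\z//g;            # strip the surrounding quotes

    if ($request eq '-') {
        # Special case: no request line was ever sent (e.g. a 408 timeout).
        $rec{method} = '-';
    }
    else {
        # Two or three parts: METHOD PATH [PROTOCOL]
        @rec{qw(method path proto)} = split / /, $request, 3;
    }
    return \%rec;
}

for my $line (
    '10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0',
    '10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "GET /index.html HTTP/1.1" 200 1234',
) {
    my $r = parse_clf($line) or next;
    printf "%s %s -> %s\n", $r->{method}, $r->{path} // '-', $r->{status};
}
```

Joining the middle fields back together means one-, two- and three-part requests all go through the same path; only the bare "-" needs its own branch.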
So let's review:
- The OP asked about using pack & unpack, and a couple of early responders posted positive-sounding confirmations.
- I countered by informing him that pack & unpack were completely inappropriate for the task, and suggested split as a starting point in his "personal learning experience".
- You pop up and, rather than trying to help the OP, attempt to pick holes in my post, despite the fact that its purpose was to save the OP from wasting time with pack & unpack.
- So I reminded you: "He did ask for a learning exercise; not a pre-solved solution."
- So you come back with this guess: "(or if Apache really does go to some pains to make sure spaces never show up in the various log fields -- say by always representing them as + or %20 -- then yay, but I'm not sure this is actually true.)".
Which is demonstrably wrong!
- You retort with: "which says nothing about logname and user,".
Look at the regex above! Wrong again.
- And: "nor does it guarantee that the HTTP command field always consists of exactly 3 space-separated components".
Also wrong!
- So then you throw "10.54.33.35 - - [18/Jun/2015:09:05:55 -0700] "-" 408 0" into the mix.
And, as I've shown above, that would (without special handling) break most pre-solved solutions, which, I'll remind you, the OP explicitly didn't want.
And it could just as easily be handled by a special case in the split version.
You know, as a part of the personal learning experience!
A big part of which might be that, having tried it for himself, he decides to opt for a pre-solved solution.
Or he might decide to write his own CPAN module that does it better than any of the existing ones.
That's his choice.
All I did was short-circuit his learning, by informing him that pack & unpack were definitely the wrong tools to start with.
So here we are, 13 levels deep, and you've become boring. No attempt to help the OP; just banging on about stuff it seems you barely understand.
So, I'm bored and done. T'was fun.
Update: I forgot this little gem. You offered the wishy-washy suggestion "or using Text::CSV or somesuch", but then later suggested that split would break because "which says nothing about logname and user,", completely oblivious to the fact that if either ID contained spaces, it would break that module too!