http://qs321.pair.com?node_id=1130904


in reply to Re^5: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
in thread Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)

which is a little bit odd, since I don't recall attempting to provide a pre-solved solution myself, but whatever.

If the point is to provide a learning exercise in how split can bite you in the ass, then fine.

(or if Apache really does go to some pains to make sure spaces never show up in the various log fields -- say by always representing them as + or %20 -- then yay, but I'm not sure this is actually true.)
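
To make the `split` concern concrete, here is a small sketch in Python rather than Perl; a whitespace split behaves the same way in both. The helper name `naive_fields` and the second log line are hypothetical. Whether Apache can ever write a space into `%u` is exactly the open question above, so the second line only shows what would happen if it did.

```python
# Two Common Log Format lines. The first is the example from the Apache docs;
# the second is a hypothetical variant with a space inside the %u field.
ok = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
odd = '127.0.0.1 - frank smith [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'


def naive_fields(line):
    """Split on runs of whitespace, with no awareness of [..] or "..." fields."""
    return line.split()


for line in (ok, odd):
    fields = naive_fields(line)
    # Even the well-formed line splits [time] and "request" into several tokens,
    # so positional indexing only works if every line yields the same count.
    # Index 8 is the status for ok, but a piece of the request for odd.
    print(len(fields), "status guess:", fields[8])
```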


Re^7: Question about the most efficient way to read Apache log files without All-In-One Modules from CPAN (personal learning exercise)
by BrowserUk (Patriarch) on Jun 17, 2015 at 23:32 UTC
    then yay, but I'm not sure this is actually true

    Read the spec.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority". I'm with torvalds on this
    In the absence of evidence, opinion is indistinguishable from prejudice. Agile (and TDD) debunked
      • which says nothing about logname and user,
      • nor does it guarantee that the HTTP command field always consists of exactly 3 space-separated components (hint: it doesn't).
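
        A defensive way to split the %r value, sketched in Python under the assumption that it may not always hold exactly three space-separated parts. `split_request` is a hypothetical helper. The two-part branch assumes an HTTP/0.9-style request with no version, and treating "-" as an empty-request placeholder is likewise an assumption.

```python
def split_request(r):
    """Split a logged %r value into (method, uri, protocol).

    Parts that are missing come back as None instead of shifting the
    remaining fields or raising an exception.
    """
    if r == "-":
        # Assumed placeholder when no request line was received.
        return (None, None, None)
    parts = r.split(" ")
    if len(parts) >= 3 and parts[-1].startswith("HTTP/"):
        # Method is the first token, protocol the last; whatever lies
        # between is kept together as the URI, spaces and all.
        return (parts[0], " ".join(parts[1:-1]), parts[-1])
    if len(parts) == 2:
        # HTTP/0.9-style "GET /path" with no protocol version.
        return (parts[0], parts[1], None)
    # Anything else: keep the raw text rather than guess.
    return (None, r, None)


for r in ("GET /apache_pb.gif HTTP/1.0", "GET /", "-", "GET /a b HTTP/1.1"):
    print(repr(r), "->", split_request(r))
```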
        "...3 space-separated components (hint: it doesn't)"

        I used this: <LogFormat "%h %l %u %t \"%r\" %>s %b" common>, as in the example by kcott.
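
        For reference, a sketch in Python (with a hypothetical `parse_common` helper) of a regex for that common LogFormat, with one named group per directive. It assumes %h, %l and %u never contain spaces, and that any double quote inside %r is written as \" in the log.

```python
import re

# One named group per directive in:  %h %l %u %t \"%r\" %>s %b
# %h, %l, %u: single space-free tokens (an assumption, see the thread above)
# %t: everything inside [...]
# %r: everything inside the quotes, allowing backslash-escaped characters
# %>s: three digits;  %b: digits, or "-" when no body was sent
COMMON = re.compile(
    r'^(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]*)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)$'
)


def parse_common(line):
    """Return a dict of fields, or None if the line does not match."""
    m = COMMON.match(line)
    return m.groupdict() if m else None


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
rec = parse_common(line)
print(rec["request"], rec["status"])
```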

        From the docs:

        "First, the method used by the client is GET. Second, the client requested the resource /apache_pb.gif, and third, the client used the protocol HTTP/1.0."

        Hence the request field will always look like this: "GET /karls.beer HTTP/1.0".

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

        nor does it guarantee that the HTTP command field always consists of exactly 3 space-separated components (hint: it doesn't).

        You read the wrong spec, or you misread the right one:

        The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ending with CRLF. The elements are separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.

            Request-Line = Method SP Request-URI SP HTTP-Version CRLF
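
        The grammar quoted above can be turned into a strict check. This is a Python sketch with a hypothetical `parse_request_line` helper. It validates the method against the RFC token characters and the version against HTTP/digits.digits, but only requires the Request-URI to be non-empty, which is a simplification. It tests conformance rather than assuming it, since whether a logged %r can ever deviate from the grammar is the point under discussion here.

```python
import re

# RFC 2616, section 5.1:
#   Request-Line = Method SP Request-URI SP HTTP-Version CRLF
# The line is checked with its trailing CRLF already removed.
TOKEN = re.compile(r"[!#$%&'*+.^_`|~0-9A-Za-z-]+")   # RFC token characters
HTTP_VERSION = re.compile(r"HTTP/[0-9]+\.[0-9]+")


def parse_request_line(line):
    """Return (method, uri, version) if line fits the grammar, else None."""
    if "\r" in line or "\n" in line:
        return None                     # no CR or LF allowed inside the line
    parts = line.split(" ")             # split on single SP, not on any whitespace
    if len(parts) != 3:
        return None                     # must be exactly three elements
    method, uri, version = parts
    if not TOKEN.fullmatch(method) or not uri or not HTTP_VERSION.fullmatch(version):
        return None
    return method, uri, version


for line in ("GET /karls.beer HTTP/1.0", "GET /", "GET /a b HTTP/1.1"):
    print(repr(line), "->", parse_request_line(line))
```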
