Re: Splitting squid log lines with perl
by fsn (Friar) on Sep 16, 2002 at 10:41 UTC
I have little belief that different environments represent variable names differently, i.e. @line_elements on the command line and @cache in vim. I also don't think Emacs will rewrite statements and remove parentheses, so I guess you are just giving examples rather than cutting and pasting, no?
However, the mystery of the vanishing micro symbol is easy to explain, and has a lot to do with history and old terminals. The micro symbol (µ) is not really ASCII at all: its character code is above 127 (0xB5, or 181, in ISO-8859-1). Historically, shells would only display codes 32 to 126; 0-31 (and 127) are control codes with special meanings, like LineFeed. That is because old terminals used only 7 bits when communicating over a serial line, so the 8th bit was stripped anyway. Whether it was stripped also depended on things like how the serial device was opened. Now, people like me, whose national characters, like åäö, live up above 127, didn't like this and built new terminals with full 8-bit support.
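A small illustration of the numbers involved, written in Python for convenience. It assumes ISO-8859-1 (Latin-1), where µ is 0xB5, and `strip_8th_bit` is a made-up helper for this sketch. Masking off the 8th bit, as a 7-bit serial line effectively did, turns each high character into an unrelated low one.

```python
# Sketch: what happens to 8-bit Latin-1 characters on a 7-bit line.
# Assumes ISO-8859-1 (Latin-1); strip_8th_bit is a hypothetical helper.

def strip_8th_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit serial link would."""
    return bytes(b & 0x7F for b in data)

text = "µ åäö"
raw = text.encode("latin-1")  # one byte per character in Latin-1

for ch, code in zip(text, raw):
    print(f"{ch!r}: {code} (0x{code:02X}), above 127: {code > 127}")

# Every byte is now below 128, so it decodes as plain ASCII.
print(strip_8th_bit(raw).decode("ascii"))  # prints "5 edv"
```

So µ (0xB5) arrives as "5" and å, ä, ö arrive as "e", "d", "v": not lost, but garbled.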
But 8 bits are not enough to represent every national character in the world, so there are mechanisms for telling the shell which character mapping to use, and also which characters to print and which not to. Apparently, the shells you and I are using are configured not to print the micro symbol, and therefore swallow it.
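The "character mapping" part can be seen by decoding one and the same byte under different mappings. This Python sketch uses codec names that ship with Python; which mapping your own shell uses is decided by your locale settings, not by anything shown here.

```python
# Sketch: one byte, several character mappings.
# 0xB5 is the micro sign only under some of them.
byte = b"\xb5"

for codec in ("latin-1", "iso8859_5", "cp437"):
    ch = byte.decode(codec)
    print(f"{codec:10} -> {ch!r} (U+{ord(ch):04X})")
# latin-1 gives 'µ' (U+00B5); iso8859_5 (Cyrillic) gives 'Е' (U+0415).

# Plain 7-bit ASCII has no mapping for the byte at all.
try:
    byte.decode("ascii")
except UnicodeDecodeError as err:
    print("ascii      ->", err.reason)
```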
Emacs, being what it is, is rather picky about these issues, much like the shell itself. So it too swallows "unprintable" characters, but it seems to replace them with a '?' instead of just leaving them out.
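The difference between "swallow it" and "replace it with '?'" can be mimicked with Python's encoding error handlers: `errors="ignore"` drops whatever the target charset can't represent, `errors="replace"` substitutes a '?'. This is only an analogy for the shell and Emacs behaviour, not how either of them is implemented.

```python
# Sketch: two policies for characters a 7-bit (ASCII) display can't show.
line = "elapsed 15µs"

swallowed = line.encode("ascii", errors="ignore").decode("ascii")
replaced = line.encode("ascii", errors="replace").decode("ascii")

print(swallowed)  # elapsed 15s   (like the shell: the character just vanishes)
print(replaced)   # elapsed 15?s  (like Emacs: a '?' marks where it was)
```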
Vi(m), on the other hand, is much more lenient here. You can even load a binary file, patch the strings in it (as long as you keep the file size and don't go past the old string's bounds), save it, and expect it to run. And in most fonts there is a graphic representation for each and every character code, even the "unprintable" ones. So vi(m) happily sends each and every character code (at least those above 32) to the terminal, without the shell's filtering.
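The binary-patching trick only works because the file keeps its exact size. A rough Python equivalent of doing it by hand in vi(m), assuming the strings are NUL-terminated C strings so a shorter replacement can be padded with NUL bytes; `patch_string` and the sample bytes are made up for this sketch.

```python
# Sketch: patch a string inside binary data without changing its size.
# Assumes C-style NUL-terminated strings; patch_string is hypothetical.

def patch_string(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace the first occurrence of old with new, keeping len(data)."""
    if len(new) > len(old):
        raise ValueError("new string would overrun the old string's bounds")
    pos = data.find(old)
    if pos < 0:
        raise ValueError("old string not found")
    padded = new + b"\x00" * (len(old) - len(new))  # fill the gap with NULs
    patched = data[:pos] + padded + data[pos + len(old):]
    assert len(patched) == len(data)  # the file size must not change
    return patched

blob = b"\x01\x02Hello, world\x00\x03\x04"  # invented "binary" data
print(patch_string(blob, b"Hello, world", b"Hej, varld"))
```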
Conclusion: different environments filter or replace some character codes above 127, for historical and/or internationalization reasons. By changing your locale settings (LANG, LC_CTYPE and the like), you could probably make the shell print the micro symbol too. But then you are deep in the localisation tar pits of hell.
Now, finally, on to your real problem. It's been a while since I looked at squid logs, but I seem to remember that they actually used some strange field separator, possibly the micro symbol. In that case, the split is actually meant to split lines on that symbol. But as I said, I never really hacked on the log files myself.
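If the separator really is the micro symbol, one way to sidestep the locale mess entirely is to split the raw bytes on 0xB5 instead of on a decoded character. A Python sketch, assuming Latin-1 log files and that 0xB5 is in fact the separator (not verified against squid); `split_fields` and the sample line are invented. In Perl, `split /\xB5/, $line` on an undecoded line should behave the same way.

```python
# Sketch: split a log line on the byte 0xB5 (µ in Latin-1).
# Assumes 0xB5 really is the field separator; the sample line is invented.
SEPARATOR = b"\xb5"

def split_fields(raw_line: bytes) -> list[bytes]:
    """Drop the line ending, then split on the separator byte."""
    return raw_line.rstrip(b"\r\n").split(SEPARATOR)

sample = b"one\xb5two\xb5three\n"
for field in split_fields(sample):
    print(field.decode("latin-1"))
```

Reading the file in binary mode (`open(path, "rb")`) keeps every byte intact, so nothing gets swallowed on the way in.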