Many thanks for all the interest and help here. All the replies were useful.
I really liked these constructs:
s/\".*\"//
my @F = split(/\s+/);
and
my @F = /(\" .*? \" | \S+)/gx
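For anyone following along, here is a rough sketch of how those two constructs behave. The sample line and the variable names ($line, @plain, @tokens) are made up for illustration and are much shorter than a real log entry:

```perl
use strict;
use warnings;

# Hypothetical log line: plain fields around one quoted user-agent field.
my $line = '2005-03-07 17:02:11 GET "Mozilla/4.0 (compatible; MSIE 6.0)" OBSERVED';

# Construct 1: delete the quoted field, then split what is left on whitespace.
(my $stripped = $line) =~ s/\".*\"//;
my @plain = split /\s+/, $stripped;

# Construct 2: capture each quoted string or run of non-blanks as one token.
my @tokens = $line =~ /(\" .*? \" | \S+)/gx;

print scalar(@plain), " plain fields: @plain\n";
print scalar(@tokens), " tokens, user agent: $tokens[3]\n";
```

The first form throws the user agent away, while the second keeps it as a single token with its quotes still attached.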
But I found the quickest solution was this:
while (<>){
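# Cut everything before the quoted user agent out of $_ (4-arg substr).
$front = substr( $_, 0, index($_, '"' )-1, "");
# What follows the closing quote and its trailing space.
$back = substr( $_, rindex( $_, '"' )+2);
# The user agent itself, quotes included.
$user_agent = substr ($_, 1, rindex( $_, '"' ));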
$front=~/^([^#\s]+\s\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)$/;
print <<END_FORMAT;
date time = $1
time taken = $2
c-ip = $3
sc-status = $4
ns-action = $5
sc-bytes = $6
cs-bytes = $7
cs-method = $8
cs-uri-scheme = $9
cs-host = $10
cs-uri-stem = $11
cs-username = $12
s-hierarchy = $13
s-supplier-name = $14
cs(Content-Type) = $15
cs(User-Agent) = $user_agent
END_FORMAT
$back=~/^(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s*$/;
print <<END_FORMAT2;
sc-filter-result = $1
sc-filter-category = $2
x-virus-id = $3
s-ip = $4
s-sitename = $5
END_FORMAT2
}
This processed the following gzip'd log:
bash-2.05b$ ls -l SG*
-rwxr-xr-x 1 js js 106236830 Mar 7 17:02 SG_CSGL02_main_470302220000.log.gz
in 1 minute 32 sec
bash-2.05b$ time gzip -dc SG* | ./test.pl >/dev/null
6.96user 0.63system 1:32.95elapsed 8%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (94major+33minor)pagefaults 0swaps
on a 2.6GHz AMD processor (500MB).
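
The offsets in the three substr calls in the loop are easy to misread, so here is a small trace of them. The line is a made-up, shortened one with only two fields on each side of the user agent, and it sits in $line rather than $_, so treat it as a sketch and not the real log format:

```perl
use strict;
use warnings;

# Hypothetical shortened line: two fields, the quoted user agent, two fields.
my $line = 'GET text/html "Mozilla/4.0 (compatible)" OBSERVED none';

# 4-argument substr returns the text before the space that precedes the
# first quote and removes that text from $line in the same step.
my $front = substr($line, 0, index($line, '"') - 1, "");

# $line is now ' "Mozilla/4.0 (compatible)" OBSERVED none'.
# Take everything after the last quote and the space that follows it.
my $back = substr($line, rindex($line, '"') + 2);

# Start at offset 1 (the opening quote) and take rindex characters,
# which runs up to and including the closing quote.
my $user_agent = substr($line, 1, rindex($line, '"'));

print "front=[$front] agent=[$user_agent] back=[$back]\n";
```

Note that $user_agent keeps its surrounding quotes, and that a line with no quote in it at all would give odd offsets, so a real script would probably want to check the result of index first.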