Parsing Log Files

by Anonymous Monk
on Apr 15, 2004 at 10:27 UTC (#345343=perlquestion)

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I asked this question before, but I guess I wasn't clear about what I wanted to ask.
It all started when I needed to parse a log file that looks like this:
Company Name*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
Company Name Other*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*3

Of course, there will be many more lines.
I am trying to parse the log files and am running into a problem: I can't display the name of the element I am looping through, but I can print the number of times it appears in the file. Here is the code; please let me know where the problem is.
foreach (<LOGFILE>){
    if (/^(.*?)*(.*?)*(.*?)*(.*?)*(.*?)*(.*?)*(.*?)*(.*?)$/gi){ #print starting at the date
        push (@logg,$1);
    }
}
my %count;
my $total_count;
foreach my $element( @logg ) { #4
    ++$count{$element};
    $total_count++;
}
#Sort hash by its values
foreach my $element (sort {$count{$a} <=> $count{$b}} keys %count){
    print "Name: $element<font color=red size=\"2\">Shows:<b>$count{$element}</b></font> times.<br>";
}

I want to do this for every element in the file, that is:
$1*$2*...*$8, and display how many times each element appears in the file.
If I run this code it shows the counts fine, but I can't display something like:
Company was here for 8 times.
I hope I am clear now.
And thanks for the help once again!

Replies are listed 'Best First'.
Re: Regular Exp. Problem!
by allolex (Curate) on Apr 15, 2004 at 11:11 UTC

    If I read your problem right, this code should work. I tested it on the data shown here and the output is what you see.

    ladoix% cat
    #!/usr/bin/perl
    use strict;
    use warnings;

    my @trans;

    # Read all your data into an array of arrays
    while (<DATA>) {
        my @line = split /\*/;    # Using split instead of regex
        push @trans, \@line;
    }

    # Go through each of the column names I made up
    # and run the function countcolumn() on it
    foreach ( qw( name code type date time bla1 bla2 ) ) {
        countcolumn(\@trans,$_);
    }

    # countcolumn takes an array reference and a column name as arguments
    sub countcolumn {
        my $arrayref = shift;
        my $name = shift;

        # A lookup hash to simplify accessing the array
        # indices in the rest of the code
        my %lookup = (
            name => 0,
            code => 1,
            type => 2,
            date => 3,
            time => 4,
            bla1 => 5,
            bla2 => 6,
        );
        my %counthash;

        # Give the value found under $name in %lookup a
        # name reflecting what it is
        my $index = $lookup{$name};

        # Do the counting for the column
        foreach my $trans ( @$arrayref ) {
            $counthash{$trans->[$index]}++;
        }

        print "Column: $name\n";
        # Print a sorted list
        print map { "$_ : $counthash{$_}\n" } sort keys %counthash;
        print "\n";
    }

    __DATA__
    Company Name*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name Other*345468*YW34567c*activitype*04/15/2004*11:34:10*123456789*3
    Company Name 1*345469*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 3*345468*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 2*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 4*345469*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 2*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 4*345467*YW34567c*activitype*04/16/2004*07:34:00*123456789*1
    Company Name 1*345468*YW34567c*activitype*04/16/2004*09:30:00*123456789*1
    Company Name 1*345469*YW34567c*activitype*04/16/2004*10:34:00*123456789*1
    Company Name 2*345467*YW34567c*activitype*04/16/2004*11:37:00*123456789*1

    #output
    ladoix% perl
    Column: name
    Company Name : 1
    Company Name 1 : 3
    Company Name 2 : 3
    Company Name 3 : 1
    Company Name 4 : 2
    Company Name Other : 1

    Column: code
    345467 : 5
    345468 : 3
    345469 : 3

    Column: type
    YW34567c : 11

    Column: date
    activitype : 11

    Column: time
    04/15/2004 : 7
    04/16/2004 : 4

    Column: bla1
    07:34:00 : 1
    09:30:00 : 1
    10:34:00 : 1
    11:34:10 : 7
    11:37:00 : 1

    Column: bla2
    123456789 : 11

    Damon Allen Davison

      Thank you all very much. My first problem was not seeing the
      (.*?)*(.*?) problem, when it should be (.*?)\*(.*?). Thanks again; sometimes you just can't see it.
      And I really liked the approach from Damon Allen Davison as well. It seems that you have a lot of patience; have you ever thought about teaching Perl?
      Thanks again!

        Alas, the people in the Perl teaching business are a lot better than I am. I do teach, though---just not Perl. ;) In any case, best of luck to you!

        Damon Allen Davison

Re: Regular Exp. Problem!
by matija (Priest) on Apr 15, 2004 at 10:48 UTC
    You need to escape (replace * with \*) all the stars that are used as separators. Otherwise, the regexp engine will take them to mean "the previous element can repeat 0 or more times".

    Something like this might be closer:
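A minimal sketch of the escaped pattern, offered as an illustration and not necessarily the snippet matija posted. The helper name parse_line is hypothetical; the sample line is taken from the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one log line into its
# eight star-separated fields using a regex with every separator escaped.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    # \* matches a literal star; a bare * would act as a quantifier instead.
    my @fields = $line =~ /^(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)$/;
    return @fields;    # empty list if the line does not match
}

my @fields = parse_line("Company Name*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n");
print "Name: $fields[0]\n";
```

In list context the match returns all eight captures at once, so there is no need to copy $1 through $8 by hand.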


    As for the second part of the question: You are using essentially the correct method, but instead of pushing $1 into an array, and then iterating over that array, you can just assign to the hash right at the point where you do the push now.
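A rough sketch of that suggestion, assuming only the first field (the name) is being counted. The @lines array is a stand-in for the LOGFILE handle and holds three lines copied from the data elsewhere in this thread; the tie-break on name in the sort is an addition for stable output.

```perl
use strict;
use warnings;

# Stand-in for reading <LOGFILE>; sample lines copied from the thread.
my @lines = (
    "Company Name 1*345469*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n",
    "Company Name 2*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n",
    "Company Name 1*345468*YW34567c*activitype*04/16/2004*09:30:00*123456789*1\n",
);

my %count;
for my $line (@lines) {
    # Escaped separator; only the first field is captured here.
    if ($line =~ /^(.*?)\*/) {
        $count{$1}++;    # count right where the push used to be
    }
}

# Sort by count, then by name so equal counts come out in a fixed order.
for my $name (sort { $count{$a} <=> $count{$b} or $a cmp $b } keys %count) {
    print "$name was here $count{$name} times.\n";
}
```

This drops the intermediate @logg array entirely; the hash keys are the names, which is what makes them printable alongside the counts.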

Re: Regular Exp. Problem!
by ishnid (Monk) on Apr 15, 2004 at 10:51 UTC

    Don't forget that the * is a special character in regexps. To match a literal *, you have to put a backslash before it.

    Better yet, given the format of your lines, you could just use split() on each line:

    my @bits = split(/\*/);
    push(@logg, $bits[0]);
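The two lines above rely on $_ holding the current line. As a sketch of how the same split could count every field and not just the first, which is what the question asks for, something like the following might work. The @counts structure and the "Field N" labels are assumptions made for illustration; the sample lines are copied from the thread's data.

```perl
use strict;
use warnings;

# Stand-in for the log file; sample lines copied from the thread.
my @lines = (
    "Company Name 1*345469*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n",
    "Company Name 2*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n",
    "Company Name 1*345468*YW34567c*activitype*04/16/2004*09:30:00*123456789*1\n",
);

my @counts;    # $counts[$i]{$value} = how often $value appeared in field $i

for my $line (@lines) {
    chomp $line;                        # drop the newline from the last field
    my @bits = split /\*/, $line;       # literal star as the separator
    for my $i (0 .. $#bits) {
        $counts[$i]{ $bits[$i] }++;
    }
}

# Report every value of every field, fields numbered from 1.
for my $i (0 .. $#counts) {
    for my $value (sort keys %{ $counts[$i] }) {
        print "Field ", $i + 1, ": $value was here $counts[$i]{$value} times.\n";
    }
}
```

Keeping one hash per field position avoids mixing up values that happen to be equal in different columns.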
