PerlMonks  

Parsing Log Files

by Anonymous Monk
on Apr 15, 2004 at 10:27 UTC ( #345343=perlquestion )

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,
I asked this question before, but I guess I wasn't clear about what I wanted to ask.
It all started when I needed to parse a log file that looks like this:
Company Name*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
Company Name Other*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*3

Of course, there will be many more lines.
I am trying to parse the log files and am running into a problem: I can't display the name of the element I am looping through, but I can print the number of times it shows up in the file. Here is the code; please let me know where the problem is.
foreach (<LOGFILE>){
    if (/^(.*?)*(.*?)*(.*?)*(.*?)*(.*?)*(.*?)*(.*?)*(.*?)$/gi){ #print starting at the date
        push (@logg,$1);
    }
}

my %count;
my $total_count;

foreach my $element( @logg ) { #4
    ++$count{$element};
    $total_count++;
}

#Sort hash by its values
foreach my $element (sort {$count{$a} <=> $count{$b}} keys %count){
    print "Name: $element<font color=red size=\"2\">Shows:<b>$count{$element}</b></font> times.<br>";
}

I want to do this for every element in the file, I mean:
$1*$2*...*$8. And display how many times each element was present in the file.
If I run this code it shows the number of times correctly, but I can't display something like:
Company was here 8 times.
I hope I was clear this time.
And thanks for the help, once again!

Replies are listed 'Best First'.
Re: Regular Exp. Problem!
by allolex (Curate) on Apr 15, 2004 at 11:11 UTC

    If I read your problem right, this code should work. I tested it on the data shown here and the output is what you see.

    ladoix% cat 345343.pl
    #!/usr/bin/perl

    use strict;
    use warnings;

    my @trans;

    # Read all your data into an array of arrays
    while (<DATA>) {
        my @line = split /\*/;    # Using split instead of regex
        push @trans, \@line;
    }

    # Go through each of the column names I made up
    # and run the function countcolumn() on it
    foreach ( qw( name code type date time bla1 bla2 ) ) {
        countcolumn(\@trans,$_);
    }

    # countcolumn takes an array reference and a column name as arguments
    sub countcolumn {
        my $arrayref = shift;
        my $name = shift;

        # A lookup hash to simplify accessing the array
        # indices in the rest of the code
        my %lookup = (
            name => 0,
            code => 1,
            type => 2,
            date => 3,
            time => 4,
            bla1 => 5,
            bla2 => 6,
        );
        my %counthash;

        # Give the value found under $name in %lookup a
        # name reflecting what it is
        my $index = $lookup{$name};

        # Do the counting for the column
        foreach my $trans ( @$arrayref ) {
            $counthash{$trans->[$index]}++;
        }

        print "Column: $name\n";
        # Print a sorted list
        print map { "$_ : $counthash{$_}\n" } sort keys %counthash;
        print "\n";
    }

    __DATA__
    Company Name*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name Other*345468*YW34567c*activitype*04/15/2004*11:34:10*123456789*3
    Company Name 1*345469*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 3*345468*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 2*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 4*345469*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 2*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1
    Company Name 4*345467*YW34567c*activitype*04/16/2004*07:34:00*123456789*1
    Company Name 1*345468*YW34567c*activitype*04/16/2004*09:30:00*123456789*1
    Company Name 1*345469*YW34567c*activitype*04/16/2004*10:34:00*123456789*1
    Company Name 2*345467*YW34567c*activitype*04/16/2004*11:37:00*123456789*1

    #output
    ladoix% perl 345343.pl
    Column: name
    Company Name : 1
    Company Name 1 : 3
    Company Name 2 : 3
    Company Name 3 : 1
    Company Name 4 : 2
    Company Name Other : 1

    Column: code
    345467 : 5
    345468 : 3
    345469 : 3

    Column: type
    YW34567c : 11

    Column: date
    activitype : 11

    Column: time
    04/15/2004 : 7
    04/16/2004 : 4

    Column: bla1
    07:34:00 : 1
    09:30:00 : 1
    10:34:00 : 1
    11:34:10 : 7
    11:37:00 : 1

    Column: bla2
    123456789 : 11

    --
    Damon Allen Davison

      Thank you all very much. My first problem was not seeing the
      (.*?)*(.*?) mistake, when it should have been (.*?)\*(.*?). Thanks again, sometimes you just can't see it.
      I also really liked the approach from Damon Allen Davison. It seems that you have a lot of patience; have you ever thought about teaching Perl?
      Thanks again!

        Alas, the people in the Perl teaching business are a lot better than I am. I do teach, though, just not Perl. ;) In any case, best of luck to you!

        --
        Damon Allen Davison

Re: Regular Exp. Problem!
by matija (Priest) on Apr 15, 2004 at 10:48 UTC
    You need to escape (replace * with \*) all the stars that are used as separators. Otherwise, the regexp engine will take them to mean "the previous element can repeat 0 or more times".

    Something like this might be closer:

    /^(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)$/
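A minimal sketch of the escaped pattern in use, assuming a single sample line copied from the question (the variable names `$line` and `@fields` are made up for illustration). In list context the match returns the eight captures directly, so there is no need to read `$1` through `$8` one at a time:

```perl
use strict;
use warnings;

# Sample line in the format shown in the question.
my $line = 'Company Name*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1';

# Each \* matches one literal star; each (.*?) captures the text between stars.
my @fields = $line =~ /^(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)\*(.*?)$/;

printf "%d fields, first is '%s'\n", scalar @fields, $fields[0];
```

If the line does not contain seven stars, the match fails and `@fields` is empty, so it is worth checking the field count before using the values.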

    As for the second part of the question: you are using essentially the correct method, but instead of pushing $1 into an array and then iterating over that array, you can just assign to the hash right at the point where you do the push now.
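One way to read that suggestion, as a sketch only: count the first field in the hash inside the read loop and drop the intermediate array. The sample lines in `@lines` stand in for the log file and are an assumption, as is the plain-text output format.

```perl
use strict;
use warnings;

# Stand-in for the log file: a few lines in the question's format.
my @lines = (
    "Company Name 1*345469*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n",
    "Company Name 2*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n",
    "Company Name 1*345468*YW34567c*activitype*04/16/2004*09:30:00*123456789*1\n",
);

my %count;
my $total_count = 0;

foreach my $line (@lines) {
    # Capture everything before the first literal star and count it at once.
    if ( $line =~ /^(.*?)\*/ ) {
        $count{$1}++;
        $total_count++;
    }
}

# Sort by count as in the question, breaking ties by name for stable output.
foreach my $element ( sort { $count{$a} <=> $count{$b} or $a cmp $b } keys %count ) {
    print "$element was here $count{$element} time(s).\n";
}
```

With a real file, the `foreach` over `@lines` would become a `while` loop over the filehandle, which also avoids reading the whole file into memory.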

Re: Regular Exp. Problem!
by ishnid (Monk) on Apr 15, 2004 at 10:51 UTC

    Don't forget that the * is a special character in regexps. To match a literal *, you have to put a backslash before it.

    Better yet, given the format of your lines, you could just use split() on each line:

    my @bits = split(/\*/);
    push(@logg, $bits[0]);
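The same split can be extended to count every column, which is what the original question asks for. This is a sketch under assumptions: the sample lines in `@lines` are made up in the question's format, and columns are identified by index rather than by name. Passing `-1` as the third argument to `split` keeps trailing empty fields, which `split` would otherwise drop.

```perl
use strict;
use warnings;

# Made-up sample lines in the question's format.
my @lines = (
    "Company Name 1*345467*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n",
    "Company Name 2*345468*YW34567c*activitype*04/15/2004*11:34:10*123456789*1\n",
    "Company Name 1*345467*YW34567c*activitype*04/16/2004*09:30:00*123456789*3\n",
);

my @count;    # $count[$column]{$value} = number of occurrences

foreach my $line (@lines) {
    chomp $line;    # otherwise the newline becomes part of the last field
    # The -1 limit keeps trailing empty fields instead of dropping them.
    my @bits = split /\*/, $line, -1;
    $count[$_]{ $bits[$_] }++ for 0 .. $#bits;
}

foreach my $column ( 0 .. $#count ) {
    print "Column $column\n";
    print "  $_ : $count[$column]{$_}\n" for sort keys %{ $count[$column] };
}
```

An array of hashes keeps the per-column counts separate, so a value that happens to appear in two different columns is not merged into one total.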

Node Type: perlquestion [id://345343]
Approved by allolex