Re: File Search And Compare
by particle (Vicar) on Feb 18, 2002 at 21:46 UTC
here's some well-behaved code that should do what you want. notice use strict;, -w, and the error checking. notice the use of a hash (any time you think unique, think hash). also note: close filehandles when you're done with them.
by the way, i've tested this, and it works for me.
oh, and you should use the reply link to the right of the post, to make sure the author you're replying to sees the response.
#!/usr/bin/perl -w
use strict;
$|++;
use FileHandle;
my $FILE = FileHandle->new;
# three-argument open, with error handling
open($FILE, "<", "/var/log/everything")
    or die "ERROR: can't open file! $!";
# create variables:
# $pattern - pattern for regular expression
# %parsed - hash, keys are lines containing pattern,
#           values count the times each line was seen
# $i - counter of total lines matching pattern
my $pattern = 'fwa';
my %parsed;
my $i = 0;
while (<$FILE>) {   # read line from filehandle, assign to special variable $_
    # search $_ for 'fwa' (case-insensitive)
    if (/$pattern/i) {
        $i++;              # count here, not in the test: $i++ is 0 (false) the first time
        chomp $_;          # remove newline
        $parsed{ $_ }++;   # use line as hash key, increment times seen
    }
}
close($FILE);
# print output: total entities, and sorted number of each
print "\n $i entities\n";
print "$_ x $parsed{$_}\n" for sort keys %parsed;
~Particle
Re: File Search And Compare
by impossiblerobot (Deacon) on Feb 18, 2002 at 21:26 UTC
I think this node by Ovid should help; it's an answer to pretty much the same question.
I found it using Super Search.
Update: PyroX, I had hoped you would use Ovid's code as an example of how to do what you were trying to do (not just plug his code into yours without understanding why it worked).
Unfortunately, it looks like you got confused by his use of Perl's built-in DATA filehandle (which is often used in demonstration versions of programs on this site).
The sample code looks like this:
# open LOG, "< $log" or die "Can't open $log: $!";
while (<DATA>){
push (@data, $_) if $_ =~ /$ip/;
}
# close LOG;
Ovid has commented out the lines that open the external file, and is instead reading from the __DATA__ section at the bottom of the same file. To make this work with an external file, uncomment those lines and change the filehandle name in the input operator (<>), as follows:
open LOG, "< $log" or die "Can't open $log: $!"; # Uncommented
while (<LOG>){ # Changed filehandle name
push (@data, $_) if $_ =~ /$ip/;
}
close LOG; # Uncommented
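A related trick, sketched here rather than taken from Ovid's node: put the loop in a subroutine that takes a filehandle, so switching between demo data and the real log means changing only the open. The helper name collect_matches, the IP address and the sample lines are made up, and an in-memory filehandle (opened on a scalar reference) plays the role of DATA.

```perl
use strict;
use warnings;

# Return every line read from $fh that matches $pattern (hypothetical helper).
sub collect_matches {
    my ($fh, $pattern) = @_;
    my @data;
    while (my $line = <$fh>) {
        push @data, $line if $line =~ /$pattern/;
    }
    return @data;
}

my $ip = quotemeta('10.0.0.5');    # example IP; quotemeta makes the dots literal

# Demo input: an in-memory filehandle standing in for the DATA section.
my $sample = "Feb 18 00:12:14 10.0.0.5 fwa\nFeb 18 00:12:15 10.0.0.6 fwa\n";
open(my $demo, '<', \$sample) or die "Can't open in-memory data: $!";
my @data = collect_matches($demo, $ip);
close($demo);
print scalar(@data), " matching line(s)\n";    # prints "1 matching line(s)"

# Against a real log only the open changes (example path):
# open(my $log, '<', '/var/log/everything') or die "Can't open log: $!";
# my @log_data = collect_matches($log, $ip);
```

Because the loop never names a specific filehandle, the same code works for DATA, a log file, or STDIN.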
I hope this makes things more clear. :-)
Impossible Robot
Re: File Search And Compare
by Kozz (Friar) on Feb 18, 2002 at 21:31 UTC
PyroX:
I think that what you're really looking for is a hash-based solution. If this really is a MONSTER of a file, you could tie() the hash to a file on disk using the DB_File module, so the counts are stored on disk rather than in memory (correct me if I'm wrong, most wise monks!).
Perhaps something like
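use DB_File; use Fcntl;   # Fcntl supplies O_RDWR and O_CREAT
tie my %counts, 'DB_File', '/tmp/fwa_counts.db', O_RDWR|O_CREAT, 0644, $DB_HASH
    or die "Could not tie hash: $!";   # example path; the hash now lives on disk
open(FILE, "< /var/log/everything") or die "Could not read file: $!";
while (my $input = <FILE>) {
    if ($input =~ /fwa/i) {
        $counts{ lc($input) }++;
    }
}
close(FILE);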
Notice that I've used lc() to lower-case the text in the line. Otherwise the hash would contain separate values for "fwa100" vs "FWA100" vs "fWa100". Remove this if you desire to keep them separate.
You could then iterate over this tied hash, printing the key/value pairs. To be honest, though, I've not had much need for DB_File, and would welcome other monks to contribute usage examples. ;)
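A minimal sketch of the tie, count, and iterate cycle described above. It assumes made-up sample lines and a throwaway database in a temp directory, and it swaps in SDBM_File (which ships with perl) so it runs even without Berkeley DB; with DB_File the tie call takes the same arguments plus a trailing $DB_HASH. SDBM limits each key/value pair to roughly 1 KB, which is usually enough for log lines.

```perl
use strict;
use warnings;
use Fcntl;                  # exports O_RDWR, O_CREAT
use SDBM_File;              # stands in for DB_File here; ships with perl
use File::Temp qw(tempdir);
use File::Spec;

# Example database location (an assumption): a throwaway temp directory.
my $dir = tempdir(CLEANUP => 1);
my $db  = File::Spec->catfile($dir, 'fwa_counts');

# Tie the hash: keys and values now live in files on disk, not in memory.
my %counts;
tie(%counts, 'SDBM_File', $db, O_RDWR | O_CREAT, 0644)
    or die "Can't tie $db: $!";

# Made-up sample lines; with the real log you would open /var/log/everything.
my $text = "FWA100 up\nfwa100 up\nother line\nfWa200 down\n";
open(my $fh, '<', \$text) or die "Can't open in-memory data: $!";
while (my $input = <$fh>) {
    chomp $input;
    $counts{ lc $input }++ if $input =~ /fwa/i;    # lc merges FWA/fwa/fWa
}
close($fh);

# each() walks the tied hash one pair at a time instead of building a key list.
my @report;
while (my ($line, $n) = each %counts) {
    push @report, "$n x $line";
}
untie %counts;
print "$_\n" for sort @report;
```

Using each() rather than keys() matters for a huge tied hash, since keys() would pull every key into memory at once.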
Re: File Search And Compare
by PyroX (Pilgrim) on Feb 18, 2002 at 21:42 UTC
One item I have changed, which worked a tiny bit better but is still too craptastic to ever be used:
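#!/usr/bin/perl
system("clear");
open(FILE, "/var/log/everything") or die "Can't open log: $!";
my $i = 0;
my @parsed;
while (my $input = <FILE>) {
    if ($input =~ /fwa/i) {
        $i++;
        push @parsed, $input;    # was @parsed[$i]=$input, which left index 0 empty
    }
}
close(FILE);
print "\n $i entities";
# compare every matching line against every other one (quadratic)
foreach my $test (@parsed) {
    my $x = 0;
    foreach my $final (@parsed) {
        $x++ if $final eq $test;
    }
    if ($x > 1) {    # every line matches itself once, so report only real repeats
        print "$x $test";
    }
}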
I changed the output check: every line was being printed, because every line exists at least once (it matches itself).
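For comparison, a one-pass sketch of the same duplicate report using a hash instead of comparing every line with every other line. The sub name report_duplicates and the sample lines are made up; it reads from any filehandle and returns each repeated line once, with its count.

```perl
use strict;
use warnings;

# Return "count line" strings for matching lines seen more than once.
sub report_duplicates {
    my ($fh, $pattern) = @_;
    my %seen;
    while (my $input = <$fh>) {
        chomp $input;
        $seen{$input}++ if $input =~ /$pattern/i;
    }
    # Keep only lines that occur more than once, sorted for stable output.
    return map { sprintf('%d %s', $seen{$_}, $_) }
           grep { $seen{$_} > 1 }
           sort keys %seen;
}

my $text = "fwa a\nfwa b\nfwa a\nno match\n";
open(my $fh, '<', \$text) or die "Can't open in-memory data: $!";
my @dups = report_duplicates($fh, 'fwa');
close($fh);
print "$_\n" for @dups;    # prints "2 fwa a"
```

The nested loop does roughly n² comparisons for n matching lines and prints each duplicate once per copy; the hash touches each line once and stores one entry per distinct line.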
Re: File Search And Compare
by zengargoyle (Deacon) on Feb 19, 2002 at 06:51 UTC
If you're lucky, your timestamps are fixed-width and you can use substr.
my $ts_fmt = 'MMM DD HH:MM:SS ';
my $line = 'Feb 18 00:12:14 foo bar bat baz fwa';
my $time = substr($line, 0, length $ts_fmt, '');
chop $time; # pesky space..
# $time is the time part.
# $line holds the part after the time is removed.
If you're searching for a fixed string, index might be faster than a regex.
my $match = 'fwa';
if ( -1 != index($line,$match)) {
# matched!
$seen_lines{$line}++;
}
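Putting both tips together, a sketch that assumes every line starts with a fixed-width 'MMM DD HH:MM:SS ' timestamp like the sample line. Since index is case-sensitive and the original search used /fwa/i, the message text is lowercased before searching; count_lines and the sample lines are illustrative only.

```perl
use strict;
use warnings;

my $ts_fmt = 'MMM DD HH:MM:SS ';    # only its length (16) matters

# Count message parts (timestamp stripped) that contain $match, ignoring case.
sub count_lines {
    my ($match, @lines) = @_;
    my %seen_lines;
    $match = lc $match;
    for my $line (@lines) {
        chomp(my $rest = $line);
        substr($rest, 0, length $ts_fmt, '');    # drop the timestamp in place
        if (index(lc $rest, $match) != -1) {     # -1 means "not found"
            $seen_lines{$rest}++;
        }
    }
    return %seen_lines;
}

my %counts = count_lines('fwa',
    'Feb 18 00:12:14 foo bar bat baz fwa',
    'Feb 18 00:13:02 foo bar bat baz FWA',
    'Feb 18 00:14:40 foo bar bat baz fwa',
);
print "$counts{$_} x $_\n" for sort keys %counts;
```

The keys keep their original case; use lc $rest as the key instead if FWA and fwa should be counted together.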
Re: File Search And Compare
by PyroX (Pilgrim) on Feb 18, 2002 at 22:18 UTC
OK!
But now I have a new problem. I think that may work, but there is a timestamp in each line, so I need to split it off before processing the line. The split needs to be a regular-expression split on ':' followed by any number from 01 to 60 and a space, together:
':34 ' or ':21 ' or ':57 ' would all work; this is the seconds in the timestamp, of course. That should leave us with an array with [0] (the trash) and [1] (the goodies).
I tried inserting something like:
# create variables:
# $pattern - pattern for regular expression
# %parsed - hash, keys are lines containing pattern,
# values are counter of times seen
# $i - counter of total lines matching pattern
my $pattern = 'fwa';
my (%parsed, $i);
while(<$FILE>){
# read line from filehandle, assign to special variable $_
if(/$pattern/i && $i++)
{
chomp $_; # remove newline
@new=split(/:[01-60] /,$_);
$_=$new[1];
$parsed{ $_ }++; # use line as hash
# etc. etc. etc. ...
But that didn't work, and I'm unsure about both the regex and how it ties in with your code. Any more help would be much appreciated.
you don't want to use split like that, it won't do what you want. can you include at least one line of input data? it's rather hard to debug this sort of error without it. you should probably use a regular expression, but i can't say without sample data.
~Particle
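A sketch of the regular-expression route, assuming lines shaped like zengargoyle's sample ('Feb 18 00:12:14 foo bar bat baz fwa'). It also shows why the split above never matches: inside a character class, [01-60] matches a single character (0 through 6), not a number from 01 to 60. Seconds are two digits from 00 to 59, so :[0-5][0-9] followed by a space finds them. strip_timestamp is a made-up helper name.

```perl
use strict;
use warnings;

# Return the text after the HH:MM:SS timestamp, or undef if there is none.
sub strip_timestamp {
    my ($line) = @_;
    # \d\d:\d\d: is hours and minutes; [0-5]\d is the seconds (00 to 59)
    return $line =~ /\d\d:\d\d:[0-5]\d (.*)/ ? $1 : undef;
}

my $line = 'Feb 18 00:12:14 foo bar bat baz fwa';
my $rest = strip_timestamp($line);
print "$rest\n";    # prints "foo bar bat baz fwa"

# The same cut with split, limited to two fields: [0] the trash, [1] the goodies.
my ($trash, $goodies) = split /:[0-5][0-9] /, $line, 2;
print "$goodies\n";
```

The limit of 2 keeps any later ':NN ' in the message from splitting the goodies further.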
Re: File Search And Compare
by PyroX (Pilgrim) on Feb 18, 2002 at 21:37 UTC
Kozz:
I'm interested in more info on your idea. I'll look into it, but do you know anything more?
The file is huge, 220,000 lines per day, so by the end of the week it would be gargantuan. I think I'm going to rotate it daily, though.
Keep 'em coming, guys.
Re: File Search And Compare
by PyroX (Pilgrim) on Feb 19, 2002 at 18:14 UTC
Thanks Everyone,
here is the final product, which seems to be working very well so far.
#!/usr/bin/perl -w
use strict;
$|++;
use FileHandle;
my $FILE = FileHandle->new;
open($FILE, "<", "/var/log/$ARGV[0]") or die "ERROR: can't open file! $!";
my $pattern = 'fw1';
my (%parsed, $i);
while (<$FILE>) {
    if (/$pattern/i && $i++) {
        my $z = 0;
        my $out = "";
        chomp $_;
        # remove the HH:MM:SS timestamp by splitting on it and rejoining the pieces
        my @new = split(/[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/, $_);   # 'my' needed under strict
        foreach my $piece (@new) {
            $z++;
            chomp $piece;
            $out .= "$piece";
        }
        # keep only the text before the word 'service'
        @new = split(/service/, $out);
        $out = $new[0];
        print "$out\n";
    }
}
close($FILE);
Thanks Again!
Except one small oops: $i++ returns 0 on the first post-increment, so the if (/pat/ && ...) test fails the first time a line matches.
update: My oops! That seems to be what you want, or something like it. Your original seemed to look for duplicate _full_ lines. But never mind...
  p
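A small demonstration of that oops, using made-up lines: $i++ evaluates to the old value, which is 0 (false) on the first match, so that line is skipped; testing the pattern alone and counting inside the block keeps every matching line.

```perl
use strict;
use warnings;

my @lines = ('fw1 a', 'fw1 b', 'other');

# Buggy: $i++ yields 0 on the first match, so that line is skipped.
my ($i, @kept_post) = (0);
for (@lines) {
    push @kept_post, $_ if /fw1/i && $i++;
}

# Fixed: test the pattern alone and count inside the block.
my ($j, @kept_fixed) = (0);
for (@lines) {
    if (/fw1/i) {
        $j++;
        push @kept_fixed, $_;
    }
}
print scalar(@kept_post), " vs ", scalar(@kept_fixed), " lines kept\n";    # prints "1 vs 2 lines kept"
```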
Re: File Search And Compare
by PyroX (Pilgrim) on Feb 22, 2002 at 02:42 UTC
Yeah, I should have noted that change: this gets the lines containing the text, and I pipe the output to 'uniq -cid' to get the count of similar lines.
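One caveat: uniq only compares adjacent lines, so repeats that are not next to each other are missed unless the output goes through sort first (with -i, sort -f keeps case variants together). Alternatively the counting can happen inside the script. A sketch, assuming the trimmed lines are already in a list; count_duplicates is a made-up name, and unlike uniq -i it reports the lowercased form of each line.

```perl
use strict;
use warnings;

# Rough in-script stand-in for `sort -f | uniq -cid` (hypothetical helper):
# count lines case-insensitively and keep only those seen more than once.
sub count_duplicates {
    my @lines = @_;
    my %count;
    $count{ lc $_ }++ for @lines;                      # -i: fold case before comparing
    return map  { sprintf('%d %s', $count{$_}, $_) }   # -c: prefix each line with its count
           grep { $count{$_} > 1 }                     # -d: only lines that repeat
           sort keys %count;
}

# Made-up trimmed lines; input order does not matter, unlike with uniq.
my @out = count_duplicates('Deny tcp src 1', 'allow x', 'deny TCP src 1');
print "$_\n" for @out;    # prints "2 deny tcp src 1"
```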
Re: File Search And Compare
by PyroX (Pilgrim) on Feb 18, 2002 at 21:33 UTC
Name "main::DATA" used only once: possible typo at ./pix2 line 14.
readline() on closed filehandle main::DATA at ./pix2 line 14.