PerlMonks  

Re^4: ignore list of files using readdir function

by kaka_2 (Sexton)
on Jul 23, 2013 at 08:50 UTC ( [id://1045799] )


in reply to Re^3: ignore list of files using readdir function
in thread ignore list of files using readdir function

You are right, it takes almost 15 minutes, which is too much.

Below is the complete code.

#! /usr/bin/perl
use strict;
use Math::BigFloat;
Math::BigFloat->precision(0);

sub GetINDirFiles {
    my ($path) = @_;
    opendir DIR, $path or die $!;
    my @files = readdir DIR;
    my @files = grep {!/\_ACK_/} readdir DIR;
    closedir DIR;
    return(@files);
}

sub GetOUTDirFiles {
    my ($path) = @_;
    opendir DIR, $path or die $!;
    my @files = readdir DIR;
    my @files = grep {/\_ACK.xml$/} readdir DIR;
    closedir DIR;
    return(@files);
}

# Main
my $inpath = "/AAA/BBB/CCC/IN";
my $outpath = "/AAA/BBB/CCC/OUT";
my $outsuffix = "_ACK.xml";
my $insuffix = ".xml";  # Added by me
my $timethreshold = 900;  # set time threshold in seconds (900 seconds equal 15 minutes)
my @delindex;
my @infiles = &GetINDirFiles($inpath);
my @outfiles = &GetOUTDirFiles($outpath);
my $index = 0;  # index used to get string position in array

foreach my $infile (@infiles) {
    $infile =~ s/(.*)$insuffix/$1/g;  # remove suffix to do comparation # Added by me
    foreach my $outfile (@outfiles) {
        $outfile =~ s/(.*)$outsuffix/$1/g;  # remove suffix to do comparation
        if ($outfile eq $infile){
            push (@delindex, $index);  # get list of strings to be removed from array
        }
    }
    $index += 1;
}
delete @infiles[@delindex];  # remove strings

my $currenttime = time;  # get current time from system (epoch time)
foreach my $file (@infiles) {
    next unless (-f "$inpath/$file$insuffix");  # ignore directories # INSERT SUFFIX AGAIN ($insuffix)
    my $mtime = (stat "$inpath/$file$insuffix" )[9];  # get mtime from file (epoch time) # INSERT SUFFIX AGAIN ($insuffix)
    my $diff = ($currenttime - $mtime);
    if ($diff > $timethreshold) {
        print "\n - file " . $file . $insuffix . " in " . $inpath . " directory was created at more than " . Math::BigFloat->new($diff / 60) . " minutes.";  # INSERT SUFFIX AGAIN ($insuffix)
        # PUT THE ACTION THAT YOU WANT DO HERE!!!
    }
}

I need to run this at a regular interval, such as every 5 or 15 minutes, using a tool I use. So it really does not make sense to check files that I have already checked. I would not mind if this completed in a minute or less, but 15 minutes is really too much.

Kindly assist.

-KAKA-

Replies are listed 'Best First'.
Re^5: ignore list of files using readdir function
by hdb (Monsignor) on Jul 23, 2013 at 09:07 UTC

    Your code can be optimized a lot. Here is a proposal, but I cannot test it because I do not have your directories at hand. I added comments to explain what I am doing, so I hope it helps:

    my @infiles = &GetINDirFiles($inpath);

    # instead of having an array with the outfiles use a hash for faster lookup
    # also remove suffix at this stage already, no need to do it again and again in the loop
    # you need to escape your suffix variable in \Q...\E for special characters such as the dot
    # only remove the suffix at the end, no need for (.*)
    my %outfiles = map { s/\Q$outsuffix\E$//; $_ => 1 } &GetOUTDirFiles($outpath);

    my $index = 0;  # index used to get string position in array
    foreach my $infile (@infiles) {
        # see above re the replacement
        $infile =~ s/\Q$insuffix\E$//;  # remove suffix to do comparation # Added by me
        # instead of loop through array of outfiles do hash lookup
        push (@delindex, $index) if exists $outfiles{$infile};
        $index += 1;
    }

    UPDATE: Forget my code above. You can write this as:

    my %outfiles = map { /(.*)\Q$outsuffix\E$/; $1 => 1 } &GetOUTDirFiles($outpath);
    my @infiles = grep { /(.*)\Q$insuffix\E$/; not exists $outfiles{$1} } &GetINDirFiles($inpath);
    print "@infiles\n";

    and it should be fast.

    UPDATE 2: Here is the full story.

    use strict;
    use warnings;

    sub GetINDirFiles {
        my ($path) = @_;
        opendir my $dir, $path or die $!;
        return grep {!/\_ACK_/} readdir $dir;
    }

    sub GetOUTDirFiles {
        my ($path) = @_;
        opendir my $dir, $path or die $!;
        return grep {/\_ACK.xml$/} readdir $dir;
    }

    # Main
    my $inpath = "./IN";
    my $outpath = "./OUT";
    my $outsuffix = "_ACK.xml";
    my $insuffix = ".xml";
    my $timethreshold = 900;  # set time threshold in seconds (900 seconds equal 15 minutes)

    my %outfiles = map { /(.*)\Q$outsuffix\E$/; $1 => 1 } &GetOUTDirFiles($outpath);
    my @infiles = grep { /(.*)\Q$insuffix\E$/; $1 and not exists $outfiles{$1} } &GetINDirFiles($inpath);

    my $currenttime = time;  # get current time from system (epoch time)
    @infiles = grep { -f "$inpath/$_" and ( $currenttime - (stat "$inpath/$_" )[9] ) > $timethreshold } @infiles;

    # now you have all input files w/o corresponding output file that are older than 15 minutes
    for (@infiles) {
        print "File $_ in $inpath directory was created ". ( $currenttime - (stat "$inpath/$_" )[9] )/60.0 ."minutes ago.\n";
        # put your action here
    }
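    The hash-lookup idea from this reply can be tried without touching any directories. The following is only a minimal sketch: the helper name `unacknowledged` and the sample file names are made up for illustration and do not come from the thread. It builds the lookup table once from the OUT names, then checks each IN name with a single hash lookup. The work therefore grows with the number of IN files plus the number of OUT files, rather than with their product as in the nested loops.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): given lists of IN and OUT
# file names, return the IN names that have no matching OUT name.
sub unacknowledged {
    my ($in_names, $out_names, $in_suffix, $out_suffix) = @_;

    # Build the lookup table once: base name of each OUT file => 1.
    my %acked;
    for my $name (@$out_names) {
        $acked{$1} = 1 if $name =~ /\A(.*)\Q$out_suffix\E\z/;
    }

    # One pass over the IN names, each checked with a single hash lookup.
    # Names that do not end in the IN suffix are dropped.
    return grep {
        /\A(.*)\Q$in_suffix\E\z/ && !exists $acked{$1}
    } @$in_names;
}

# Made-up sample names, for illustration only.
my @in  = qw(a01.xml a02.xml a03.xml);
my @out = qw(a01_ACK.xml a03_ACK.xml);

my @pending = unacknowledged(\@in, \@out, '.xml', '_ACK.xml');
print "@pending\n";    # prints "a02.xml"
```

    Passing the names in as array references keeps the comparison separate from the directory reading, so it can be checked on small lists before pointing it at the real IN and OUT directories.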
Re^5: ignore list of files using readdir function
by mtmcc (Hermit) on Jul 23, 2013 at 10:30 UTC
    Do you want to compare the names of files or the contents of files? Could you give an example of what data you're trying to compare, and what output you expect to get?

      My requirement is to check, whenever a new file arrives in the IN folder, whether the same file with the _ACK.xml suffix is present in the OUT directory within a maximum delay of 15 minutes.

      For example, a01.xml arrives in the IN folder. The application processes it and then sends it to the OUT folder as a01_ACK.xml (the maximum processing time is 15 minutes).

      Content is not important in this case. On Windows I can use WMI to check whether a new file has been created in the IN directory (instance created) and then check for the same in the OUT folder, but on UNIX I cannot get such a trigger, so I had to choose the approach of comparing files. I am not much into UNIX, so I can't think of anything other than this.

      -KAKA-
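      The requirement described above (a01.xml in IN should have a01_ACK.xml in OUT within 15 minutes) can be written as a per-file check. This is only a sketch under assumptions: the helper names `ack_name_for` and `is_overdue` are hypothetical, and the file's modification time is used as a stand-in for its arrival time, as the code elsewhere in this thread does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: map an IN file name to the OUT name expected for it,
# e.g. a01.xml => a01_ACK.xml. Returns undef for names not ending in .xml.
sub ack_name_for {
    my ($in_name) = @_;
    (my $out_name = $in_name) =~ s/\.xml\z/_ACK.xml/ or return;
    return $out_name;
}

# Hypothetical helper: true when $in_path is a plain file older than
# $threshold seconds (by mtime) and its ACK file is missing from $out_dir.
sub is_overdue {
    my ($in_path, $out_dir, $threshold, $now) = @_;

    my $ack = ack_name_for(basename($in_path));
    return 0 unless defined $ack && -f $in_path;

    my $mtime = (stat $in_path)[9];
    return 0 if $now - $mtime <= $threshold;    # still within the allowed delay

    my $has_ack = -e "$out_dir/$ack";
    return $has_ack ? 0 : 1;
}

print ack_name_for('a01.xml'), "\n";    # prints "a01_ACK.xml"
```

      A caller would loop over the IN directory and call `is_overdue("$inpath/$file", $outpath, 900, time)` for each entry, which needs one existence test per IN file instead of a full listing of OUT.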

        Does this do what you want?:

        #! /usr/bin/perl
        use strict;
        use Math::BigFloat;
        Math::BigFloat->precision(0);

        sub GetINDirFiles {
            my @files;
            my ($path) = $_[0];
            opendir (my $directory, $path) or die $!;
            while(readdir $directory) {
                push (@files, $_) unless (($_ =~ m/_ACK_/) || ($_ =~ m/^\./));
            }
            closedir $directory;
            return(@files);
        }

        sub GetOUTDirFiles {
            my @files;
            my ($path) = $_[0];
            opendir (my $directory, $path) or die $!;
            while(readdir $directory) {
                unless ($_ =~ m/^\./) {
                    push (@files, $_) if $_ =~ m/_ACK.xml/;
                }
            }
            closedir $directory;
            return(@files);
        }

        # Main
        my @range;
        my $inpath = "./IN/";
        my $outpath = "./INACK/";
        my $outsuffix = "_ACK.xml";
        my $insuffix = ".xml";  # Added by me
        my $timethreshold = 900;  # set time threshold in seconds (900 seconds equal 15 minutes)
        my @delindex;
        my @infiles = &GetINDirFiles($inpath);
        my @outfiles = &GetOUTDirFiles($outpath);
        my $index = 0;  # index used to get string position in array
        my $count = @infiles;

        for (my $x = 0; $x < $count; $x+= 1) {
            my $name = $infiles[$x];
            $name =~ s/$insuffix//;
            for (@outfiles) {
                if ($_ =~ m/$name/) {
                    push (@range, $x)
                }
            }
        }

        my $rangeCount = @range;
        for (my $x = 0; $x < $count; $x+= 1) {
            splice(@infiles, $range[$x], 1);
        }
        # remove strings

        my $currenttime = time;  # get current time from system (epoch time)
        foreach my $file (@infiles) {
            my $fileForTime = "$inpath"."$file";
            my $mtime = ( stat $fileForTime )[9];
            my $diff = ($currenttime - $mtime);
            # ignore directories # INSERT SUFFIX AGAIN ($insuffix)
            # get mtime from file (epoch time) # INSERT SUFFIX AGAIN ($insuffix)
            if ($diff > $timethreshold) {
                print "\n - file " . $file . " in " . $inpath . " directory was created at more than " . Math::BigFloat->new($diff / 60) . " minutes.";  # INSERT SUFFIX AGAIN ($insuffix)
                # PUT THE ACTION THAT YOU WANT DO HERE!!!
            }
        }

        NOTE: Change the pathnames!
        In that case, a simpler, more direct approach might work for you:

        #!perl
        use strict;

        my $inpath = "/AAA/BBB/CCC/IN";
        my $outpath = "/AAA/BBB/CCC/OUT";

        # check inpath files less than 60 mins
        # and more than 15 mins old
        my $min_age = 0.25/24;  # 15 mins in days
        my $max_age = 1.00/24;  # 1 hour in days
        my $count =0;
        my $ignore=0;

        print "Scanning in dir $inpath .. \n";
        opendir DIR, $inpath or die $!;
        for (readdir DIR){
            # skip if not a file
            next unless (-f "$inpath/$_");
            # filter on age
            my $age = -M "$inpath/$_";
            if ( ($age > $min_age) && ($age < $max_age) && ( !/_ACK_/ ) ){
                # build outfile name
                (my $outfile = $_) =~ s/(\.xml)$/_ACK$1/;
                # check outfile exists
                if (-e "$outpath/$outfile"){
                    print "$_ => $outfile exists\n";
                } else {
                    print "$_ => $outfile does not exist\n";
                }
                ++$count;
            } else {
                ++$ignore;
            }
        }
        closedir DIR;
        print "$count files checked - $ignore files ignored\n";
        poj

        but on UNIX I cannot get such a trigger, so I had to choose the approach of comparing files. I am not much into UNIX, so I can't think of anything other than this.

        Real "UNIX"? If not, you can use Linux::Inotify2.
