PerlMonks  

Re^2: ignore list of files using readdir function

by kaka_2 (Sexton)
on Jul 23, 2013 at 08:00 UTC ( [id://1045792]=note )


in reply to Re: ignore list of files using readdir function
in thread ignore list of files using readdir function

Hello There,

I was able to use grep to ignore the files whose names match the pattern:

    sub GetINDirFiles {
        my ($path) = @_;
        opendir DIR, $path or die $!;
        my @files = readdir DIR;
        my @files = grep {!/\_ACK_/} readdir DIR;
        closedir DIR;
        return(@files);
    }

Now the problem I have is that there are around 5000 files, and later, when I compare these files against another list of files, it takes too much time. So I was thinking: is it possible to list only the files modified between (current time - 15 minutes) and the current time?

Thank You. -KAKA-
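For reference, the time-window filter asked about here could be sketched as below. This is only a minimal sketch, not code from the thread: `recent_files` is a hypothetical helper name, and it assumes "between now and 15 minutes ago" refers to the file's modification time (mtime).

```perl
use strict;
use warnings;

# Hypothetical helper: return the plain files in $path whose mtime
# falls within the last $window seconds.
sub recent_files {
    my ($path, $window) = @_;
    my $cutoff = time - $window;

    opendir my $dh, $path or die "Cannot open $path: $!";
    my @recent = grep {
        # -f fills the stat buffer "_", so mtime is read without a second stat
        -f "$path/$_" && (stat _)[9] >= $cutoff
    } readdir $dh;
    closedir $dh;

    return @recent;
}

# Example call: plain files in the current directory touched in the last 15 minutes.
print "$_\n" for recent_files('.', 15 * 60);
```

Note that this still calls stat on every directory entry, so it narrows the list but does not by itself remove the cost of comparing the names afterwards.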

Replies are listed 'Best First'.
Re^3: ignore list of files using readdir function
by mtmcc (Hermit) on Jul 23, 2013 at 08:31 UTC
    It shouldn't take 15 minutes to compare lists of filenames. What exactly are you trying to do?

      You are right, it takes almost 15 minutes, which is too much.

      Below is the complete code.
      #! /usr/bin/perl
      use strict;
      use Math::BigFloat;
      Math::BigFloat->precision(0);

      sub GetINDirFiles {
          my ($path) = @_;
          opendir DIR, $path or die $!;
          my @files = readdir DIR;
          my @files = grep {!/\_ACK_/} readdir DIR;
          closedir DIR;
          return(@files);
      }

      sub GetOUTDirFiles {
          my ($path) = @_;
          opendir DIR, $path or die $!;
          my @files = readdir DIR;
          my @files = grep {/\_ACK.xml$/} readdir DIR;
          closedir DIR;
          return(@files);
      }

      # Main
      my $inpath = "/AAA/BBB/CCC/IN";
      my $outpath = "/AAA/BBB/CCC/OUT";
      my $outsuffix = "_ACK.xml";
      my $insuffix = ".xml"; # Added by me
      my $timethreshold = 900; # set time threshold in seconds (900 seconds equal 15 minutes)
      my @delindex;
      my @infiles = &GetINDirFiles($inpath);
      my @outfiles = &GetOUTDirFiles($outpath);
      my $index = 0; # index used to get string position in array

      foreach my $infile (@infiles) {
          $infile =~ s/(.*)$insuffix/$1/g; # remove suffix to do comparation # Added by me
          foreach my $outfile (@outfiles) {
              $outfile =~ s/(.*)$outsuffix/$1/g; # remove suffix to do comparation
              if ($outfile eq $infile){
                  push (@delindex, $index); # get list of strings to be removed from array
              }
          }
          $index += 1;
      }
      delete @infiles[@delindex]; # remove strings

      my $currenttime = time; # get current time from system (epoch time)
      foreach my $file (@infiles) {
          next unless (-f "$inpath/$file$insuffix"); # ignore directories # INSERT SUFFIX AGAIN ($insuffix)
          my $mtime = (stat "$inpath/$file$insuffix" )[9]; # get mtime from file (epoch time) # INSERT SUFFIX AGAIN ($insuffix)
          my $diff = ($currenttime - $mtime);
          if ($diff > $timethreshold) {
              print "\n - file " . $file . $insuffix . " in " . $inpath . " directory was created at more than " . Math::BigFloat->new($diff / 60) . " minutes."; # INSERT SUFFIX AGAIN ($insuffix)
              # PUT THE ACTION THAT YOU WANT DO HERE!!!
          }
      }

      I need to run this at a regular interval, such as every 5 or 15 minutes, from a tool I use. So it really does not make sense to check files that I have already checked. I would not mind if this completed in a minute or less, but 15 minutes is really too much.

      Kindly assist.

      -KAKA-

        Your code can be optimized a lot. Here is a proposal, but I cannot test it as I do not have your directories at hand. I added comments to explain what I am doing, so I hope it helps:

        my @infiles = &GetINDirFiles($inpath);

        # instead of having an array with the outfiles use a hash for faster lookup
        # also remove suffix at this stage already, no need to do it again and again in the loop
        # you need to escape your suffix variable in \Q...\E for special characters such as the dot
        # only remove the suffix at the end, no need for (.*)
        my %outfiles = map { s/\Q$outsuffix\E$//; $_ => 1 } &GetOUTDirFiles($outpath);

        my $index = 0; # index used to get string position in array
        foreach my $infile (@infiles) {
            # see above re the replacement
            $infile =~ s/\Q$insuffix\E$//; # remove suffix to do comparation # Added by me
            # instead of loop through array of outfiles do hash lookup
            push (@delindex, $index) if exists $outfiles{$infile};
            $index += 1;
        }
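To illustrate the comment about `\Q...\E` and anchoring, here is a small sketch using a made-up file name (`axml.txt` is hypothetical, not one of the files from the thread). It compares the substitution from the original loop with the escaped, anchored one proposed above.

```perl
use strict;
use warnings;

my $insuffix = '.xml';

# Hypothetical file name that merely contains the letters "xml".
my $name = 'axml.txt';

# Unescaped and unanchored, as in the original loop: "." matches any
# character, so "axml" is treated as "<anything>.xml" and stripped.
(my $loose = $name) =~ s/(.*)$insuffix/$1/g;

# Escaped and anchored: only a literal ".xml" at the very end is removed.
(my $exact = $name) =~ s/\Q$insuffix\E$//;

print "loose: $loose\n";
print "exact: $exact\n";
```

The loose form turns the name into `.txt`, while the escaped form leaves `axml.txt` untouched.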

        UPDATE: Forget my code above. You can write this as:

        my %outfiles = map { /(.*)\Q$outsuffix\E$/; $1 => 1 } &GetOUTDirFiles($outpath);
        my @infiles = grep { /(.*)\Q$insuffix\E$/; not exists $outfiles{$1} } &GetINDirFiles($inpath);
        print "@infiles\n";

        and it should be fast.
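Since the snippet above needs the real directories, here is a self-contained sketch of the same idea on in-memory lists. The sample names and the `%acked` / `@pending` variables are made up for illustration. Unlike the one-liner, it checks that the match succeeded before using `$1`, so it does not rely on whatever `$1` holds after a failed match.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two directory listings.
my @in_names  = qw(a.xml b.xml c.xml notes.txt);
my @out_names = qw(a_ACK.xml c_ACK.xml);

my ($insuffix, $outsuffix) = ('.xml', '_ACK.xml');

# Build the lookup table once: one hash key per acknowledged base name.
my %acked;
for my $name (@out_names) {
    $acked{$1} = 1 if $name =~ /\A(.+)\Q$outsuffix\E\z/;
}

# Single pass over the input names; each check is one hash lookup
# instead of a scan over the whole list of output files.
my @pending = grep {
    /\A(.+)\Q$insuffix\E\z/ && !exists $acked{$1}
} @in_names;

print "@pending\n";
```

With this sample data only `b.xml` is left: `a.xml` and `c.xml` have acknowledgements, and `notes.txt` does not carry the input suffix.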

        UPDATE 2: Here is the full story.

        use strict;
        use warnings;

        sub GetINDirFiles {
            my ($path) = @_;
            opendir my $dir, $path or die $!;
            return grep {!/\_ACK_/} readdir $dir;
        }

        sub GetOUTDirFiles {
            my ($path) = @_;
            opendir my $dir, $path or die $!;
            return grep {/\_ACK.xml$/} readdir $dir;
        }

        # Main
        my $inpath = "./IN";
        my $outpath = "./OUT";
        my $outsuffix = "_ACK.xml";
        my $insuffix = ".xml";
        my $timethreshold = 900; # set time threshold in seconds (900 seconds equal 15 minutes)

        my %outfiles = map { /(.*)\Q$outsuffix\E$/; $1 => 1 } &GetOUTDirFiles($outpath);
        my @infiles = grep { /(.*)\Q$insuffix\E$/; $1 and not exists $outfiles{$1} } &GetINDirFiles($inpath);

        my $currenttime = time; # get current time from system (epoch time)
        @infiles = grep { -f "$inpath/$_" and ( $currenttime - (stat "$inpath/$_" )[9] ) > $timethreshold } @infiles;

        # now you have all input files w/o corresponding output file that are older than 15 minutes
        for (@infiles) {
            print "File $_ in $inpath directory was created ". ( $currenttime - (stat "$inpath/$_" )[9] )/60.0 ." minutes ago.\n";
            # put your action here
        }
        Do you want to compare the names of files or the contents of files? Could you give an example of what data you're trying to compare, and what output you expect to get?
Re^3: ignore list of files using readdir function
by soonix (Canon) on Jul 23, 2013 at 08:45 UTC
    The first
    my @files = readdir DIR;
    in your code is superfluous. You are doing the work (at least the reading of the directory) twice. Might not account for the whole excess time, but maybe...
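A related point, as a hedged side note: if that first `readdir` really were active, it would do more than repeat work. In list context `readdir` returns all remaining entries, so a second `readdir` on the same handle returns nothing unless the handle is rewound. The sketch below shows this on a throwaway directory; the file names are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Throwaway directory with two hypothetical files.
my $path = tempdir(CLEANUP => 1);
for my $name (qw(a.xml a_ACK_1.xml)) {
    open my $fh, '>', "$path/$name" or die "Cannot create $name: $!";
    close $fh;
}

opendir my $dh, $path or die "Cannot open $path: $!";
my @first  = readdir $dh;   # list context: reads every remaining entry
my @second = readdir $dh;   # handle is exhausted, so this list is empty
rewinddir $dh;              # reset the handle to the start of the directory
my @third  = readdir $dh;   # the entries are available again
closedir $dh;

printf "first: %d, second: %d, third: %d\n",
    scalar @first, scalar @second, scalar @third;
```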

      Sorry, but that line is commented out; I don't know where the # went.

      # my @files = readdir DIR;

      -KAKA-
