bendir has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks!

I made a Perl program that parses the contents of an XML file into something more readable. XML files whose filenames include a date/time stamp arrive at our server every 15 minutes and end up in a particular directory. The XML files are kept there for 2 days until another process deletes them automatically.

My goal is to automate checking for new files arriving in this directory (every 15 minutes), copy those new files into my working dir, and feed them into the parser program. Like I said, the parser program is ready; what I need now is some way to:
- read the xml directory
- check for new files and ignore the ones already processed earlier
- copy the new files into my working dir where I can feed them into the parser program

Now for the actual question:
I'm sure File::Monitor could help with a big chunk of the job, but the likelihood of the server admin allowing me to install any additional Perl modules is low. So, do you know of any (easy) way to achieve this without the help of any modules, or is pushing the admin to install additional modules (File::Monitor) my best option?
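
For reference, the three steps above can be sketched using core modules only (File::Copy and File::Spec ship with Perl); the helper names below are illustrative, not from any existing program, and this is untested against the real directory layout:

    use strict;
    use warnings;
    use File::Copy qw(copy);    # core module
    use File::Spec;

    # Return the XML files in $dir not yet in %$seen, and mark them as seen.
    sub get_new_files {
        my ($dir, $seen) = @_;
        opendir my $dh, $dir or die "Cannot open $dir: $!";
        my @new = grep { /\.xml\z/ && !$seen->{$_}++ } readdir $dh;
        closedir $dh;
        return @new;
    }

    # One polling pass: copy each new file into the working dir.
    sub process_once {
        my ($xml_dir, $work_dir, $seen) = @_;
        for my $file (get_new_files($xml_dir, $seen)) {
            copy(File::Spec->catfile($xml_dir, $file),
                 File::Spec->catfile($work_dir, $file))
                or warn "Copy of $file failed: $!";
            # here you would invoke the parser on the copied file
        }
    }

You could run process_once in a `while (1) { ...; sleep 900 }` loop, or let cron invoke the script every 15 minutes and skip the loop entirely.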



Replies are listed 'Best First'.
Re: Monitoring directory contents
by tobyink (Canon) on Apr 08, 2014 at 10:41 UTC

    File::Monitor is pure Perl (not XS) and has no non-core dependencies. So you should be able to download the distribution, and copy the ".pm" files into a directory you do have permission to write to. Then just:

    use lib "/path/to/that/directory";
    use File::Monitor;
Re: Monitoring directory contents
by DrHyde (Prior) on Apr 08, 2014 at 10:40 UTC

    If the module (and all its dependencies) is pure Perl then you could trivially bundle it with your application. CPANdeps has an option to tell you whether stuff is pure-Perl or not.

    Otherwise, if the admins won't allow you to just install stuff, take it up with your manager. If they are sensible, you will just need to justify installing the extra software, and I'm sure you can do that: things like "it will save development time" and "it's well tested, and here are the CPAN testers reports".

    Remember, the sysadmins' job isn't to obstruct you (even if some of them can make it feel like it is), it's to provide and maintain the infrastructure that you and other parts of the business need.

Re: Monitoring directory contents
by Bloodnok (Vicar) on Apr 08, 2014 at 11:06 UTC
    Would it not be simpler to let rsync take the strain, e.g. (neither tried nor tested) ...
    my @files = `rsync <rsync_args_and_opts>`;
    system "<parser_name> @files";
    which I guess you could just as easily implement in shell (again, neither tried nor tested) ...
    for F in `rsync <rsync_args_and_opts>`; do <parser_name> "$F"; done
Re: Monitoring directory contents
by Discipulus (Abbot) on Apr 08, 2014 at 10:59 UTC
    Would a solution like this not be simpler:
    ## pseudo code:
    my %cache_of_already_read_files;
    my $sleep_between = 300;
    while (1) {
        my @xml = get_xml_files_names();    # opendir, system (ls)..
        foreach (@xml) {
            next if exists $cache_of_already_read_files{$_};
            $cache_of_already_read_files{$_} = 'found at ' . scalar localtime;
            my_copy_to_destination($_);     # pass the filename, not the timestamp
        }
        sleep $sleep_between;
    }
Re: Monitoring directory contents
by Theodore (Friar) on Apr 08, 2014 at 12:12 UTC
    You should decide whether it matters if any files are skipped when new files arrive while your program isn't running. If you only monitor the directory for changes and your program goes down for some reason, any files added in the meantime will be missed.
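
    One way to guard against that (a sketch; the state-file layout of one filename per line is an assumption, and the helper names are illustrative) is to persist the list of processed files to disk, so a restarted poller resumes where it left off instead of baselining on whatever happens to be in the directory:

    use strict;
    use warnings;

    # Load the set of already-processed filenames from a state file.
    sub load_seen {
        my ($state_file) = @_;
        my %seen;
        if (open my $fh, '<', $state_file) {
            chomp(my @names = <$fh>);
            @seen{@names} = (1) x @names;
        }
        return \%seen;
    }

    # Record a newly processed filename so a restart will not repeat it.
    sub mark_seen {
        my ($state_file, $seen, $name) = @_;
        $seen->{$name} = 1;
        open my $fh, '>>', $state_file or die "Cannot append to $state_file: $!";
        print {$fh} "$name\n";
        close $fh;
    }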
Re: Monitoring directory contents
by zentara (Archbishop) on Apr 08, 2014 at 18:05 UTC