First decide whether you need to be sure that no file is skipped when new files arrive while your program is not running. If you only watch the directory for change events and your program goes down for some reason, any files added while it is down will be ignored.
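One way to avoid missing files that arrive while the program is down is to poll the directory instead of waiting for events, and to compare what you find against state saved by the previous run. The Perl sketch below is only an illustration. It assumes a hypothetical stamp file whose modification time records when the last successful run started. The sub names `files_newer_than_stamp` and `update_stamp` are made up for this example.

```perl
use strict;
use warnings;
use File::Spec;

# Return the *.xml files in $dir modified at or after the mtime of the
# (hypothetical) stamp file. Without a stamp, every XML file is returned,
# as on a first run.
sub files_newer_than_stamp {
    my ($dir, $stamp_file) = @_;
    my $since = -e $stamp_file ? (stat $stamp_file)[9] : 0;
    opendir my $dh, $dir or die "Cannot open $dir: $!";
    my @names = grep { /\.xml\z/ } readdir $dh;
    closedir $dh;
    # '>=' rather than '>': a file sharing the stamp's second is processed
    # again instead of being skipped.
    return grep { -f $_ && (stat $_)[9] >= $since }
           map  { File::Spec->catfile($dir, $_) } sort @names;
}

# Create or truncate the stamp file and set its mtime to $started_at, the
# time the run *began*, so files arriving during the run are seen next time.
sub update_stamp {
    my ($stamp_file, $started_at) = @_;
    open my $fh, '>', $stamp_file or die "Cannot write $stamp_file: $!";
    close $fh;
    utime $started_at, $started_at, $stamp_file
        or die "Cannot set time on $stamp_file: $!";
}
```

A cron job would save `my $started = time;` before calling `files_newer_than_stamp`, process the returned files, and only then call `update_stamp($stamp_file, $started)`. A crashed run leaves the old stamp in place, so its files are seen again. The `>=` comparison errs toward processing a file twice rather than skipping it, and that is exactly where the question of duplicate processing matters.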
You should also think about what happens if a file gets processed twice for some reason. Would that be a bigger problem than some wasted resources? If so, consider having the parsing program keep track of the files it has already processed, and just use rsync to fetch them into your working directory. Alternatively, write a script that runs after rsync has copied the files into your working directory.
Below is a version of Discipulus' code that does not copy any files. It only keeps track of them and calls the XML processing script, and it is meant to be run from a cron job:
## pseudo code: the helper subs called below are placeholders to be written.
use strict;
use warnings;

# Assumed to return a hash reference, or undef if the load failed.
my $cache_of_already_read_files = load_cache_from_somewhere();
my @xml;
if (not defined $cache_of_already_read_files) {    # Load failed.
    $cache_of_already_read_files = {};
    # Make some assumption here to have a starting point, for example:
    @xml = get_xml_files_names_based_on_timestamp();
    # ... or just assume that this is the first run:
    # @xml = get_xml_files_names();
} else {
    @xml = get_xml_files_names();
}
foreach my $filename (@xml) {
    next if exists $cache_of_already_read_files->{$filename};
    $cache_of_already_read_files->{$filename} = 'found at ' . scalar(localtime(time));
    process_xml_file($filename);
}
clean_cache_from_older_filenames($cache_of_already_read_files, \@xml);
save_cache_somewhere($cache_of_already_read_files);
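The placeholder helpers could be filled in along the following lines. This is only a sketch, and it rests on several assumptions: the cache is stored with Storable, the XML files sit in a single directory, the paths in `$CACHE_FILE` and `$XML_DIR` are hypothetical, `load_cache_from_somewhere` returns a hash reference (or undef when nothing usable can be read), and the "modified within the last day" first-run fallback is an arbitrary policy. `process_xml_file` is left to you.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Spec;

# Hypothetical locations; adjust to your setup.
our $CACHE_FILE = '/var/tmp/xml_cache.storable';
our $XML_DIR    = '/data/incoming';

# Returns a hash reference, or undef if there is no usable cache yet.
sub load_cache_from_somewhere {
    return undef unless -e $CACHE_FILE;
    my $cache = eval { retrieve($CACHE_FILE) };    # retrieve dies on a corrupt file
    return ref $cache eq 'HASH' ? $cache : undef;
}

# Write to a temporary file and then rename it, so a crash in the middle
# of writing cannot leave a truncated cache behind.
sub save_cache_somewhere {
    my ($cache) = @_;
    my $tmp = "$CACHE_FILE.tmp";
    nstore($cache, $tmp) or die "Cannot store cache in $tmp";
    rename $tmp, $CACHE_FILE or die "Cannot rename $tmp: $!";
}

# All plain *.xml files currently in $XML_DIR, as full paths in sorted order.
sub get_xml_files_names {
    opendir my $dh, $XML_DIR or die "Cannot open $XML_DIR: $!";
    my @names = grep { /\.xml\z/ } readdir $dh;
    closedir $dh;
    return grep { -f } map { File::Spec->catfile($XML_DIR, $_) } sort @names;
}

# First-run fallback (assumed policy): only files modified less than one
# day before the script started, or after it started.
sub get_xml_files_names_based_on_timestamp {
    return grep { -M $_ < 1 } get_xml_files_names();
}

# Drop entries for files no longer in the directory, so the cache
# does not grow forever.
sub clean_cache_from_older_filenames {
    my ($cache, $current) = @_;
    my %present = map { $_ => 1 } @$current;
    delete @{$cache}{ grep { !$present{$_} } keys %$cache };
}
```

There is one caveat with the timestamp fallback. The main loop only records files that are in `@xml`, so older files left out on the first run will be picked up on the second run. If they should never be processed, add them to the cache on the first run without calling `process_xml_file`.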