PerlMonks  

testing files for valid content

by grashoper (Monk)
on Apr 23, 2009 at 16:39 UTC ( [id://759585]=perlquestion )

grashoper has asked for the wisdom of the Perl Monks concerning the following question:

I have a series of text files that I would like to open and check whether each line contains a full record. Records are comma-delimited. Here is an example of one record:

    4/22/2009 1:36:02 PM,29.281263,23.429273,4.420066,29.219497,2.847088,11.055522,0.159151,AAR

My current code, which follows, reads in the directories and lists the files. I also have a small script that shows a single file's contents with labels. I would like to modify that script so that it checks whether each line is a full record; if it is not, the script should skip the line and move on to the next, or possibly discard the line.
    opendir(DIR, "c:\\mlx\\") || die "can't opendir: $!";
    my @list = grep { $_ ne '.' && $_ ne '..'&& $_ ne 'copy.pl' } @list=readdir(DIR);
    foreach my $name (@list) {
        $path="c:\\mlx\\$_";
        shift @list;
        print $name, "\n";
        opendir(DIR, "$path\\$name");
        my @files=grep {$_ ne '.' && $_ ne '..'} @files=readdir(DIR);
        foreach $file(@files){
            print "filename is ", $file, " \n";
            print "and I am in directory ", $name, " \n";
        }
    }

Example of echoing the contents of a file:

    open (FILE, '04-22-2009.txt');
    while (<FILE>) {
        chomp;
        ($timeoftest, $login, $searchload, $searchresults, $searchdetails,$addlisting,$editlisting,$logout,$sitecode) = split(",");
        print "time of test: $timeoftest\n";
        print "login: $login\n";
        print "searchload: $searchload\n";
        print "searchresults: $searchresults\n";
        print "searchdetails: $searchdetails\n";
        print "addlisting: $addlisting\n";
        print "editlisting: $editlisting\n";
        print "logout: $logout\n";
        print "sitecode: $sitecode\n";
        print "---------\n";
    }
    close (FILE);
    exit;
I tried using the sample provided by lostjimmy, but I don't fully understand what it is doing now. All it does is give me some file names and the output shown further down. Here is the code as it stands, followed by my output. I would like it to go through all the directories and open all the files, testing each one and reporting any that do not contain the proper number of fields.
    #use strict;
    #use warnings;
    opendir(DIR, "c:\\mlx\\") || die "can't opendir: $!";
    my @list = grep { $_ ne '.' && $_ ne '..'&& $_ ne 'copy.pl' } readdir(DIR);
    foreach my $name (@list) {
        $path="c:\\mlx\\$_";
        shift @list;
        print $name, "\n";
        opendir(DIR, "$path\\$name");
        my @files=grep {$_ ne '.' && $_ ne '..'}readdir(DIR);
        foreach $file(@files){
            print "filename is ", $file, " \n";
            print "and I am in directory ", $name, " \n";
            chdir $name;
            print "I am in directory $name";
            open (FILE, '$file')|| die "Cannot open file $file";
            while (<FILE>) {
                chomp;
                ($timeoftest, $login, $searchload, $searchresults, $searchdetails,$addlisting,$editlisting,$logout,$sitecode) = split(",");
                print "time of test: $timeoftest\n";
                print "login: $login\n";
                print "searchload: $searchload\n";
                print "searchresults: $searchresults\n";
                print "searchdetails: $searchdetails\n";
                print "addlisting: $addlisting\n";
                print "editlisting: $editlisting\n";
                print "logout: $logout\n";
                print "sitecode: $sitecode\n";
                print "---------\n";
            }
            close (FILE);
            exit;
        }
    }
Output:

    C:\>perl 2.pl
    03-27-2009.txt
    04-02-2009.txt
    04-07-2009.txt
    04-09-2009.txt
    04-13-2009.txt
    04-15-2009.txt
    04-17-2009.txt
    04-21-2009.txt
    04-23-2009.txt
    AAR
    filename is 04-16-2009.txt
    and I am in directory AAR
    Cannot open file 04-16-2009.txt at 2.pl line 22.
    I am in directory AAR

Replies are listed 'Best First'.
Re: testing files for valid content
by almut (Canon) on Apr 23, 2009 at 17:01 UTC
    my @files=grep {$_ ne '.' && $_ ne '..'} @files=readdir(DIR);

    Not directly related to your question, but I've seen this construct a couple of times now in code you posted, so I thought it's maybe worth commenting on... before it becomes a habit :)

    The 'normal' way to write this would be

    my @files=grep {$_ ne '.' && $_ ne '..'} readdir(DIR);

    No need to assign the list of files to an intermediate array — which is also a different (i.e. package) variable from the first lexical my @files, BTW.

    Has anyone ever suggested to try use strict?  ;)

      Good advice.

      And if files are all that are needed (no directories), why not filter out all directories, not just those 2 special ones?

      my @files = grep {-f "$dir/$_"} readdir(DIR);

      where $dir is the directory path which DIR points to.
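A minimal sketch of that filter as a reusable sub. The name `plain_files` is hypothetical, and a lexical directory handle stands in for the bareword `DIR` used in the thread:

```perl
use strict;
use warnings;
use File::Spec;

# Return the names of the plain files (no directories) directly inside $dir.
sub plain_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot opendir $dir: $!";
    # -f needs the full path, because readdir returns bare names only.
    my @files = sort grep { -f File::Spec->catfile($dir, $_) } readdir($dh);
    closedir($dh);
    return @files;
}
```

Because `-f` is false for `.` and `..` as well as for any other directory, no separate test for those two names is needed.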

Re: testing files for valid content
by Bloodnok (Vicar) on Apr 23, 2009 at 16:44 UTC
    Why not use Text::xSV?

    Using it would have the additional benefit of simplifying your code :D

    A user level that continues to overstate my experience :-))
      What would this do if a line did not contain a complete record? I am looking at Text::xSV, but I am not sure how to use it. I find the usage examples for some of these modules very confusing: I read them, but I don't understand what they mean.
        Its behaviour is, AFAIR, configurable, but by default I think (and perldoc suggests that) it confesses, i.e. dies with a stack trace.

        Thus you can either override the default error handling behaviour or possibly use eval (or more elegantly, but less acceptably, Error) to catch the error and process accordingly.

        A user level that continues to overstate my experience :-))
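The eval approach mentioned above might look roughly like this. Note that `parse_row_or_die` is a hypothetical stand-in for any parser that dies on a bad row; it is not the Text::xSV API, and `try_parse` is likewise an invented name:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a parser that dies on a malformed row.
# This is NOT Text::xSV; it only imitates the "die on error" behaviour.
sub parse_row_or_die {
    my ($line) = @_;
    my @fields = split /,/, $line, -1;
    die "expected 9 fields, got " . scalar(@fields) . "\n" if @fields != 9;
    return \@fields;
}

# Wrap the call in eval so that one bad row does not end the whole run.
sub try_parse {
    my ($line) = @_;
    my $fields = eval { parse_row_or_die($line) };
    if (!defined $fields) {
        warn "skipping row: $@";
        return;
    }
    return $fields;
}
```

A caller can then loop over the lines and simply `next` whenever `try_parse` returns nothing.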
Re: testing files for valid content
by lostjimmy (Chaplain) on Apr 23, 2009 at 17:34 UTC
    I would like to modify this file so that it checks for a full record or not

    What denotes a full record? I'm assuming a record with nine columns. So all you have to do (unless you use a CSV module) is split the line and make sure it has nine values.

    while (<FILE>) {
        chomp;
        my @values = split /,/;
        next if @values != 9;
        my($timeoftest, $login, $searchload, $searchresults, $searchdetails,
           $addlisting, $editlisting, $logout, $sitecode) = @values;
        # print the stuff like before
    }
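The same check can be sketched as a small sub that returns named fields, assuming nine columns in the order used by the original script. The names `parse_record` and `@FIELDS` are hypothetical:

```perl
use strict;
use warnings;

# Field names in file order, taken from the original script.
my @FIELDS = qw(timeoftest login searchload searchresults searchdetails
                addlisting editlisting logout sitecode);

# Return a hash ref keyed by field name, or nothing if the line
# does not split into exactly as many values as there are field names.
sub parse_record {
    my ($line) = @_;
    chomp $line;
    my @values = split /,/, $line;
    return if @values != @FIELDS;
    my %record;
    @record{@FIELDS} = @values;   # hash slice: pair each name with its value
    return \%record;
}
```

Inside the read loop this becomes `my $rec = parse_record($_) or next;`, after which `$rec->{sitecode}` and the other fields are available by name.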
      So all you have to do (unless you use a CSV module) is split the line and make sure it has nine values.

      Be careful. That works until a column has embedded (presumably quoted or otherwise escaped) commas. If splitting on commas always worked, there'd be little need for a module.
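To illustrate the caveat, here is a small sketch with a made-up row that contains a quoted comma. It uses the core module Text::ParseWords rather than a dedicated CSV module, purely to show the difference; a real CSV parser handles more cases (for instance, Text::ParseWords also treats apostrophes as quote characters):

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical row: the second field contains a comma inside double quotes.
my $line = '4/22/2009 1:36:02 PM,"Smith, John",AAR';

my @naive  = split /,/, $line;            # cuts the quoted field in two
my @quoted = parse_line(',', 0, $line);   # respects the quotes and strips them

printf "split: %d fields, parse_line: %d fields\n",
    scalar(@naive), scalar(@quoted);
```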

Re: testing files for valid content
by graff (Chancellor) on Apr 24, 2009 at 03:19 UTC
    Here's a simple(-minded) way to check a file for the number of commas per line:
    perl -lne '$n=tr/,//;$h{$n}++;END{print "$h{$_} rows have $_ commas" for (keys %h)}' some_file.csv
    If that produces only one line of output for a given file, you know the file has the same number of commas in all rows (and it tells you how many commas per row).

    If some lines have more and others have fewer, you'll get a breakdown of the variance. It still could be a "valid" CSV file, if lines with extra commas happen to have quotes or escapes (meaning that you really need to use a parsing module like Text::xSV).
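For anyone who finds the one-liner hard to read, the same idea might be written out as a script along these lines. The sub name `comma_histogram` and the script name in the usage comment are hypothetical:

```perl
use strict;
use warnings;

# Count how many lines have each number of commas.
# Takes an open filehandle; returns a hash of comma count => number of lines.
sub comma_histogram {
    my ($fh) = @_;
    my %rows_with;
    while (my $line = <$fh>) {
        my $commas = ($line =~ tr/,//);   # tr returns the number of matches
        $rows_with{$commas}++;
    }
    return %rows_with;
}

# Hypothetical usage: perl count_commas.pl some_file.csv
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my %h = comma_histogram($fh);
    print "$h{$_} rows have $_ commas\n" for sort { $a <=> $b } keys %h;
    close($fh);
}
```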

    Apart from that, even if the CSV data is simple (no quoted/escaped commas) and has the same number of commas on every line, you need to be careful with your use of split() -- this would be wrong:

    split(",")
    You should do it like this instead:
    split( /,/, $_, -1 );
    If you don't do that, split() will ignore "extra" commas at the end of a line -- e.g. this:
    @array = split( /,/, "field1,field2,,field4,field5,,," );
    will fill @array like this:
    ( 'field1', 'field2', '', 'field4', 'field5' );
    Note how the trailing empty fields are truncated. Please read about split.
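A short sketch of the difference the negative limit makes, using the same sample string:

```perl
use strict;
use warnings;

my $line = "field1,field2,,field4,field5,,,";

my @trimmed = split /,/, $line;        # trailing empty fields are dropped
my @full    = split /,/, $line, -1;    # a negative limit keeps them

print scalar(@trimmed), " fields without a limit\n";   # prints 5
print scalar(@full),    " fields with limit -1\n";     # prints 8
```

The empty field in the middle survives either way, as an empty string; only the trailing ones are affected by the limit.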
