PerlMonks
Piping many individual files into a single perl script

by kelder (Novice)
on Sep 28, 2008 at 01:01 UTC ( [id://714088]=perlquestion: print w/replies, xml ) Need Help??

kelder has asked for the wisdom of the Perl Monks concerning the following question:

I'm relatively new to perl and need some help. Basically, I have 10000 files containing a randomly generated set of strings that I need to check against a preset string. For each "data" input file, the program prints the number of times the preset string occurs to a "results" output file, with an individual line for each file that is checked. For example, the input/output might look like this:
-file1-
stringa
stringa
stringb
stringc
stringc

-file2-
stringa
stringb
stringb
stringb
stringc

output:

File  A  B
1     2  1
2     1  3
I've written all the code to check the files, but right now I'm at a loss as to how to pipe each of the input files into the program. Is there a command for piping each file from a given directory into the program or something similar I could put at the beginning of the code? I apologize in advance if this ends up being a RTFM post, but I feel like I'm just missing something in the manuals.

Replies are listed 'Best First'.
Re: Piping many individual files into a single perl script
by BrowserUk (Patriarch) on Sep 28, 2008 at 01:23 UTC

    If you're running on Win32, ikegami's shell solution won't work for you. However, you can achieve a similar effect by adding @ARGV = map glob, @ARGV; at the top of your program.

    A nice way to process a list of files is:

    #! perl -slw
    use strict;

    BEGIN{ @ARGV = map glob, @ARGV; }

    while( <> ) {
        ## here, $_ is every line of every file that matches the
        ## wildcarded paths supplied on the command line
        if( /the search string/ ) {
            ## Do something
        }
    }

    So c:\>theScriptAbove.pl *.c *.h would read all C source and header files in the current directory, and search each line of all of those files for "the search string".


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      So I tried the code you provided, and while it did pipe all the files in, it listed them all on one line in the results file.
      A sample of what I received:
      File#   A    B   C
      1       5    6   3
      
      What I wanted:
      File#   A    B   C
      1       2    1   0
      2       2    4   2
      3       1    1   1
      
      This is my code:
      #!C:\Perl
      use strict;

      BEGIN{ @ARGV = map glob, @ARGV; }

      open(RES, ">>results.txt");
      print RES "File Number  A  A%  B  B%  Null  Null%\n";

      my $A = 0;   # these three lines set my initial counts at zero
      my $B = 0;
      my $null = 0;
      my $filenum = 0;

      while (<>) {
          chomp($_);
          if    ($_ eq "stringa") { $A++; }
          elsif ($_ eq "stringb") { $B++; }
          else                    { $null++; }
      }

      my $popa = $A/1000;   # these lines determine what percent of the population the strings represent
      my $popa = sprintf('%.2f', $popa);   # cut the percentages to two decimal places
      my $popb = $B/1000;
      my $popb = sprintf('%.2f', $popb);
      my $popnull = $null/1000;
      my $popnull = sprintf('%.2f', $popnull);
      my $filenum++;   # add one to my file number

      print RES "$filenum  $A  $popa  $B  $popb  $null  $popnull\n";   # print the results out to the "results" file
      What am I doing wrong? Edit: Thanks for the help so far!

        You need to detect the end of each individual file, print your results for that file and reset the counts. See the explanation of eof(ARGV) in perlfunc:

        #!C:\Perl
        use strict;

        BEGIN{ @ARGV = map glob, @ARGV; }

        open(RES, ">>results.txt");
        print RES "File Number  A  A%  B  B%  Null  Null%\n";

        my $A = 0;   # these three lines set my initial counts at zero
        my $B = 0;
        my $null = 0;
        my $filenum = 0;

        while( <> ) {
            chomp($_);
            if    ($_ eq "stringa") { $A++; }
            elsif ($_ eq "stringb") { $B++; }
            else                    { $null++; }

            if( eof( ARGV ) ) {   ## true after the end of each individual file
                my $popa    = sprintf( '%.2f', $A / 1000 );
                my $popb    = sprintf( '%.2f', $B / 1000 );
                my $popnull = sprintf( '%.2f', $null / 1000 );
                $filenum++;   ## add one to the file number (no "my" here!)
                print RES "$filenum  $A  $popa  $B  $popb  $null  $popnull\n";
                $A = $B = $null = 0;   ## reset counts for the next file
            }
        }

        As I mentioned above, since OS X is a *nix-like system, you probably don't need the @ARGV = map glob, @ARGV, as the shell will take care of that for you. (Though it probably won't do any harm.)

        Also, in several places in your code you do:

        ...
        my $var = ...;
        my $var = sprintf ... $var;
        ...

        If you are running with strict and warnings, you should be getting messages of the form '"my" variable $var masks earlier declaration in same scope at ...'. Don't ignore them; they are there for a purpose.
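
        The masking warning described above is easy to reproduce; a minimal two-line case (the variable name $pct is hypothetical, not from the OP's code):

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $pct = 42 / 1000;                 # first declaration: 0.042
        my $pct = sprintf( '%.2f', $pct );   # second "my" masks the first and triggers the warning
        print "$pct\n";                      # prints 0.04
        ```

        The fix is simply to drop the second "my" and reassign to the existing variable.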


        I'm a little confused. You said you are running on macosx, but your code starts with:
        #!C:\Perl
        That makes no sense, and it means you can only run the script with a command line like this:
        perl path/file_name_of_script arg1 ...
        (where the "path/" part is only needed if the script is not in your shell's current working directory). I would use this as the initial "shebang" line:
        #!/usr/bin/perl
        because macosx is really a unix OS, and in unix, perl is typically found in the /usr/bin/ directory; macosx definitely has perl in that location. With that as the shebang line, and doing the shell command "chmod +x file_name_of_script", the script becomes available as a shell command:
        path/file_name_of_script arg1 ...
        where the "path/" part is only needed if your shell PATH variable does not include the directory where the script is stored.

        As for your question about iterating over a list of file names, a method that I find useful goes like this: the perl script expects as input a list of file names, loads those into an array, and then iterates over the array. At each iteration, if there's a problem with the file or its contents, issue a warning and skip on to the next file in the list; e.g.:

        #!/usr/bin/perl
        use strict;
        use Getopt::Long;

        my $Usage = "Usage: $0 [-p path] filename.list\n  or:  ls [path] | $0 [-p path]\n";
        my $path = '.';
        die $Usage unless ( GetOptions( 'p=s' => \$path ) and -d $path );
        die $Usage if (( @ARGV and !-f $ARGV[0] ) or ( @ARGV==0 and -t ));
        # need file name args or pipeline input

        my @file_list = <>;   # read all input as a list of file names
        chomp @file_list;     # get rid of line-feeds

        for my $name ( @file_list ) {
            my $file = "$path/$name";
            if ( ! -f $file ) {
                warn "input value '$file' does not seem to be a data file; skipped\n";
                next;
            }
            if ( ! open( I, "<", $file )) {
                warn "open failed for input file '$file'; skipped\n";
                next;
            }
            ...
        }
        There are already very good shell command tools for creating a list of file names ("ls", "find"), and for filtering lists ("grep"), so I'm inclined not to rewrite those things in a perl script that is supposed to process a list of file names.

        The exception to that rule is when the script is really intended for a specific task that always involves a specific location and/or filter for getting its list of file names to work on, because in that case, I'd rather not have to repeat the selection process on the command line every time I run the script.

        $A and $B are the running totals for all of the files. You either need to make them arrays (indexed by file), or you need to print the totals when you reach the end of a file (after which, you would reset the variables to zero).

        I like BrowserUk's solution below, except that I'd probably rewrite it (mostly because I didn't like the chained if-elsifs) in a manner similar to this (untested):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use 5.010;

        BEGIN{ @ARGV = map glob, @ARGV }

        print "File Number A A% B B% Null Null%";

        my $default = '';   # set to something sensible; the empty string seems good
        my @allowed = ( qw/stringa stringb/, $default );
        my ( %count, $filenum );

        while (<>) {
            chomp;
            $count{ $_ ~~ @allowed ? $_ : $default }++;
            if (eof) {
                $filenum++;
                say "$filenum ", join ' ' =>
                    map { my $x = $count{$_}; $x, sprintf( '%.2f', $x / 1000 ) } @allowed;
                @count{@allowed} = (0) x @allowed;
            }
        }
        __END__

        I threw in some 5.10-isms in the course of doing so, but it wouldn't be terribly different in pre-5.10 perls.
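
        For the record, since the ~~ smart match is a 5.10 feature, the classification step could be done pre-5.10 with a lookup hash instead; a minimal sketch (reusing the names from the snippet above; bucket_for is a hypothetical helper):

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;

        my $default = '';    # same default bucket as in the snippet above
        my %allowed = map { $_ => 1 } ( 'stringa', 'stringb', $default );

        # Classify one input line: known strings keep their own bucket,
        # anything else falls into the default bucket.
        sub bucket_for {
            my ($line) = @_;
            return exists $allowed{$line} ? $line : $default;
        }
        ```

        Inside the while loop, $count{ bucket_for($_) }++ would then replace the smart-match expression.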

        --
        If you can't understand the incipit, then please check the IPB Campaign.
      Actually, I'm running this script on a Mac running OS/X. Is the script still the same?
Re: Piping many individual files into a single perl script
by ikegami (Patriarch) on Sep 28, 2008 at 01:13 UTC

    As an alternative to iterating over @ARGV and opening each file yourself, you could use a combination of <> and $ARGV:

    my %total;
    my %by_file;

    while (<>) {
        chomp;
        $total{$_}++;
        $by_file{$ARGV}{$_}++;
    }

    A simple output routine:

    my @strings = sort keys %total;

    for my $string ( @strings ) {
        print( "\t$string" );
    }
    print( "\n" );

    for my $file ( keys %by_file ) {
        print( $file );
        for my $string ( @strings ) {
            print( "\t$by_file{$file}{$string}" );
        }
        print( "\n" );
    }

    The file names are passed to the script in the same fashion as in my first post.

Re: Piping many individual files into a single perl script
by ikegami (Patriarch) on Sep 28, 2008 at 01:05 UTC

    I'm at a loss as to how to pipe each of the input files into the program.

    program.pl file*

    which is the same as

    program.pl file1 file2 ...

    Then get the names of the files from @ARGV
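
    A minimal sketch of that approach, assuming the OP's "stringa" as the target string (count_matches is a hypothetical helper name, not part of the OP's code):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Hypothetical helper: count the lines in one file that equal a target string.
    sub count_matches {
        my ( $file, $target ) = @_;
        open( my $fh, '<', $file ) or do { warn "can't open '$file': $!\n"; return };
        my $count = 0;
        while ( my $line = <$fh> ) {
            chomp $line;
            $count++ if $line eq $target;
        }
        close $fh;
        return $count;
    }

    # One result line per file named on the command line:
    for my $file (@ARGV) {
        my $n = count_matches( $file, 'stringa' );
        print "$file\t$n\n" if defined $n;
    }
    ```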

    Update: Oops, not quite the same. On Windows, you'll need to do

    use File::Glob qw( bsd_glob );
    @ARGV = map bsd_glob($_), @ARGV;

      As a very minor side note: since the OP mentions something like "10000 files", I should add that some shells do have problems with a large number of files, although I can't remember exactly how large "large" is. I have occasionally seen error messages along the lines of "command line too long". In that case, adopting the same technique as for Windows would be a cure...

      use File::Glob qw( bsd_glob );
      @ARGV = map bsd_glob($_), @ARGV;

      This sounds very wrong [Update: but it is right, see my reply to ikegami's comment] unless one has a good reason to do so: since we're on Windows, we most probably want DOS/Windows-like globbing, and glob is dwimmy enough to select its own correct "incarnation". In all of my scripts that may want globbing, whether written for Windows or "ported" (what a big word!) there from *NIX, I include the same code as BrowserUk's. Sometimes (depending on how "important" the app will be...) I also provide "standard" -i and -o command-line switches for input and output, since shell redirection has some very minor, but not null, deficiencies.


        I include the same code as BrowserUk's

        But that uses bsd_glob as well. And worse yet, a version that breaks when a space is present.

        It's not exactly a number of files, it's a number of characters :) For example, on WinXP it's 8191; on Win2k/NT4 it's 2047.

Node Type: perlquestion [id://714088]
Approved by ikegami