How to process multiple input files?

rnaeye has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: How to process multiple input files? by jwkrahn (Abbot) on May 22, 2011 at 20:42 UTC
You need to reset `$count` for each file. Something like this (UNTESTED): `#!/usr/bin/perl use strict; use warnings; $^I = ".bak"; undef $/; my $count = 0; while ( my $line = <> ) { $line =~ s{ (<\/div>) } { ++$count == 2 ? "\t<?php include(\$_SERVER['DOCUMENT_ROOT'].\"\/includes\/footer.php\"); ?>\n\n$1" : $1 }gex; print $line; $count = 0; }`
Re^2: How to process multiple input files? by John M. Dlugosz (Monsignor) on May 23, 2011 at 00:22 UTC
I worry that an empty file will stop it prematurely. Or a file might contain just "0" or some such, but that's less likely. Since he's slurping whole files rather than reading lines, I think it would be prudent to test for defined. (Hmm, what does the normal line-oriented read do if an empty file is in the list? Maybe it's always an issue.) Update: never mind. In production code I would have simply written `defined` to be sure, but looking through the docs I see that this construct is special even in the case of explicit assignment. I know that the quick `while(<>)` tests for defined, or started to at some specific version of Perl (I remember the classic Camel book explaining how lines are never false because they end in "\n"), but wasn't sure that applied when an assignment was being made. In general, I rely less on special cases and magical meanings in well-written production code than in a quick one-liner. Declaring variables and not using `$_` much fall into the same category, so I somehow was thinking the magic was not in effect.
Re^3: How to process multiple input files? by jwkrahn (Abbot) on May 23, 2011 at 06:44 UTC
"I think it would be prudent to test for defined." The code I posted, `while ( my $line = <> ) {`, does test for defined.
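To illustrate the point about the implicit `defined` test, here is a minimal sketch. It uses a scratch file created with `File::Temp` and a lexical filehandle instead of `<>`; the file contents and the variable names are assumptions made for the demonstration, not part of the original script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file whose entire contents are the single character "0",
# a value that is false in boolean context.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "0";
close $out or die "close: $!";

open my $in, '<', $path or die "open $path: $!";
local $/;    # slurp mode, as in the thread

my $iterations = 0;
# Perl treats this condition as: defined(my $contents = <$in>)
while ( my $contents = <$in> ) {
    $iterations++;
}
close $in;

print "loop body ran $iterations time(s)\n";
```

If the loop tested plain truth instead of definedness, the body would never run for this file.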
Re^4: How to process multiple input files? by John M. Dlugosz (Monsignor) on May 23, 2011 at 17:22 UTC
Re^5: How to process multiple input files? by Anonymous Monk on May 23, 2011 at 18:48 UTC
Re^2: How to process multiple input files? by rnaeye (Friar) on May 22, 2011 at 21:50 UTC
Thanks so much. Works great.
Re: How to process multiple input files? by graff (Chancellor) on May 22, 2011 at 20:51 UTC
"I have tried to put the script within a foreach loop, but it did not work." So, I'm guessing that you didn't try it this way: `#!/usr/bin/perl use strict; use warnings; for my $f ( @ARGV ) { local $/; open( I, '<', $f ); open( O, '>', "$f.bak" ); my $count = 0; my $line = <I>; $line =~ s{ (<\/div>) } { if (++$count == 2){ "\t<?php include(\$_SERVER['DOCUMENT_ROOT'].\"\/includes\/footer.php\"); ?>\n\n".$1; } else { $1; } }gex; print O $line; }` That works for me. (BTW, I'm compulsive about making the indentation look right -- seems silly, but it's really helpful to keep code less illegible.) If you have so many files that you can't fit them all as args on a command line, there's the unix "xargs" tool: `ls | xargs your_prog ## or use "find ... | xargs your_prog"`
Re: How to process multiple input files? by John M. Dlugosz (Monsignor) on May 22, 2011 at 19:30 UTC
As written, the `<>` construct will read from each file name given on the command line, in turn. You don't need to do anything else; just list more than one file on the command line.
Re^2: How to process multiple input files? by rnaeye (Friar) on May 22, 2011 at 20:08 UTC
It only processes the first file on the command line.
Re^3: How to process multiple input files? by John M. Dlugosz (Monsignor) on May 23, 2011 at 00:15 UTC
Ah, you are only reading once. I see what you were asking now. Making a while loop out of it like so: `my $line; while (defined ($line = <>)) {` will repeat until there are no files left.
Re: How to process multiple input files? by jaredor (Priest) on May 22, 2011 at 20:49 UTC
Try using the while construct with the `<>` operator. Something like `while (my $line = <>) { ... }`. Oops, after submission I saw jwkrahn responded in more detail. That comment should solve (both) your problems, which I now understand to be 1) looping over command line file names, and 2) modifying the second line of each file. One thing you might do instead of maintaining your own counter would be to use the built-in line counter. The special `$.` line number variable ~~will be properly maintained from file to file.~~ (will not be properly maintained with the `<>` operator unless you take special steps as described in the link given. Thank you again jwkrahn.)
Re^2: How to process multiple input files? by John M. Dlugosz (Monsignor) on May 23, 2011 at 00:27 UTC
He'll always have a line-count of 1, since he's slurping the files. The counter variable is used to count how many times the replacement is triggered with the /g option, not the number of "lines" read (he only reads one "line" in the original!). Putting the declaration of `$count` inside the loop should do the trick simply. A better solution might be to rewrite the regex to find the second occurrence of `</div>` rather than finding all of them and only substituting the second, and "inserting" the content directly rather than repeating the found stuff in the replacement.
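A minimal sketch of the "declare the counter inside the loop" idea, run against in-memory strings rather than real files so it is self-contained. The `$footer` text and the sub name `insert_with_counter` are placeholders invented for this illustration; the thread inserts a PHP include line instead.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the PHP include line used in the thread.
my $footer = "<!-- footer -->\n";

# Insert $footer before the second </div> of one file's contents.
sub insert_with_counter {
    my ($contents) = @_;
    my $count = 0;    # declared here, so every file starts counting from zero
    $contents =~ s{(</div>)}{ ++$count == 2 ? "$footer$1" : $1 }ge;
    return $contents;
}

# Two strings stand in for two slurped files.
my @files = (
    "<div>a</div><div>b</div>",
    "<div>c</div><div>d</div><div>e</div>",
);
print insert_with_counter($_), "\n" for @files;
```

Because `$count` is scoped to one call, the second "file" gets its footer before its own second `</div>`, independent of how many matches the first one had.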
Re^3: How to process multiple input files? by jwkrahn (Abbot) on May 23, 2011 at 06:51 UTC
"He'll always have a line-count of 1, since he's slurping the files." `$.` contains the current record count, and since each file is one record it will be incremented for each file and so will not always be 1. Unless of course you reset `$.` or close `ARGV` at the end of each file.
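The effect of closing `ARGV` at the end of each file can be sketched as follows. The scratch files and the `@record_numbers` array are assumptions made so the example runs on its own; `close ARGV if eof` is the idiom documented for resetting `$.` between files.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Build two scratch files to stand in for files named on the command line.
my @paths;
for my $text ("first file\nline two\n", "second file\n") {
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} $text;
    close $fh or die "close: $!";
    push @paths, $path;
}

my @record_numbers;
{
    local @ARGV = @paths;    # pretend these were the command-line arguments
    local $/;                # slurp mode: each file is a single record
    while ( my $contents = <> ) {
        push @record_numbers, $.;
        close ARGV if eof;   # explicit close resets $. before the next file
    }
}
print "record numbers seen: @record_numbers\n";
```

Without the `close ARGV if eof` line, `$.` would keep increasing from one file to the next, as described above.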
Re^4: How to process multiple input files? by jaredor (Priest) on May 23, 2011 at 12:28 UTC
Re^3: How to process multiple input files? by jaredor (Priest) on May 23, 2011 at 05:08 UTC
Thanks for pointing out my errors. I simply did not read the code closely enough. All your responses in this thread were good. I learned something. Good work.
Re: How to process multiple input files? by John M. Dlugosz (Monsignor) on May 23, 2011 at 00:42 UTC
Oh, also your technique to find the second occurrence of something and do something to it is a bit strange. You could use the search /g in a loop and have normal code rather than the inside of an evaluated replacement. But you can locate the second occurrence directly and not need that kind of code. You want to insert something just before the second `</div>`, right? Something like this (untested!): `my $replacement = "\t<?php include(\$_SERVER['DOCUMENT_ROOT'].\"\/includes\/footer.php\"); ?>\n\n"; s{ </div> .*? \K (?=</div>) }{$replacement}xs;` Note that you don't use /g, so you don't keep checking all the rest of the divs, and you don't use $1 or anything in the replacement but "insert" it without replacing any of the stuff used to find that spot. The \K means that what came before is just context and not included in what gets replaced. The (?=pattern) does the same for what follows. Nothing is "in" the region replaced. The /s lets `.` match newlines, which matters because the whole file is one string. See also the use of lazy quantifiers. The whole program becomes: `#!/usr/bin/perl use strict; use warnings; $^I = ".bak"; # same as -i option` `undef $/; # slurp whole files!` `my $replacement = "\t<?php include(\$_SERVER['DOCUMENT_ROOT'].\"\/includes\/footer.php\"); ?>\n\n"; my $filecontents; while (defined ($filecontents=<>)) { $filecontents =~ s{ </div> .*? \K (?=</div>) }{$replacement}xs; print $filecontents; }` I added comments and changed the name of the variable from $line because nobody else noticed that this is not a single line. As written, it was confusing and hard to read because of built-in assumptions people make about idioms and style.
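Here is a small sketch of the `\K` plus lookahead approach applied to an in-memory string, so the matching can be checked without touching any files. It assumes a lazy `.*?` between the two closing tags and the /s flag so that `.` crosses newlines; the `$insert` text and the sub name `insert_before_second_div` are placeholders for illustration. `\K` needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the PHP include line used in the thread.
my $insert = "<!-- footer -->\n";

sub insert_before_second_div {
    my ($html) = @_;
    # </div> .*?  : the first closing tag, then as little as possible (lazy)
    # \K          : keep everything matched so far out of the replaced region
    # (?=</div>)  : require the second closing tag next, without consuming it
    # /x permits the spacing in the pattern; /s lets . match newlines.
    $html =~ s{ </div> .*? \K (?=</div>) }{$insert}xs;
    return $html;
}

my $page = "<div>\n<div>one</div>\ntwo\n</div>\n<div>three</div>\n";
print insert_before_second_div($page);
```

Note that the replacement part is written as `{$insert}` with no padding: /x only relaxes whitespace in the pattern, so spaces inside the replacement braces would end up in the output.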
Re: How to process multiple input files? by Anonymous Monk on May 22, 2011 at 23:48 UTC
See Iterator::Diamond, it provides a safe and customizable replacement for the <> (Diamond) operator. See Dangerous diamonds!

