There is more than one way (and mine is not the best)

NovMonk has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Esteemed Monks,

While I was mopping the chapel floor this morning after Matins (or was it Lauds?) I had an epiphany. Actually I have this one quite often-- it is the nagging suspicion that I am doing something simple very much the long way 'round. I have a laughably simple program which is making my job much easier, but I think it can and should be better.

I have a number of *.txt files in directory, and I am doing something to each of them in turn. I currently use a file called "qcode" that lists each of the files I need processed, without the *.txt extension, thus:

q2
q5
q7
q23

What I am wondering is, is there another way to get at this, if I know that all the *txt files in the directory are the ones I want to process? At the end of this process I get these same *txt files saved as *.axe files with my changes. Here's my code-- as always, comments are appreciated.

#!/usr/bin/perl
use strict;
use warnings;
my ($qnum,$code,$ctext,$ntext,$stext);
open (QLIST, "qcode") || die "Can't open qcode file: $!\n";
while (<QLIST>){   
         chomp $_;  
         $qnum="$_";
         open (QIN, "$qnum.txt") || die "Can't open $qnum.txt file: $!
+\n";
         open (QOUT, ">$qnum.tmp") || die "Can't open $qnum.tmp file: 
+$!\n";
         while (<QIN>){
              s/\222/'/g;  
              s/\226/-/g;  
              chomp ($_);  
              if (/^Col/i){
                  s/^col.//gi;
                  s/\(net\)//gi;
                  $ntext = $_;                         
                  print QOUT "net1$ntext (net);unl1\n";
                  }
              elsif (/^\(S/i){ 
                 (undef,$stext) = split(/\s/,$_,2);      
                 print QOUT "net2$stext (subnet);unl1\n";
                 } 
              elsif (/^[0-9]/)
                 ($code,$ctext) = split(/\s/,$_,2);           
                 print QOUT "n01$ctext;c=(&txt($code.ge.1)\n";
                 }                                            
              else {};
              }
           }
close QIN;
close QOUT;
open (QIN, "$qnum.tmp") || die "Can't open $qnum.tmp file: $!\n";  
open (QOUT, ">$qnum.axe") || die "Can't open $qnum.axe file: $!\n";
while (<QIN>){ 
     open (QIN, "$qnum.tmp") || die "Can't open $qnum.tmp file: $!\n";
+   
     open (QOUT, ">$qnum.axe") || die "Can't open $qnum.axe file: $!\n
+"; 
     while (<QIN>){ 
         if (/n01\s+nothing|n01\s+don't know|n01\s+other/i)
{
              s/\n/;nz;nosort\n/g;}      
         if (/ \s+[\w;]/){                  
              s/n01\s+/n01/g;            
              s/net1\s+/net1/g;          
              s/net2\s+/net2/g;}         
         if (/^net1misc/i){                 
              s/unl1/unl1;nz;nosort/g;}  
         unless (/^n01;c=|^net. /i){        
         print QOUT $_;             
         }                          
      }                                  
        close QOUT;
}
[download]

One final note-- another question really. When I run this with "use warnings" I get a gazillion warnings reading: "Use of uninitialized value in concatenation (.) or string at oeaxe.p line 23, <QIN> line 784." I don't have any idea what this means, and my output files look fine, so I'm considering just commenting this line out for when other people use the program. Should I be more worried about this?

Thanks as always for the enlightenment.

Pax vobiscum,

NovMonk

Comment on There is more than one way (and mine is not the best) Select or Download Code

Replies are listed 'Best First'.
Re: There is more than one way (and mine is not the best) by Belgarion (Chaplain) on May 25, 2004 at 21:22 UTC
If your code is being run in the same directory as the files you could always do something like this: `while (<*txt>) { # process the file ($_ will contain the name of the file) }` [download] This would remove the need for the "qcode" file, since it's only doing what the file system already does---namely, maintain a list of file names. As for the rest of the code: I think you're missing some braces in various places. If you tried to run your code through a good indenting text editor, you would notice that things are not lining up as you would expect. Also, doing a `perl -cw ./yourprogram.pl` says there are syntax errors with your script. Finally, I'm not sure why you're processing the file twice. I'm assuming there is a good reason, but it's not apparent to me at the moment.	[reply] [d/l]
Re: There is more than one way (and mine is not the best) by cLive ;-) (Prior) on May 25, 2004 at 22:02 UTC
"When I run this with "use warnings" I get a gazillion warnings reading: "Use of uninitialized value in concatenation (.) or string at oeaxe.p line 23, <QIN> line 784." I don't have any idea what this means. It means what it says :) To get rid of the warning, assign variables an empty string or 0 as appropriate. ie, in this instance: `# change this my ($qnum,$code,$ctext,$ntext,$stext); # to my ($qnum,$code,$ctext,$ntext,$stext)=('','','','','');` [download] Oh, and you're assigning $stext from a split. You might want to amend that too: `$stext = (split(/\s/,$_,2))[1] \|\| '';` [download] Or something like that :) To avoid the warnings, always assign an empty string as an alternative if the assignment may be undefined. cLive ;-) updated: added more explanation...	[reply] [d/l] [select]
Re: Re: There is more than one way (and mine is not the best) by revdiablo (Prior) on May 26, 2004 at 16:44 UTC
`my ($qnum,$code,$ctext,$ntext,$stext)=('','','','','');` If it were me, I'd do that like this: `$_ = '' for my ($qnum,$code,$ctext,$ntext,$stext);` Basically, I take the precept that counting things is bad. It's easy to mis-count. In this case, trying to make sure you have the same amount of empty strings on the RHS as variables on the LHS involves counting, and thus should be avoided. 8^)	[reply] [d/l] [select]
Re: There is more than one way (and mine is not the best) by punkish (Priest) on May 25, 2004 at 23:36 UTC
Updated with a little more code and comments. Is there some reason you are not using File::Find for this? I have added some sample, untested code for you below. I have no idea what your code is doing, but I hope the following will help you do it better. #!/usr/bin/perl -w use strict; use File::Find; my $dir = "/Wherever my .txt files are stored"; my ($qnum, $code, $ctext, $ntext, $stext) = ('', '', '', '', ''); find(\&processfile, ("$dir/")); sub processfile { if (/(.*)\.txt$/i) { open QIN, "$_" or die $!; open QOT, ">$1.txt" or die $!; while (<QIN>) { chomp; s/\222/'/g; s/\226/-/g; if (/^Col/i) { s/^col.//gi; s/$net$//gi; $ntext = $_; print QOT "net1$ntext (net);unl1\n"; } elsif (/^\(S/i) { (undef, $stext) = split(/\s/, $_, 2); print QOT "net2$stext (subnet);unl1\n"; } elsif (/^[0-9]/) { ($code, $ctext) = split(/\s/, $_, 2); print QOT "n01$ctext;c=(&txt($code.ge.1)\n"; } } close QIN; close QOT; open (QIN, "$1.tmp") or die $!; open (QOT, ">$1.axe") or die $!; while (<QIN>) { if (/n01\s+nothing\|n01\s+don''t know\|n01\s+other/i) { $_ =~ s/\n/;nz;nosort\n/g; } if (/\s+[\w;]/) { s/n01\s+/n01/g; s/net1\s+/net1/g; s/net2\s+/net2/g; } if (/^net1misc/i) { s/unl1/unl1;nz;nosort/g; } unless (/^n01;c=\|^net. /i) { print QOT $_; } } close QOT; } } [download]	[reply] [d/l]
Re: Re: There is more than one way (and mine is not the best) by graff (Chancellor) on May 26, 2004 at 04:43 UTC
Is there some reason you are not using File::Find for this? Well, it seems that one very good reason is that the OP doesn't show any need for recursive descent of a directory tree. All the files are in a single directory, and File::Find is overkill for that -- and somewhat harder to grok (because of all the unrelated things it does), compared to the simple glob example in the first reply, or the basic "opendir...; readdir..." functions, which are also much easier to learn than File::Find -- e.g.: `opendir( D, "." ) or die "WTF? $!"; for my $file ( grep /\.txt$/, readdir D ) { # do whatever it is the OP is doing... }` [download] For that matter, the OP didn't say one way or the other, but maybe there are subdirectories containing "*.txt" files, and he might actually prefer to leave those alone -- just process the ones in the current directory. In that case, File::Find would cause real trouble (or at least extra work to avoid trouble).	[reply] [d/l]
Re: There is more than one way (and mine is not the best) by graff (Chancellor) on May 26, 2004 at 05:29 UTC
I'm with Belgarion -- you must have made some mistakes pasting the code into your post, because there are some real problems -- you're not only missing curly bracket here or there; you also have two copies of the "open" statements for reading "$qnum.tmp" and writing "$qnum.axe" (one outside a while loop, and another inexplicably inside the loop). And you have a regex like this: `s/^col.//gi;` [download] Just a bit of a nit-pick, but when the regex is anchored to the start of the string (using "^"), the "g" modifier at the end is kinda pointless. The fact that you're using "g" everywhere makes me wonder if you understand its purpose. (browse through the "perlre" man page) The whole two-pass design is a bit mysterious -- I'd be surprised if this cannot be done in a single pass (read .txt, write .axe, without the intermediary .tmp file). It might be instructive to provide a snippet of the "txt" input, and the equivalent snippet of intended output. (And if that output was really created by a perl script, post that script.)	[reply] [d/l]


Pathologically Eclectic Rubbish Lister
	PerlMonks