New Perl User Question

Rpick has asked for the wisdom of the Perl Monks concerning the following question:

I'm a Perl Newbie, and this is my first attempt at a Perl script.

Below is the source code.
I would like to make this work with the filename to be processed given as a command-line argument. However, every time I've tried using the while(<>){ } construct I see in the Perl books, the program runs once for every line in the file.

I know this code is clunky, and probably a lot longer than it needs to be, suggestions on shortening it would also be appreciated.

#################################################### 
# InvClean.pl : Raw Scanned File Processing program 
# Version 1.0 
# Written by Robb Pickinpaugh 
# 01/31/2002 
# for use on Windows NT 
#################################################### 
use strict;

# Get Filename to process. 

my $processfilename=''; 

print "\nEnter filename to process (type exit to quit): "; 
chomp ($processfilename = <STDIN>); 

########################################### 
# 
# Setting the Rules for Processing 
# 
########################################### 

########################################### 
# 
# This sets to name of the file to which 
# the corrected data will be saved 
# 
########################################### 

my $cleanfilename = "$processfilename.clean"; 

########################################### 
# 
# This sets the numeric value 
# for the "usual" starting character 
# for each line in the raw file 
# 
########################################### 

my $correctstartchar = 16; 

############################################ 
# 
# This sets the "usual" starting length for 
# lines starting with the "usual" starting 
# character. 
# 
############################################ 

my $correctstartlength = 16; 

############################################ 
# 
# This sets the correct length of lines 
# after they have been stripped of extra 
# characters. 
# 
############################################ 

my $correctcleanlength = 13; 

############################################## 
# 
# This sets the length of lines that do not 
# include the extra stop and start characters 
# that are sometimes included in scanned data 
# 
############################################## 

my $typedlength = 14; 

############################################### 
# 
# Do not change these values, they are used to 
# report the number of lines read, and written 
# 
############################################### 

my $rawfilelength = 0; 
my $cleanfilelength = 0; 

########################### 
# 
# Call Processing Routine 
# 
########################### 

&ProcessFile; 

############################################# 
# 
# Report number of lines read from raw file, 
# and written to "cleaned" file. 
# 
############################################# 

print "$rawfilelength lines read from $processfilename\n"; 
print "$cleanfilelength lines written to $cleanfilename\n"; 

################################# 
# 
# Actual Processing of the File 
# 
################################# 

sub ProcessFile { 
my $data=''; 
my $datalength=0; 
my $startchar=''; 
open (RAWFILE, "$processfilename") || die "cannot open: $!"; 
open (CLEANFILE, ">$cleanfilename") || die "cannot open: $!"; 
while (<RAWFILE>){ 
$rawfilelength++; 
$data = $_; 
$datalength = length($data); 
$startchar = ord($data); 
if ($startchar == $correctstartchar){ 
if($datalength == $correctstartlength){ 
chomp $data; 
chop $data; 
$data = reverse ($data); 
chop $data; 
$data = reverse ($data); 
}else{ 
next; 
} 
if (length($data) == $correctcleanlength){ 
print CLEANFILE "$data\n"; 
$cleanfilelength++; 
} 
}elsif ($datalength == $typedlength){ 
print CLEANFILE "$data"; 
$cleanfilelength++; 
}elsif ($datalength > $correctcleanlength) { 
my $datalengthtrack = $datalength; 
chomp $data; 
$datalengthtrack--; 
chop $data; 
$datalengthtrack--; 
$data = reverse ($data); 
while ($datalengthtrack > $correctcleanlength){ 
chop $data; 
$datalengthtrack--; 
} 
$data = reverse ($data); 
print CLEANFILE "$data\n"; 
$cleanfilelength++; 
}elsif ($datalength < $correctcleanlength) { 
next; 
} 


} 
close (RAWFILE) || die "cannot close $processfilename: $!"; 
close (CLEANFILE) || die "cannot close $cleanfilename: $!"; 
} 
print "\a"; 
exit(0);

Replies are listed 'Best First'.
Re: New Perl User Question by BazB (Priest) on Feb 01, 2002 at 21:35 UTC
This is fairly straightforward - you might want to use Super Search for other examples. Command-line arguments are available in the array `@ARGV`. Try something like this:

    #!/usr/bin/perl -w
    use strict;
    my $in_file = shift @ARGV;
    open(INFILE, "$in_file") or die "Can't open input file!: $!\n";
    while (<INFILE>) {
        # do stuff with each line of INFILE until
        # there are no more lines to process
    }
    close(INFILE);

Hope that helps. BazB.
Re: Re: New Perl User Question by Rpick (Novice) on Feb 04, 2002 at 14:19 UTC
BazB, Thanks a lot, that did exactly what I was looking for. I guess I missed the need for the shift before the @ARGV. Thanks again
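For anyone landing here later, BazB's suggestion can be sketched a little more defensively. This is only a minimal sketch, not the original poster's program: the `count_lines` helper is a hypothetical name introduced for illustration, and the three-argument `open` with a lexical filehandle is a swapped-in variation on the two-argument `open` shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count the lines in a file.
sub count_lines {
    my ($file_name) = @_;

    # Three-argument open with a lexical filehandle.
    open(my $in, '<', $file_name)
        or die "Can't open '$file_name': $!\n";

    my $count = 0;
    while (my $line = <$in>) {
        $count++;    # real per-line processing would go here
    }

    close($in) or die "Can't close '$file_name': $!\n";
    return $count;
}

# shift removes and returns the first command-line argument,
# or undef when no argument was given.
my $in_file = shift @ARGV;
if (defined $in_file) {
    printf "%d lines read from %s\n", count_lines($in_file), $in_file;
}
else {
    print "usage: $0 filename\n";
}
```

Checking `defined $in_file` means a missing argument produces a usage line instead of a confusing failure inside `open`.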
Re: New Perl User Question by screamingeagle (Curate) on Feb 01, 2002 at 21:40 UTC
You could also use the following modules to help with parsing command-line parameters: a) Getopt::Std b) Getopt::Long. In case you decide to extend your programs with additional command-line parameters, and/or you need to ensure that the correct data types are being passed via the command line, the modules mentioned above should come in handy.
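As a rough illustration of the Getopt::Long suggestion, here is a minimal sketch. The option names `--length` and `--output` and the `parse_args` helper are assumptions made up for this example; nothing in the original program defines them. `GetOptionsFromArray` is used (rather than `GetOptions`) only so the parsing lives in a sub that takes its arguments explicitly.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical option names, chosen for illustration only.
sub parse_args {
    my @args = @_;
    my %opt  = (length => 13, output => undef);

    GetOptionsFromArray(
        \@args,
        'length=i' => \$opt{length},    # =i: the value must be an integer
        'output=s' => \$opt{output},    # =s: the value is a string
    ) or die "usage: $0 [--length N] [--output FILE] filename\n";

    # Whatever is left after the options is treated as the input file.
    $opt{input} = shift @args;
    return %opt;
}

if (@ARGV) {
    my %opt   = parse_args(@ARGV);
    my $input = defined $opt{input} ? $opt{input} : '(none)';
    print "input=$input length=$opt{length}\n";
}
```

The `=i` specifier is what gives the type checking mentioned above: a non-numeric value for `--length` makes the parse fail.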
Re: New Perl User Question by sparkyichi (Deacon) on Feb 01, 2002 at 21:44 UTC
To pass a file from the command line use `@ARGV` instead of `<STDIN>`. So your code:

    my $processfilename='';
    print "\nEnter filename to process (type exit to quit): ";
    chomp ($processfilename = <STDIN>);

Could be:

    my $processfilename=$ARGV[0];

Sparky FMTEYEWTK
Re: New Perl User Question by CharlesClarkson (Curate) on Feb 02, 2002 at 13:10 UTC
"I know this code is clunky, and probably a lot longer than it needs to be, suggestions on shortening it would also be appreciated."

Careful what you wish for. Take a look at perlstyle, which is included with the standard perl distribution. One style rule mentions the use of the underscore `_` to separate the words in variable names. This makes reading variables faster and easier, especially for non-native speakers of English. I also prefer to avoid mixed-case variable and subroutine names to avoid mistyping. Other style rules mentioned in perlstyle include: use spaces around most operators, use a 4-column indent, and add a space after each comma. These rules are not set in concrete. The best thing is to find your style, compare it with that of others, and then be consistent. Keeping this in mind, we can apply BazB's advice: `my $process_file_name = shift @ARGV;`
Re: Re: New Perl User Question by Rpick (Novice) on Feb 04, 2002 at 22:18 UTC
Charles, Thanks for the warning on being careful what I wish for. Actually what you showed me was exactly what I was looking for. My programming background is in C++, and a bit of VB, and VBA. I'm just starting out in perl, and wasn't aware of functions like substr. That was exactly the kind of explanation I was looking for. Thanks again.
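Since substr comes up here, a small sketch of how it could stand in for the chop/reverse/chop/reverse sequence in the original post may help. This is not Charles's code; the helper names `strip_start_stop` and `keep_tail` are hypothetical, and the sample data is invented. It assumes, as the original program does, that a scanned line carries one extra start character and one extra stop character.

```perl
use strict;
use warnings;

# Remove the newline, then one leading and one trailing character.
# Same effect as: chomp; chop; reverse; chop; reverse.
sub strip_start_stop {
    my ($line) = @_;
    chomp $line;
    return substr($line, 1, -1);
}

# Remove the newline and the trailing stop character, then keep only
# the last $want characters, as the "longer than expected" branch does.
sub keep_tail {
    my ($line, $want) = @_;
    chomp $line;
    chop $line;
    return $line if length($line) <= $want;    # nothing extra to trim
    return substr($line, -$want);
}

# Invented sample lines: chr(16) is the start character from the post,
# "X" stands in for the stop character.
my $scanned = chr(16) . "1234567890123" . "X\n";
print strip_start_stop($scanned), "\n";
print keep_tail("ABC1234567890123X\n", 13), "\n";
```

A negative third argument to substr means "stop this many characters before the end", and a negative second argument counts the starting position from the end of the string.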
Re: New Perl User Question by talexb (Chancellor) on Feb 02, 2002 at 04:34 UTC
I'd like to suggest formatting more like this for your code. Code formatting is a religious topic, so let me just walk around the topic gingerly by suggesting that you want to make the code as readable as possible so that it's easy to maintain, easy to document, and easy to debug. Never assume that once you write a piece of code you're never going to have to deal with it again. Unless it's a one-liner, you're probably going to have to go back to it.
Re: New Perl User Question by edebill (Scribe) on Feb 02, 2002 at 18:24 UTC
People seem to have missed the rather idiomatic:

    while (my $data = <>) {
        # process the line of data
    }

`<>` in a while like this is a special case, and will automatically open any files listed on the command line, or read from standard input. You should be able to save `$processfilename=$ARGV[0];` beforehand for use in making your $cleanfilename. Using this construct saves you from needing to manually open and close the file.

Using `my` where you first use your variables is more readable than "declaring" them beforehand. This is one of the shortcomings of C that every language since seems to have gone out of its way to overcome.

A little indentation would also make your code a little easier to parse visually :-)

Oh, and you might not want to unconditionally decrement $datalengthtrack after a chomp(). chomp doesn't always remove characters, so depending on the input dataset, you might get errors. It DOES however return the number of chars removed, so you can capture that info and use it (`$datalengthtrack -= chomp($data);` or the like).
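The two points above (the `<>` idiom and chomp's return value) can be sketched together. This is a minimal sketch under assumptions: the `read_lines` helper and the variable names are hypothetical and not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: read every line from the files named in @ARGV
# (or from STDIN when @ARGV is empty). Returns a list of
# [text, chars_removed_by_chomp] pairs.
sub read_lines {
    my @lines;
    while (my $data = <>) {
        # chomp returns the number of characters it removed, which is
        # 0 when the line has no trailing newline (e.g. the last line).
        my $removed = chomp($data);
        push @lines, [$data, $removed];
    }
    return @lines;
}

if (@ARGV) {
    # Save the name first: <> shifts file names off @ARGV as it opens them.
    my $process_file_name = $ARGV[0];
    my $clean_file_name   = "$process_file_name.clean";

    my @lines = read_lines();
    printf "%d lines read; cleaned output would go to %s\n",
        scalar @lines, $clean_file_name;
}
```

Capturing the return value of chomp avoids the off-by-one that an unconditional decrement would cause on a final line with no newline.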
Re: New Perl User Question by grinder (Bishop) on Feb 02, 2002 at 21:26 UTC
My one word of advice would be to Ditch those comments. Seriously. They distract from understanding the code. They will slowly drift out of sync. If you need to comment what purpose a variable serves, you have named it poorly. -- `g r i n d e r print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';`

