comment on

I know this code is clunky, and probably a lot longer than it needs to be, suggestions on shortening it would also be appreciated.

Careful what you wish for:

Take a look at perlstyle. It is included with the standard perl distribution. One style rule mentions the use of the underscore _ to separate the words in variable names. This makes reading variables faster and easier especially for non-native speakers of English. I also prefer to avoid mixed case variables and subroutine names to avoid miss-typing.

Other style rules mentioned in perlstyle include: always use spaces around operators, use 4-spaces as tabs, and add a space after commas. These rules are not set in concrete. The best thing is find your style, compare it with that of others and then be consistent.

Keeping this in mind, We can apply BazB's advice:
my $process_file_name = shift @ARGV;

And reducing the comments some:

# file name for corrected data
my $clean_file_name = "$processfilename.clean";

# numeric value of the "usual" line starting character
my $correct_start_char = 16;

# "usual" starting length of lines starting with $correct_start_char
my $correct_start_length = 16;

# correct length of lines after they have been stripped
my $correct_clean_length = 13;

# length of lines that do not include the extra stop and start charact
+ers
my $typed_length = 14;

# Do not change these values, they are used to
# report the number of lines read, and written
my $raw_file_length     = 0;
my $clean_file_length   = 0;
[download]

You might take alook at perlsub. It is also included with the standard perl distribution and gives some excellent reasons for not prefixing subroutine calls with the ampersand &. One more rule you might adopt is to avoid using global values in subroutines. When possible attempt to pass all the needed variables to a sub and return expected values from the sub. This makes the sub more portable and the program easier to maintain.

With that in mind, we could change:
&ProcessFile; to:

my($raw_file_length, $clean_file_length) =
    process_file(
        $process_file_name, $clean_file_name, $correct_start_char,
        $correct_start_length, $correct_clean_length, $typed_length);
[download]

All this might be shortened, without a loss of clarity with an array:

my @args = (
        $ARGV[0],           # file to process
    
        "$ARGV[0].clean",   # processed file
    
        16,                 # numeric value of the "usual"
                            # line starting character
    
        16,                 # "usual" starting length of lines
                            # starting with $correct_start_char
    
        13,                 # correct length of lines after they
                            # have been stripped
    
        14                  # length of lines that do not include
                            # the extra stop and start characters
    );


my($raw_file_length, $clean_file_length) = process_file( @args );
[download]

NOTE: We could pass @args with a reference, but I'm not willing to add too many new concepts at one time. perlref and perlsub have more information.

Before we examine the subroutine, let's add the end of the program:

print "$raw_file_length lines read from $args[0]\n";
print "$clean_file_length lines written to $args[1]\n";

print "\a";
exit(0);
[download]

For the subroutine, we'll examine the logic of the if/else and we'll use the substr function in place of the chomp/chop method you used.

chomp $data; 
chop $data; 
$data = reverse ($data); 
chop $data; 
$data = reverse ($data);
[download]

Could be replaced with:
$data = substr($data, 1, -2); And:

chomp $data; 
$datalengthtrack--; 
chop $data; 
$datalengthtrack--; 
$data = reverse ($data); 
while ($datalengthtrack > $correctcleanlength){ 
    chop $data; 
    $datalengthtrack--; 
} 
$data = reverse ($data); 
print CLEANFILE "$data\n";
[download]

Could be replaced with:
print CLEANFILE substr( substr($data, 0, -2), -$correct_clean_length), "\n"; The if/else block might be re-written as:

if ( $start_char == $correct_start_char ) {
    next if $data_length != $correct_start_length;
    if ( $data_length == $correct_start_length ) {
        print CLEANFILE substr($data, 1, -2), "\n";
        $clean_file_length++;
    }
} else {
    if ( $data_length > $correct_clean_length ) {
        print CLEANFILE substr( substr($data, 0, -2), -$correct_clean_
+length), "\n";
        $clean_file_length++;
    } elsif ( $data_length == $typed_length ) {
        print CLEANFILE $data;
        $clean_file_length++;
    }
}
[download]

We'll have to retrieve those variables we passed to the sub:

my (  $file_name, $clean_file_name, $correct_start_char,
        $correct_start_length, $correct_clean_length, $typed_length) =
+ @_;
[download]

And I took the liberty of shortening the file handle names:

open RAW,   $file_name          or die "Cannot open $file_name: $!";
open CLEAN, ">$clean_file_name" or die "Cannot open $clean_file_name: 
+$!";
[download]

And since we were actually keeping track of lines not file lengths, I used:
my($raw_lines, $clean_lines); in the sub, instead of:
my($raw_file_length, $clean_file_length) Putting it all together:

#!/usr/bin/perl
####################################################
# InvClean.pl : Raw Scanned File Processing program
# Version 1.0
# Written by Robb Pickinpaugh
# 01/31/2002
# for use on Windows NT
####################################################
use strict;
use warnings;

my @args = (
        $ARGV[0],           # file to process

        "$ARGV[0].clean",   # processed file

        16,                 # numeric value of the "usual"
                            # line starting character

        16,                 # "usual" starting length of lines
                            # starting with $correct_start_char

        13,                 # correct length of lines after they
                            # have been stripped

        14                  # length of lines that do not include
                            # the extra stop and start characters
    );


my($raw_file_length, $clean_file_length) = process_file( @args );

print "$raw_file_length lines read from $args[0]\n";
print "$clean_file_length lines written to $args[1]\n";

print "\a";
exit(0);

sub process_file {
    my (
        $file_name, $clean_file_name, $correct_start_char,
        $correct_start_length, $correct_clean_length, $typed_length) =
+ @_;

    open RAW,   $file_name          or die "Cannot open $file_name: $!
+";
    open CLEAN, ">$clean_file_name" or die "Cannot open $clean_file_na
+me: $!";

    my ($raw_lines, $clean_lines);
    while ( my $line = <RAW> ) {
        my $line_length = length $line;
        my $start_char = ord $line;
        if ( $start_char == $correct_start_char ) {
            next if $line_length != $correct_start_length;
            if ( $line_length == $correct_start_length ) {
                print CLEAN substr($line, 1, -2), "\n";
                $clean_lines++;
            }
        } else {
            if ( $line_length > $correct_clean_length ) {
                print CLEAN substr( substr($line, 0, -2), -$correct_cl
+ean_length), "\n";
                $clean_lines++;
            } elsif ( $line_length == $typed_length ) {
                print CLEAN $line;
                $clean_lines++;
            }
        }
        $raw_lines = $.;
    }

    close RAW   or die "cannot close $file_name: $!";
    close CLEAN or die "cannot close $clean_file_name: $!";

    return ($raw_lines, $clean_lines);
}

__END__
[download]

HTH,
Charles K. Clarkson
HTH, Charles K. Clarkson Clarkson Energy Homes, Inc.

In reply to Re: New Perl User Question by CharlesClarkson
in thread New Perl User Question by Rpick

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


Problems? Is your data what you think it is?
	PerlMonks