Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

comment on

( [id://3333]=superdoc: print w/replies, xml ) Need Help??

I know this code is clunky, and probably a lot longer than it needs to be, suggestions on shortening it would also be appreciated.

Careful what you wish for:

Take a look at perlstyle. It is included with the standard perl distribution. One style rule mentions the use of the underscore _ to separate the words in variable names. This makes reading variables faster and easier especially for non-native speakers of English. I also prefer to avoid mixed case variables and subroutine names to avoid miss-typing.

Other style rules mentioned in perlstyle include: always use spaces around operators, use 4-spaces as tabs, and add a space after commas. These rules are not set in concrete. The best thing is find your style, compare it with that of others and then be consistent.

Keeping this in mind, We can apply BazB's advice:
my $process_file_name = shift @ARGV;
And reducing the comments some:
# file name for corrected data my $clean_file_name = "$processfilename.clean"; # numeric value of the "usual" line starting character my $correct_start_char = 16; # "usual" starting length of lines starting with $correct_start_char my $correct_start_length = 16; # correct length of lines after they have been stripped my $correct_clean_length = 13; # length of lines that do not include the extra stop and start charact +ers my $typed_length = 14; # Do not change these values, they are used to # report the number of lines read, and written my $raw_file_length = 0; my $clean_file_length = 0;

You might take alook at perlsub. It is also included with the standard perl distribution and gives some excellent reasons for not prefixing subroutine calls with the ampersand &. One more rule you might adopt is to avoid using global values in subroutines. When possible attempt to pass all the needed variables to a sub and return expected values from the sub. This makes the sub more portable and the program easier to maintain.

With that in mind, we could change:
&ProcessFile; to:
my($raw_file_length, $clean_file_length) = process_file( $process_file_name, $clean_file_name, $correct_start_char, $correct_start_length, $correct_clean_length, $typed_length);
All this might be shortened, without a loss of clarity with an array:
my @args = ( $ARGV[0], # file to process "$ARGV[0].clean", # processed file 16, # numeric value of the "usual" # line starting character 16, # "usual" starting length of lines # starting with $correct_start_char 13, # correct length of lines after they # have been stripped 14 # length of lines that do not include # the extra stop and start characters ); my($raw_file_length, $clean_file_length) = process_file( @args );

NOTE: We could pass @args with a reference, but I'm not willing to add too many new concepts at one time. perlref and perlsub have more information.

Before we examine the subroutine, let's add the end of the program:
print "$raw_file_length lines read from $args[0]\n"; print "$clean_file_length lines written to $args[1]\n"; print "\a"; exit(0);

For the subroutine, we'll examine the logic of the if/else and we'll use the substr function in place of the chomp/chop method you used.

chomp $data; chop $data; $data = reverse ($data); chop $data; $data = reverse ($data);
Could be replaced with:
$data = substr($data, 1, -2); And:
chomp $data; $datalengthtrack--; chop $data; $datalengthtrack--; $data = reverse ($data); while ($datalengthtrack > $correctcleanlength){ chop $data; $datalengthtrack--; } $data = reverse ($data); print CLEANFILE "$data\n";
Could be replaced with:
print CLEANFILE substr( substr($data, 0, -2), -$correct_clean_length), "\n"; The if/else block might be re-written as:
if ( $start_char == $correct_start_char ) { next if $data_length != $correct_start_length; if ( $data_length == $correct_start_length ) { print CLEANFILE substr($data, 1, -2), "\n"; $clean_file_length++; } } else { if ( $data_length > $correct_clean_length ) { print CLEANFILE substr( substr($data, 0, -2), -$correct_clean_ +length), "\n"; $clean_file_length++; } elsif ( $data_length == $typed_length ) { print CLEANFILE $data; $clean_file_length++; } }
We'll have to retrieve those variables we passed to the sub:
my ( $file_name, $clean_file_name, $correct_start_char, $correct_start_length, $correct_clean_length, $typed_length) = + @_;
And I took the liberty of shortening the file handle names:
open RAW, $file_name or die "Cannot open $file_name: $!"; open CLEAN, ">$clean_file_name" or die "Cannot open $clean_file_name: +$!";
And since we were actually keeping track of lines not file lengths, I used:
my($raw_lines, $clean_lines); in the sub, instead of:
my($raw_file_length, $clean_file_length) Putting it all together:
#!/usr/bin/perl #################################################### # InvClean.pl : Raw Scanned File Processing program # Version 1.0 # Written by Robb Pickinpaugh # 01/31/2002 # for use on Windows NT #################################################### use strict; use warnings; my @args = ( $ARGV[0], # file to process "$ARGV[0].clean", # processed file 16, # numeric value of the "usual" # line starting character 16, # "usual" starting length of lines # starting with $correct_start_char 13, # correct length of lines after they # have been stripped 14 # length of lines that do not include # the extra stop and start characters ); my($raw_file_length, $clean_file_length) = process_file( @args ); print "$raw_file_length lines read from $args[0]\n"; print "$clean_file_length lines written to $args[1]\n"; print "\a"; exit(0); sub process_file { my ( $file_name, $clean_file_name, $correct_start_char, $correct_start_length, $correct_clean_length, $typed_length) = + @_; open RAW, $file_name or die "Cannot open $file_name: $! +"; open CLEAN, ">$clean_file_name" or die "Cannot open $clean_file_na +me: $!"; my ($raw_lines, $clean_lines); while ( my $line = <RAW> ) { my $line_length = length $line; my $start_char = ord $line; if ( $start_char == $correct_start_char ) { next if $line_length != $correct_start_length; if ( $line_length == $correct_start_length ) { print CLEAN substr($line, 1, -2), "\n"; $clean_lines++; } } else { if ( $line_length > $correct_clean_length ) { print CLEAN substr( substr($line, 0, -2), -$correct_cl +ean_length), "\n"; $clean_lines++; } elsif ( $line_length == $typed_length ) { print CLEAN $line; $clean_lines++; } } $raw_lines = $.; } close RAW or die "cannot close $file_name: $!"; close CLEAN or die "cannot close $clean_file_name: $!"; return ($raw_lines, $clean_lines); } __END__



HTH,
Charles K. Clarkson
HTH, Charles K. Clarkson Clarkson Energy Homes, Inc.

In reply to Re: New Perl User Question by CharlesClarkson
in thread New Perl User Question by Rpick

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (1)
As of 2024-04-25 02:28 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found