http://qs321.pair.com?node_id=55640
Category: Text Processing
Author/Contact Info: Colin McMillen (Falkkin)
Description: This program takes in files on the command-line and counts the lines of code in each, printing a total when finished.

My standard for counting lines of code is simple. Every physical line of the file counts as a logical line of code, unless it is composed entirely of comments and punctuation characters. Under this scheme, conditionals count as separate lines of code. Since it is often the case that a decent amount of the code's actual logic takes place within a conditional, I see no reason to exclude conditionals from the line-count.

Usage: code_counter.pl [-v] [filenames]

The -v switch enables verbose output: each line of the input is echoed with a + or - prefix, depending on whether it was counted as an actual line of code.

#!/usr/bin/perl -w
use strict;

# Contains the total number of logical lines in all the files supplied.
my $total_lines = 0;
# Contains the number of logical lines of code in one file.
my $logical_lines = 0;
# If set to 1, indicates that we are currently in the middle of a POD
# comment and hence shouldn't be counting lines of code.
my $in_pod = 0;

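# If the first argument specified on the command-line is '-v', we want
# verbose output.
my $verbose = 0;
my $first_arg = $ARGV[0];

# Check definedness first so that running with no arguments does not
# trigger an "uninitialized value" warning under -w.
if (defined $first_arg and $first_arg eq "-v") {
  shift @ARGV;         # Remove the first element of the @ARGV array.
  $verbose = 1;
}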

# For each of the filenames supplied on the command-line, execute the
# following block, which will count up the logical lines in that file.
foreach my $file (@ARGV) {

  # Re-initialize variables to zero.
  $logical_lines = 0;
  $in_pod = 0;

  # Open the file for reading. Die if there is some kind of error.
  # The three-argument form keeps odd filenames from being read as modes.
  open(my $fh, '<', $file) or die "Could not open $file: $!";

  # Process each of the lines of the file, in sequence.
  # while reads one line at a time instead of loading the whole file.
  while (<$fh>) {

    # A line beginning with '=cut' marks the end of a POD block.
    reject($_), $in_pod = 0, next if /^=cut/i;

    # If we are in a POD block, this is not a line of code.
    reject($_), next if $in_pod;

    # A line beginning with '=' followed by a letter marks the beginning
    # of a POD block. (A bare '=' could be a continued assignment.)
    reject($_), $in_pod = 1, next if /^=[a-zA-Z]/;

    # A line containing no alphanumerics is not counted as a line of code.
    reject($_), next unless /\w/;

    # A line beginning with optional whitespace followed by a pound
    # character is a comment.
    reject($_), next if /^\s*#/;

    # If we got here, this line is actually a line of code.
    acknowledge($_);
  }

  # Close the file.
  close($fh);

  # Print the results for this file and add to $total_lines.
  print "$file contains $logical_lines logical lines of code.\n";
  $total_lines += $logical_lines;
}

# Print the final result.
print "$total_lines logical lines of code in ", scalar(@ARGV), " files.\n";

# The function that gets called whenever we determine that
# the current line is a logical line of code.
sub acknowledge {
  my $line = shift;
  $logical_lines++;
  print "+ $line" if $verbose;
}

# The function that gets called whenever we determine that
# the current line isn't a logical line of code.
sub reject {
  my $line = shift;
  print "- $line" if $verbose;
}
Re: Code counter
by salvadors (Pilgrim) on Feb 01, 2001 at 18:59 UTC

    If you're planning to read through a list of files from the command line, then you should really use <>. It saves you all the work of opening the files yourself.

    And, rather than all that work with $in_pod, I'd just use the range operator.
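
    As a minimal sketch of that idea: in scalar context the range operator acts as a flip-flop, turning true on the line where the left pattern matches and staying true through the line where the right pattern matches. The helper name pod_lines and the sample lines below are made up for illustration and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the lines
# that fall inside a POD block, using the range operator instead of an
# explicit $in_pod flag.
sub pod_lines {
    my @lines = @_;
    my @pod;
    for (@lines) {
        # Flip-flop: becomes true when the left pattern matches $_ and
        # stays true up to and including the line matching the right one.
        push @pod, $_ if /^=[a-zA-Z]/ .. /^=cut/;
    }
    return @pod;
}

# Illustrative input: two code lines around a three-line POD block.
my @sample = (
    "my \$x = 1;\n",
    "=head1 NAME\n",
    "Some documentation\n",
    "=cut\n",
    "print \$x;\n",
);

print "POD: $_" for pod_lines(@sample);
```

    One thing to keep in mind: the flip-flop remembers its state between lines, so an unterminated POD block at the end of one file would carry over into the next when reading several files through <>.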

    A line containing no alphanumerics is not counted as a line of code

    Not sure I'd agree with that. Haven't you seen some of the JAPHs posted :) I'd personally check for absence of non-whitespace.
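
    To make the difference concrete, here is a small sketch comparing the two tests on a few sample lines. The helper names and the samples are assumptions chosen for illustration. Lines such as '$|++;' or '};' contain no word characters, so the /\w/ test rejects them, while the /\S/ test keeps them.

```perl
use strict;
use warnings;

# Hypothetical helpers, named here only for illustration.
# The original script's test: a line counts only if it has a word character.
sub passes_word_test     { return $_[0] =~ /\w/ ? 1 : 0 }
# The suggested alternative: a line counts if it has any non-whitespace.
sub passes_nonspace_test { return $_[0] =~ /\S/ ? 1 : 0 }

# '$|++;' (enable autoflush) and '};' consist of punctuation only.
my @samples = ('$|++;', '};', 'print "hi";', '   ', '');

for my $line (@samples) {
    printf "%-14s \\w: %d  \\S: %d\n",
        "'$line'", passes_word_test($line), passes_nonspace_test($line);
}
```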

    So, personally, I'd reduce all this down to:

    #!/usr/bin/perl -w
    use strict;

    my $total = my $pod = my $comment = my $blank = my $code = 0;

    while(<>) {
      $total++;
      if ( /^=\w+/ .. /^=cut/ ) { $pod++; next }
      if ( /^\s*#/ ) { $comment++; next }
      unless ( /\S/ ) { $blank++; next }
      $code++;
    }

    print <<END;
    Total: $total
    POD: $pod
    Comments: $comment
    Blank: $blank
    Code: $code
    END

    Tony

      Very nice. I always just take a lazy approach:

      cat *.pl | perl -ne 'print unless /^\s*$|^#/;' | wc -l

      I could integrate the POD match and do a lot more with this, but I am lazy :-P
Re: Code counter
by ichimunki (Priest) on Feb 01, 2001 at 20:35 UTC
    I have to ask, not to be nitpicky, but because I am curious about what the normal sense of a line is when counting lines of code...

    Are these one, two, or more lines of code, and why?
    my $var = Package::Module->get_value
      or die "Fatal error: $!";

    my $var =
    {
      'List' => [ 'Foo', 'Bar', 'Batz' ],
      'List Two' => [ 'Phu', 'Bah', 'Buzz' ]
    };

      The first is 2 lines. The 2nd is 5 lines.

      This is purely in the "wc -l" sense. If someone chooses to make any further assumptions on what a certain number of lines of code actually *means*, well, there be dragons.
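
      A rough sketch of how the "wc -l" count can differ from the count produced by the script at the top of the thread. The layout of the hash literal is an assumption (braces on their own lines, five physical lines in total); the two tests in the grep are the /\w/ and comment checks from the original script.

```perl
use strict;
use warnings;

# Assumed layout of the hash example: five physical lines.
my $snippet = <<'END';
my $var =
{
  'List' => [ 'Foo', 'Bar', 'Batz' ],
  'List Two' => [ 'Phu', 'Bah', 'Buzz' ]
};
END

# Split into individual lines, keeping each trailing newline.
my @lines = split /^/, $snippet;

# Physical lines, as "wc -l" would report them.
my $physical = scalar @lines;

# Lines the original script would accept: at least one word character
# and not a comment line. grep in scalar context returns the count.
my $logical = grep { /\w/ && !/^\s*#/ } @lines;

print "physical: $physical, logical: $logical\n";
# prints "physical: 5, logical: 3"
```

      Under this layout, the lines holding only '{' and '};' are what separate the two numbers.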

      But, as with many things, the lack of a precise definition that is 100% accurate all the time doesn't mean that a measure can't be useful.

      I like to measure the number of lines of code in our system over time. I don't really make any important decisions based solely on that number, but it's still useful to watch. Yes, changes in coding style can have an impact on that, but I can bear things like that in mind.

      It's like benchmarking. Knowing the raw speed of something tells you very little most of the time, just as knowing how many lines of code you have in a system tells you little. But telling how things move over time, or compare to other things, starts making it more useful.

      Tony