Beefy Boxes and Bandwidth Generously Provided by pair Networks
There's more than one way to do things
 
PerlMonks  

Critique of some perl code.

by jwkrahn (Abbot)
on Apr 17, 2022 at 07:03 UTC ( [id://11143017]=perlmeditation: print w/replies, xml ) Need Help??

Hello Everyone,

I occasionally like to download perl programs off the web to see how they work. And if there are any bugs I try to report back to the original author.

This code was found verbatim (except for the file name) in two different files.

my $conf = which_conf($configuration_file); if (-f $conf) { my @conf_args; open(my $conffile, '<', "$conf") or warn "$P: Can't find a readable $configuration_file file $! +\n"; while (<$conffile>) { my $line = $_; $line =~ s/\s*\n?$//g; $line =~ s/^\s*//g; $line =~ s/\s+/ /g; next if ($line =~ m/^\s*#/); next if ($line =~ m/^\s*$/); my @words = split(" ", $line); foreach my $word (@words) { last if ($word =~ m/^#/); push (@conf_args, $word); } } close($conffile); unshift(@ARGV, @conf_args) if @conf_args; }

my $conf = which_conf($configuration_file); if (-f $conf) { my @conf_args; open(my $conffile, '<', "$conf") or warn "$P: Can't find a readable $configuration_file file $! +\n";

It looks like there is a race condition between the file test and the file open.

The file test does not print a message if the file is not found but the file open does.

If the file cannot be opened a message is printed and then another message is printed when readline tries to read from an invalid file handle.

The file name is copied to a string.


while (<$conffile>) { my $line = $_;

Data is assigned to $_ and then copied to $line but $_ is never used inside the loop.


$line =~ s/\s*\n?$//g;

This is the line that gobsmacked me. I ran some tests with Benchmark and re 'debug' compared to the FAQ answer "s/\s+$//;" and it turns out that this is a good example of what not to do.

This substitution was found in five places in one file and two places in the other file.


$line =~ s/^\s*//g;

Not as bad as the previous line but still not as good as the FAQ answer.


$line =~ s/\s+/ /g;

Correct code, nothing to see here.


next if ($line =~ m/^\s*#/); next if ($line =~ m/^\s*$/);

Why check for whitespace after you've just removed it?


my @words = split(" ", $line); foreach my $word (@words) { last if ($word =~ m/^#/); push (@conf_args, $word); }

This code is where the work is done and obviates the need for the previous five lines. split(" ", $line) removes all whitespace. foreach will not loop if @words is empty. $word =~ m/^#/ will skip to the next line if a '#' comment is found.


close($conffile);

Correct code, nothing to see here.


unshift(@ARGV, @conf_args) if @conf_args;

The conditional is not really required.


Where I found this code:

Replies are listed 'Best First'.
Re: Critique of some perl code.
by aitap (Curate) on Apr 17, 2022 at 07:50 UTC
    The good news is, fixing issues like this one is as easy as sending an e-mail (like this shell injection in perf which broke some of its functionality on binaries with spaces in path). Simple, well-explained patches are just accepted with no fuss. Then, a year later, a recruiter will try to contact you about some kernel work without even looking at your patches, but that's just life.
Re: Critique of some perl code.
by hv (Prior) on Apr 17, 2022 at 17:29 UTC

    To be honest, I don't consider this code that bad. Yes, there are some small issues - they mainly suggest the author is not a perl expert - but it seems unlikely that any of them would affect correct behaviour or cause significant performance issues in the context this code is likely to run.

    My first concerns would rather be with two things you don't mention: that on failing to open the file for reading (most likely due to permissions), it fails to mention the name of the file it tried to open; and that it stuffs @ARGV with an arbitrarily long list of the words it found, which feels like a possible design flaw.

    That last, however, suggests this configuration file is expected rarely to be more than a couple of lines long, so performance of the processing loop is really not likely to be very relevant.

    Data is assigned to $_ and then copied to $line but $_ is never used inside the loop.

    This is pretty reasonable if as a non-expert you are vaguely aware that while ($line = <$fh>) has some differences around termination conditions compared to while (<$fh>).

    This is the line that gobsmacked me. I ran some tests with Benchmark and re 'debug' compared to the FAQ answer "s/\s+$//;" and it turns out that this is a good example of what not to do.

    And how much CPU time do you think replacing it would save over the lifetime of the universe? I suspect the total would sit comfortably under a second, maybe even under a millisecond.

      Doing a while where the conditional is a function of a filehandle iterator (or a couple other iterater-y things) has implicitly done the defined test whether or not there's an assignment to a variable for a while now.

      If the condition expression of a "while" statement is based on any of a group of iterative expression types then it gets some magic treatment. The affected iterative expression types are "readline", the "<FILEHANDLE>" input operator, "readdir", "glob", the "<PATTERN>" globbing operator, and "each". If the condition expression is one of these expression types, then the value yielded by the iterative operator will be implicitly assigned to $_. If the condition expression is one of these expression types or an explicit assignment of one of them to a scalar, then the condition actually tests for definedness of the expression's value, not for its regular truth value.

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.

        [...] has implicitly done the defined test [...] for a while now

        Do you know which version introduced that?

        Alexander

        --
        Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      This is pretty reasonable if as a non-expert you are vaguely aware that while ($line = <$fh>) has some differences around termination conditions compared to while (<$fh>).

      What "termination conditions" are different between them?

        I was also puzzled by the differing "termination conditions." I don't see them.


        Give a man a fish:  <%-{-{-{-<

        I think it's no longer true - and I'm struggling to remember how to test it - but I'm pretty sure there used to be the distinction that while (<>) was special-cased to act as while (defined($_ = <>)), while anything more complicated (such as while ($line = <>)) did not get the "defined" check added to it. So if <> were to return something like an empty string, or "0", the latter case could terminate early.

        The oldest perl I have to hand is a maint-5.8, and I couldn't reproduce it there with perl -e 'print "0\n0"' | perl -we 'while ($r = <>) { printf "%s", $r }' -, so either I'm testing it wrong, or it affected only even earlier perls, or I'm going senile. (These options are not necessarily exclusive.)

        Update but the point was that it could be written by someone with a vague memory that something like that could happen - much like I have - and if so, that didn't necessarily make it awful code.

      I agree completely.

      I just felt like venting a bit. :)

Re: Critique of some perl code.
by jdporter (Paladin) on Apr 18, 2022 at 16:30 UTC
    unshift @ARGV, (sub{ my $fn = shift; open my $fh, '<', $fn or return() +; grep !/^#/, map split(), grep !/^#/, <$fh>; })->($conf);

    Besides what others have already said, one observation I'd make is that "comments" might behave surprisingly for people not familiar with the pecularities of this format. In particular, "comments" as such must be on a line by themselves. If a # char is used to "comment out" content within a line, it will only cause the adjacent text to be ignored, not the entire remainder of the line. A line like text #not included will yield the terms text included.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://11143017]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others about the Monastery: (6)
As of 2024-04-19 11:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found