http://qs321.pair.com?node_id=11102051

Hello everyone.

I wrote the following subroutine recently using map and grep in the BLOCK LIST style. Line 4 had gotten long, so as you can see, I put the map, grep, and sort on separate lines. After doing that and seeing how good it looked, I began thinking about all the times I used map and grep in the EXPR, LIST format and that I may want to rewrite all of them.

Most, if not all, other people I have chatted with over my years here use the BLOCK LIST format instead of the EXPR, LIST format. I stubbornly stuck with the EXPR, LIST format. Now I am not sure if I am thinking of refactoring to use the format is normally used or if this is refactoring just to refactor. So, I am on the fence on whether or not I should do it.

What do you think?

sub index_menu { my $dir = shift; my @file_list = file_list('.'); my $files = [ map { anchor( textify($_), { href => $_ } ) } grep { /^[A-Z].+/ && -f $_ } sort { article_sort($a, $b) } @file_list ]; return $files; }
No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
Lady Aleena

Replies are listed 'Best First'.
Re: Refactoring just to refactor?
by davido (Cardinal) on Jun 28, 2019 at 05:19 UTC

    The formatting is fine, but to me reformatting is not refactoring. Refactoring is using different programming constructs, redesigning abstractions, splitting apart subroutines that grew too big, too hard to test, have too many side effects, or are doing too many things, factoring out common code into base classes, helper subs or modules, shifting over to more appropriate algorithms, that sort of thing. Reformatting is just reformatting; a change in whitespace is inconsequential except to the people who have to read the code. Reformatting is not refactoring.

    When to reformat? When you think you've found a formatting technique that will make the code more legible. But in a world with version control, beware that simply deleting a trailing space will mostly just serve to add nose to pull requests making it harder to see the forest through the trees. And will possibly mislead people who come along in the future looking at the code to believing that you modified it when it was only reformatted. For example, I occasionally have to do some large scale moving of code from one layer to another, and in so doing suddenly find my name next to the git blame. A year later someone comes along and asks me questions about it thinking I wrote it. This risk makes sense for substantive changes, but probably doesn't make sense if it's just a matter of tidying up whitespace.

    On the other hand, if tidying up whitespace seems like something you wish to do, do it as separate commits from any substantive changes, so that it's easy to identify the important things that changed, versus the shape of the code's formatting.


    Dave

      ++

      I fully subscribe to that. Well said!

      One more note I cannot resist here: reformatting/restyling sometimes "improves" readability for some where it "obfuscates" for others. When in a team get consensus.

      Some changes might yield the same underlying code and thus seem to be just styling:

      $ perl -MO=Deparse -e'$a = 1 unless $b;' $a = 1 unless $b; -e syntax OK $ perl -MO=Deparse -e'$b or $a = 1;' $a = 1 unless $b; -e syntax OK

      but where some see a change from first to second or vice versa as a readability improvement, others might percieve this as plain obfuscation. Not all minds work alike.


      Enjoy, Have FUN! H.Merijn
        When in a team get consensus.

        And with that in mind, use a .perltidyrc and hook it into the commit workflow. Do this even when not in a team for (a) practice and (b) self-enforced consistency. It then makes later global reformatting a doddle too.

        I agree with everything davido said and the other replies to him.

      I strongly agree with your advice to do code tidying as a separate commit from any substantive changes. I often tidy as I work on code, but using Mercurial and TortoiseHg I can cherry pick commits so it is generally easy to pick up all the "tidying" related changes and commit those then preform the "substantive changes" commit. Indeed, I moderately often cherry pick commits for substantive changes so I can highlight parts of a piece of work.

      I avoid Git to the extent possible so I don't know what it offers for cherry picking changes, but I'll bet there is something in its slew of cobbled together tools that allows identifying and selecting changes to commit.

      Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
        I avoid Git to the extent possible so I don't know what it offers for cherry picking changes
        It's called
        git cherry-pick
        :-P

      Sorry for the misnomer of the original post. I will keep this in mind if I use version control in the future.

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena
Re: Refactoring just to refactor?
by Your Mother (Archbishop) on Jun 28, 2019 at 01:21 UTC

    Refactoring can be pure fun. It can also be helpful for learning or reorganizing and making things easier to maintain. Maybe it's time to bite off a bigger piece and do more of your stuff in a template as recently revisited. :P

    #!/usr/bin/env perl use utf8; use 5.14.2; # for s///r use strictures; use Template; use Path::Tiny; # Mocking… you'd replace with your actual routine. sub Path::Tiny::textify { # Monkey patch. my $self = shift; join " ", map ucfirst, split /[\W_]/, $self->basename =~ s/(?<=\w)\.\w{1,3}\z//r; } my $tt2 = Template->new; my @files = grep $_->is_file, path(".")->children( qr/\A[A-Z].+/i ); # Should probably have \z t +oo. $tt2->process(\*DATA, { files => \@files }) or warn $Template::ERROR; __DATA__ <ul> [%-FOR file IN files %] <li> <a href="./[% file.basename | uri %]">[% file.textify | html %]< +/a> </li> [%-END %] </ul>
    <ul> <li> <a href="./WADO-OsiriX-notes.txt">WADO OsiriX Notes</a> </li> <li> <a href="./chinese-truncate.pl">Chinese Truncate</a> </li> <li> <a href="./Dockerfile">Dockerfile</a> </li> …etc… </ul>
Re: Refactoring just to refactor?
by choroba (Cardinal) on Jun 28, 2019 at 14:16 UTC
    I still use the expression style where possible. The recommendation in Perl Best Practices makes several arguments in favour of the block form, but I don't find them convincing:

    A block stands out more clearly.
    It probably does, but when the expression is simple enough, e.g. just /$pattern/, I don't see why a scope around it is needed.
    It avoids mistakes when a function in the expression eats the list as arguments.
    The mistake is either recognised by Perl itself, giving a compile-time error like Too many arguments for substr, or by a test.
    It saves you time when you later need to refactor the expression into something more complex.
    Adding a { and replacing a , with } doesn't take so much time. Also, many simple expressions stay simple.

    Regarding refactoring for refactoring: I do it sometimes to my personal projects. It makes me more familiar with the code again, I touch parts I haven't seen for years, which means I can find and fix bugs faster when needed. Also, it makes me feel comfortable when reading the code or showing it to someone else. At work where the codebase is huge and changes must be supported by a ticket, I only refactor subs I need to touch for other reasons, and the refactor is always committed in a separate commit before the actual changes.

    map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re: Refactoring just to refactor?
by davido (Cardinal) on Jun 29, 2019 at 23:12 UTC

    sub index_menu { return [ map { anchor(textify($_) => {href => $_}) } sort { article_sort($a,$b) } grep { m/^[A-Z]./ && -f $_ } file_list(shift() // '.') ]; }

    Why I did it this way:

    • return [...];: Makes very obvious what is about to happen; we're returning an array ref.
    • Braces aligned both left and right: similar things should look similar. Why? Because if they suddenly look dissimilar it means I've made a mistake; missing closing brace, for example.
    • grep first operation to reduce the sort problem space. The problem space may be small today but it's just a good habit.
    • Call file_list() directly and pass its response into grep: Why store in an explicit intermediate variable if you're already chaining all the rest of the operations? This reduces the amount of state one has to be concerned with.
    • shift() // '.': This may be controversial; we're not unpacking args at the top of the sub. But unpacking is the first thing that happens, and it happens within the first statement. The advantage we get is that one doesn't have to remember yet another variable or be concerned with what it may have act upon it. The brevity reduces the amount of state one needs to keep in-head. But yes, this last one is probably controversial since it would probably violate a PBP rule.

    Also m/^[A-Z]./ has the same effect here as m/^[A-Z].+/; either way you're matching any string that starts with A-Z and contains at least one more character. But the one without the quantifier doesn't have to read forward to the end of the string unnecessarily.

    Finally, the abstraction may be wrong altogether. Here we're splitting hairs on how to format the code or what order sort and grep should come in, but overlooking the fact that anchor(textify(...), {href => ...}) looks a lot like HTML generation, and that you would probably be better served with a proper MVC framework and at very least with the use of a templating system. At minimum a template system to generate the HTML would be cleaner, and could likely be dropped into your code without other major refactoring.


    Dave

      Wow! I didn't mean for this thread to rewrite that sub, it was just an example. 8)

      • Doesn't putting return at the beginning of the subroutine make it harder to find? I always return or print at the end of subroutines.
      • I will give the right curly brace alignment some thought. I am happy with the left curly brace alignment, but I am not sure about the right at this time.
      • jwkrahn caught that I used sort before grep here, and I made the change already.
      • At one time I called subroutines without assigning them to variables. For some reason, maybe readability, I stopped doing that and always assign subroutine output to their own variables where applicable.
      • I have been writing for Perl 5.8.8 for so long now that I have not taken time to learn the goodies in later versions of Perl. I think // is in a later version, so I have not used it yet. Also, does it not lead to confusion if the parameter for the subroutine is not named? Will readers of the subroutine know where the parameter is being used?

      This is how I write subroutines.

      sub name { my $parameter = shift; # or # my ($parameter1, parameter2) = @_; my $end_variable; # ... code to munge the parameter(s) and set end variable... return $end_variable; # or # print $end_variable; }

      I do not know why I added .+ to the end of the regex. It doesn't even need the ., the ^[A-Z] would do just as well. I may have been writing other regexen where I needed to search whole strings when I wrote that one.

      Many Monks have tried to get me to use various "template" modules. None of them have made any sense to me, even when I tried to use them, they have all lead me to pits of despair. I rolled my own modules for generating HTML for my site that make sense to me. The refactoring for my site for some template module would take months to rewrite. My home rolled template and HTML generation modules are loathed my most, if not all, Monks, but they are what I am comfortable using. I have used them both heavily. I am sure you have seen others banging their heads against that wall before. I try to not bring them up. If you look and then want to yell at me about them, you can open an issue on GitHub to keep it away from here. Too much drama surrounds them already.

      Thank you for the tips!

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena

        I didn't intend to incite drama. :) My suggestions are just my own opinions, and I reserve the right to be outvoted by the prevailing trends, or vetoed by the owner of the code (you).

        I mentioned Perl Best Practices earlier. The book is great but not because it has 256 very good suggestions. No, it has 256 good morsels of food for thought. In the years since it was written some of the ideas took hold, others were shown to take us down a path toward madness, and some are hotly debated. I don't think my advice is any better. :)


        Dave

        Doesn't putting return at the beginning of the subroutine make it harder to find? I always return or print at the end of subroutines.

        davido's subroutine has only one statement, which is the return. This means that the return is both at the beginning and at the end. Whichever it is perceived to be then becomes a purely philosophical matter for the reader. :-)

        I have been writing for Perl 5.8.8 for so long now that I have not taken time to learn the goodies in later versions of Perl. I think // is in a later version, so I have not used it yet.

        You are correct and the defined-or operator // was introduced in perl 5.10. It is a very useful shorthand and I use it all the time.

        From personal experience I would suggest that you do take time to learn the features introduced in newer perls. Maybe don't bother with anything marked experimental and IMHO stay well away from smartmatch and given/when. The rest is pretty good and if your only reason not to use them is lethargy/inertia then maybe this is the prod that will do it? Of course if you need to support 5.8.8 or earlier for whatever reason then that's fine and you just have to accept you are stuck with that for now.

        It is also worth pointing out that despite its truly excellent backwards compatibility, Perl (including the core modules) does sometimes remove obsolete features or introduce breaking changes, so it pays to keep an eye on those too.

        Also, does it not lead to confusion if the parameter for the subroutine is not named? Will readers of the subroutine know where the parameter is being used?

        This is why we have comments. If you think that another reader (or even yourself in 6 months) will have any confusion, simply comment the code to say "this takes one argument (see 'shift()') which is the directory of files to list and defaults to '.' if absent" and the job is done.

Re: Refactoring just to refactor?
by swl (Parson) on Jun 28, 2019 at 01:46 UTC

    I would leave the existing code as it is. You could perhaps modify it when next you are working on that area, and then only if it can be justified on the basis of speed and/or clarity.

    And WRT speed, I have not tested, so others could comment on whether there is a meaningful difference, if any at all.

    edit immediately after posting: And one should ensure there are tests in place to ensure the code functions correctly after a change.

Re: Refactoring just to refactor?
by karlgoethebier (Abbot) on Jun 28, 2019 at 12:43 UTC

    I hope that i'm not totally Perl-Blind but a good refactoring might be to rewrite your fancy sub like this:

    sub index_menu { my $dir = shift; # my @file_list = file_list('.'); my @file_list = file_list($dir); # if this is what you meant my $files = [ map { anchor( textify($_), { href => $_ } ) } grep { /^[A-Z].+/ && -f $_ } sort { article_sort($a, $b) } @file_list ]; return $files; }

    You may also consider to use the more hard-boiled style and omit the explicit return but that's a matter of taste.

    And if i where in your shoes i would leave my old working code untouched and use the BLOCK LIST style (which is the natural way IMHO) from today on.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

      Thank you for catching that I did not use $dir in the sub where I meant to use it.

      I am not clear on what you mean by "hard-boiled style". I always either return or print at the end of a sub. (I have not gotten to use the feature say that much.) I do not know how else to end a subroutine.

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena

        A subroutine will return to its caller the result of the last evaluated expression. I would recommend to use return() in all cases within all branches, unless you know why/when not to.

        Here's an example of output of values. Note that say() is just a glorified print(), but it adds a newline at the end by default. Nothing special here...

        use warnings; use strict; use feature 'say'; sub return_print { my $x = 254; print "$x\n"; } sub return_return { my $x = 255; return $x; } say "print: " . return_print(); say "return: " . return_return();

        Output:

        254 print: 1 return: 255

        So, the first output (254) is the output from the print() call. The second output (print: 1) is the output of a successful call to print(), which is 1. The third value is the return value explicitly sent via return().

        I would highly recommend you play around and get a grasp on what return values you're getting from and/or expecting from your function/method calls.

        All of your tests should be checking against all possible and potential return values so there aren't any unexpected results. If there are any odd things found, add more tests, or fix your subroutines to ensure they only return very specific value(s).

Re: Refactoring just to refactor?
by jwkrahn (Abbot) on Jun 28, 2019 at 19:14 UTC
    map { anchor( textify($_), { href => $_ } ) } grep { /^[A-Z].+/ && -f $_ } sort { article_sort($a, $b) }

    If possible, I would put the grep before the sort so that you probably have fewer items to sort which takes less time. (more efficient):

    map { anchor( textify($_), { href => $_ } ) } sort { article_sort($a, $b) } grep { /^[A-Z].+/ && -f $_ }

      I was thinking about putting grep before sort. With the lists being less than 20 items (more like 4 or 5), it does not make too big a difference.

      No matter how hysterical I get, my problems are not time sensitive. So, relax, have a cookie, and a very nice day!
      Lady Aleena
Re: Refactoring just to refactor?
by ForgotPasswordAgain (Priest) on Jul 01, 2019 at 15:06 UTC

    Personally I wouldn't refactor it, but that's just my opinion.

    There used to be a possible performance reason to use an expression instead of a block: blocks set up a lexical scope, so there was some overhead. That was taken care of some years ago, as far as I know. It's why I have a habit of using expressions, even though most documentation uses blocks.