PerlMonks  

Perl Best Practices - Loop Labels

by kcott (Archbishop)
on Apr 16, 2020 at 04:27 UTC ( [id://11115604]=perlquestion )

kcott has asked for the wisdom of the Perl Monks concerning the following question:

G'day All,

I purchased and read (probably two or three times) "Perl Best Practices" when it first came out 15 years ago. While I still follow many of the suggestions and recommendations in that book, I have, over time, dropped some and modified (my usage of) others. Questions came up at $work regarding the use of loop labels: I'm seeking your thoughts on that specific area of Perl Best Practices (PBP).

For those that have the book, the main sections to which I refer are Distributed Control (pp. 126-128) and Loop Labels (pp. 129-131). I did note, when looking up the link for the book, that there is a more recent video, "Modern Perl Best Practices", which may hold answers; unfortunately, I don't own the video, so that's of no immediate help.

I often use loop labels to manage control of nested loops. My code may look something like the following.

    OUTER:
    for (...) {
        next OUTER if $cond1;

        INNER:
        for (...) {
            ...
            last INNER if $cond2;
            ...
            next OUTER if $cond3;
            ...
            last OUTER if $cond4;
            ...
            next INNER if $cond5;
            ...
        }

        last OUTER if $cond6;
    }

That example is highly contrived, has poorly named labels and variables, and is generally simplified by the use of (if) statement modifiers. Production code would typically be more involved with, for example, if blocks and elisions replaced with multiple lines of code. The intention is to show potential logic complexity as opposed to complex code.

The $work questions generally centred around whether jumping out of nested loops, starting the next iteration of a loop early, and so on, was a good practice. I believe the similarity to "goto LABEL" in some languages, which generates spaghetti code which is hard to read and maintain, was possibly behind this line of questioning: I'll need to follow that up but it could take a while (especially with people working non-standard hours due to current COVID-19 issues). I am aware that the Perl goto statement can be used differently from what's encountered in other languages.

So, using labels for this type of loop control is a PBP practice I've retained. I believe it is far superior to the convoluted code that can be necessary if structured programming techniques are employed; such as the use of control flags and deeply nested if statements governed by those flags. I'm aware that many of the PBP suggestions and recommendations have been rethought over the intervening years: perhaps this is one of them.

Any thoughts you may have concerning the use of labels to control nested loops would be appreciated. You may agree (or disagree) 100% with my current usage; you might use this PBP practice with some exceptions or modifications; you may have an improved way of doing this; and so on: all feedback is welcome.

— Ken

Replies are listed 'Best First'.
Re: Perl Best Practices - Loop Labels
by tobyink (Canon) on Apr 16, 2020 at 06:51 UTC

    The $work questions generally centred around whether jumping out of nested loops, starting the next iteration of a loop early, and so on, was a good practice.

    So say you need to search through a bunch of files to see if Joe Bloggs is mentioned in any of them. You don't care which files he's mentioned in, or how many times. You just want a boolean — is he mentioned at all?

        my $mentioned = 0;

        FILE:
        for my $file ( @files ) {
            LINE:
            for my $line ( @lines ) {
                # Match against $line explicitly; a bare /.../ would test $_
                $line =~ /Joe Bloggs/ and ++$mentioned and last FILE;
            }
        }

        return $mentioned;

    The question of whether it's good practice to jump out of nested loops becomes "after I've found the answer to my question, should I keep searching through the rest of the files?"

    Or another way of thinking about it: "after I found my lost car keys, should I keep looking for them?"

    I'm sure there are good times to jump out of loops and bad times to jump out of loops, and there are many subtle nuances. But in the general case, if you know a loop has served its purpose, jump out of it.

    (Oh, and another thing. You'll notice I labelled my inner loop too, even though I never used that label. I find labelling loops, especially nested loops can be a form of documentation.)

    Update: please, please don't do this though:

        my $mentioned = 0;

        FILE:
        for my $file ( @files ) {
            check_file($file, \$mentioned);
        }

        return $mentioned;

        ...;

        sub check_file {
            my ($file, $mentioned) = @_;

            LINE:
            for my $line ( @lines ) {
                # "last FILE" here targets a loop in the caller
                $line =~ /Joe Bloggs/ and ++$$mentioned and last FILE;
            }
        }

    Yes, Perl does let last be used in a subroutine called from the loop. Don't do that. It's really hard to grok. Only use next, last, and redo lexically within the loop block they affect.
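A minimal sketch of the alternative, under stated assumptions: the helper returns a boolean and `last FILE` stays lexically inside the loop it leaves. The `%lines_for` hash and the `file_mentions` sub are hypothetical names invented for this illustration; the hash stands in for real files so the example runs without touching the disk.

```perl
use strict;
use warnings;

# Hypothetical stand-in for real files: file name => array of lines.
my %lines_for = (
    'a.txt' => [ 'nothing here', 'still nothing' ],
    'b.txt' => [ 'minutes of meeting', 'Joe Bloggs was present' ],
    'c.txt' => [ 'Joe Bloggs again' ],
);

# Returns true as soon as one line matches; no loop control leaks out.
sub file_mentions {
    my ($lines, $pattern) = @_;
    for my $line (@$lines) {
        return 1 if $line =~ $pattern;
    }
    return 0;
}

my $mentioned = 0;
my @checked;

FILE:
for my $file (sort keys %lines_for) {
    push @checked, $file;
    if ( file_mentions($lines_for{$file}, qr/Joe Bloggs/) ) {
        $mentioned = 1;
        last FILE;    # lexically inside the loop it leaves
    }
}

print "mentioned=$mentioned after checking: @checked\n";
```

With this sample data the search stops at b.txt and never opens c.txt, which is the "stop looking once the keys are found" behaviour described above.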

      G'day tobyink,

      Your usage seems to align with mine. For single loops, I don't generally supply a label. For nested loops, I'll generally give all loops a label; although, if I'm not using next, last, etc. I'll probably omit the labels, e.g. for processing all cells in a grid:

          for my $row (1 .. $row_count) {
              for my $col (1 .. $col_count) {
                  # process cell at row $row and column $col
              }
          }

      Not really related to loops but years ago, when working with junior programmers who didn't fully understand lexically scoped pragmata, I'd use labels to document anonymous blocks; usually something overt, such as:

          SUBROUTINE_REDEFINITION_BLOCK: {
              no warnings 'redefine';
              sub subname { ... }
          }

      This followed several instances where braces delimiting anonymous blocks had been removed because "they looked like superfluous code".

      — Ken

        (Oh, and another thing. You'll notice I labelled my inner loop too, even though I never used that label. I find labelling loops, especially nested loops can be a form of documentation.)

      Interesting -- my preference would be to not do that, because then my brain would be searching for where the LINE: label is used. If a label's not used, I wouldn't put it in. That's a matter of personal taste, I guess -- but also, this is a simplified example.

      Alex / talexb / Toronto

      Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Perl Best Practices - Loop Labels
by GrandFather (Saint) on Apr 16, 2020 at 09:06 UTC

    I agree with Marshall (Re: Perl Best Practices - Loop Labels), although I'd clarify somewhat:

    I generally avoid manifest nested loops that require jumping between levels by refactoring the code, often by putting the inner loop in a sub and using an early exit to bail. That has the advantage that it completely avoids "spaghetti" code and allows a descriptive name to be used for the sub. Identifiers, be they labels or sub names, can make understanding the intent of the code much easier without needing to introduce comments. Putting the inner loop code in a sub generally cleans up the outer loop wonderfully so the logic is easier to see. The result is code that is easier to grok and thus easier to write and maintain.
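A small sketch of that refactoring, not taken from anyone's production code: the inner loop moves into a sub whose name says what it answers, and an early return replaces the jump to the outer loop. The data and the name `has_empty_field` are assumptions made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical data: each record is a list of fields.
my @records = (
    [ 'alpha', 'beta' ],
    [ 'gamma', '', 'delta' ],    # has an empty field
    [ 'epsilon' ],
);

# The former inner loop, now named for the question it answers.
sub has_empty_field {
    my ($record) = @_;
    for my $field (@$record) {
        return 1 if $field eq '';    # early exit replaces "next OUTER"
    }
    return 0;
}

my @kept;
for my $record (@records) {
    next if has_empty_field($record);
    push @kept, join ',', @$record;
}

print "$_\n" for @kept;
```

The outer loop now needs no label at all, at the cost of one extra sub call per record.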

    Having said that, there are no hard rules. Either approach may be more suitable in different situations. But, like Marshall, I don't remember when I might have used a loop label. Maybe never. I have a few workmates who use them as error exits from loops (in C++ as it happens), but very rarely (a handful of times in a few tens of millions of lines of code).

    Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
Re: Perl Best Practices - Loop Labels
by davido (Cardinal) on Apr 16, 2020 at 17:37 UTC

    Another contrived and incomplete example:

        # Validate a data set
        ITEM:
        foreach my $item (@data) {
            foreach my $key (qw(foo bar baz)) {
                if (!exists $item->{$key}) {
                    warn "Couldn't grok item. $key missing. Skipping.\n", Dumper($item);
                    next ITEM;
                }
            }
            # Do something useful with this item.
        }

    A perfectly sane approach. Of course the inner loop could have been a grep, and you could warn after the grep if the number of matching keys doesn't reach the expectation:

        if (3 != grep {exists $item->{$_}} qw(foo bar baz)) {
            warn ....;  # but your warning can't be as specific.
            next;
        }

    But either way you're dealing with nested loops, just in different forms. Sometimes the use-case doesn't lend itself well to a grep or map, and sometimes, even, there's advantage to bailing out at the earliest opportunity. And sometimes not bailing out early makes it harder to keep track of what condition led to the need to bail out at all.
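As a hedged variation on the grep form (a sketch, not the approach posted above): grepping for the missing keys instead of counting the present ones keeps the warning specific, though it still checks every key and so gives up the early bail-out. The sample `@data` and the `@required` and `@valid` names are invented for the illustration.

```perl
use strict;
use warnings;

my @required = qw(foo bar baz);

# Hypothetical sample data.
my @data = (
    { foo => 1, bar => 2, baz => 3 },
    { foo => 1, baz => 3 },
    { bar => 2 },
);

my @valid;
for my $item (@data) {
    # Every key is checked (no early exit), but all gaps are reported at once.
    my @missing = grep { !exists $item->{$_} } @required;
    if (@missing) {
        warn "Skipping item; missing: @missing\n";
        next;
    }
    push @valid, $item;
}

print scalar(@valid), " valid item(s)\n";
```

Whether this reads better than the labelled loop is a judgment call; it trades the label for a temporary array.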

    My suggestion is this: If labels make a particular section of code easier to understand, and jumping out of nested depths is the appropriate thing to do, don't let a Perl Critic policy dissuade you. Just be sure that you really have chosen the clearest code expression that can solve the problem at the appropriate level of efficiency. If you make your code more complex in an effort to avoid jumping out of a nested loop, everyone loses. If you make it more complex by jumping out, everyone loses. If there is a better construct that avoids the issue entirely, use it. If there is not, use the construct that achieves the needs, but with code clarity high on the list of needs. This will mean sometimes jumping out of a nested loop, or skipping to the next outer iteration is the right thing to do.


    Dave

Re: Perl Best Practices - Loop Labels
by haukex (Archbishop) on Apr 16, 2020 at 21:29 UTC

    Just to add something I don't see mentioned yet:

    I believe the similarity to "goto LABEL" in some languages, which generates spaghetti code which is hard to read and maintain, was possibly behind this line of questioning

    Perl's next, last, and redo are much more akin to C's continue and break than they are to goto. So unless your colleagues have issues with the latter two, tell them not to worry :-) IMHO, labeled blocks, including loops, are basically the much, much better version of goto, in that they allow complicated flow control to be implemented in a much cleaner way. Also, IIRC, their behavior in regards to the stack is much cleaner than with goto. There are three cases that I can think of right now where people try to justify gotos:

    • Flow control in loops - that can be much better implemented with the aforementioned next, last, and redo.
    • Control constructs where we have much better solutions nowadays, I'm thinking of On Error Goto ... or everyone's favorite, On Error Resume Next. (Nowadays of course try/catch.)
    • Really low-level stuff, like IIRC I once goto'd into an assembly routine from C, which of course doesn't apply in Perl.

    And then there's the spaghetti code artists that are the reason that goto has such a deservedly bad reputation.

    Anyway, as for the general question, I use labeled loops much like tobyink showed: in nested loops, and with sensible names (e.g. "last LINE" and "next FILE" are great to understand). I almost never have more than two nested loops, at a max three (everything else is in subs or methods), and the other thing is that I try to keep my loops short, under a page if possible, so that one doesn't lose an overview of the control flow. When used like this, including in the example you showed, I think labels are a Good Thing.

      There is at least one very good use of goto in C: error handling.

      In a function that allocates and initializes complex structures, an earlier allocation can succeed but a later allocation fail. When this happens, the earlier allocation must be released before returning NULL to avoid leaking memory.

          something * alloc_something(void)
          {
              something * ret = malloc(sizeof(something));
              if (ret == NULL)
                  goto out;

              ret->another_thing = alloc_another_thing();
              if (ret->another_thing == NULL)
                  goto out_free_ret;

              return ret;

              /* error exits */
          out_free_ret:
              free(ret);
          out:
              return NULL;
          }

      I learned this style from reading Linux kernel sources and it makes error handling much more readable and maintainable by keeping the successful path and the error path separate. While this example was very simple, this pattern especially shines when more than one allocation must be backed out before returning failure because it avoids duplicating the code to release the earlier allocations.

        Interesting .. yet I can see a fairly simple way to restructure this C code, without either of the goto statements ..

            something * alloc_something(void)
            {
                /* Make two malloc requests. Ensure both succeed; return
                   allocated memory, if any. Three possible logic paths:
                   1. First malloc fails, and we are done.
                   2. First malloc succeeds, second malloc fails: free the
                      first allocated block, and we are done.
                   3. First and second mallocs succeed, and we are done. */

                something * ret = malloc(sizeof(something));

                /* Did the first request succeed? */
                if (ret != NULL) {
                    ret->another_thing = alloc_another_thing();

                    /* Did the second request fail? */
                    if (ret->another_thing == NULL) {
                        free(ret);
                        ret = NULL;
                    }
                }

                return ret;
            }

        Alex / talexb / Toronto

        Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.

Re: Perl Best Practices - Loop Labels
by eyepopslikeamosquito (Archbishop) on Apr 17, 2020 at 07:09 UTC

    Like others mentioned in this thread, I rarely use loop labels. One real-world example I remember is a mock Syslog server I wrote a while back for automated testing. I embed the whole function below to give an example of a real-world (not contrived) example of using loop labels in Perl. I remember at the time being a bit surprised about using labels in Perl (because I do it so rarely) but after careful consideration felt it was the clearest way to write this particular code. Also the function itself was quite a bit longer than I usually write, but again felt it was warranted here.

        sub do_syslog_server {
            my $host               = shift;
            my $port               = shift;
            my $sleep_after_accept = shift;
            my $sleep_after_recv   = shift;

            my_log( "Start on host '$host' at " . get_datetime_stamp() . "\n" );
            my_log(" pid=$$\n");
            my_log(" port=$port\n");
            my_log(" sleep_after_accept=$sleep_after_accept\n");
            my_log(" sleep_after_recv=$sleep_after_recv\n");

            # This socket is used to listen for connections.
            my $listener = IO::Socket::INET->new(
                LocalPort => $port,
                Proto     => 'tcp',
                Listen    => 5,
                ReuseAddr => 1,
            ) or die "error: IO::Socket::INET new: $@";

            my $selector = IO::Select->new($listener);

            SERVER:
            while ( my @ready = $selector->can_read() ) {
                CLIENT:
                for my $client (@ready) {
                    if ( $client == $listener ) {
                        my $new_conn = $listener->accept();
                        $selector->add($new_conn);
                        my $fh_hex       = sprintf '0x%x', $new_conn;
                        my $peerhost     = $new_conn->peerhost();
                        my $peerport     = $new_conn->peerport();
                        my $peeraddr     = $new_conn->peeraddr();
                        my $peerhostfull = gethostbyaddr( $peeraddr, AF_INET ) || "Cannot resolve";
                        my $fromstr      = "from $peerhost:$peerport (host=$peerhostfull)";
                        my_log("Accepted new connection $fromstr\n");
                        if ($sleep_after_accept) {
                            my_log("Sleeping for $sleep_after_accept seconds...\n");
                            sleep($sleep_after_accept);
                        }
                    }
                    else {
                        my $cli_cmd_str = recv_tcp_client($client);
                        if ( !defined($cli_cmd_str) ) {
                            my $peerhost = $client->peerhost();
                            my $peerport = $client->peerport();
                            my_log("Client $peerhost:$peerport closed socket\n");
                            $selector->remove($client);
                            $client->close();
                            next CLIENT;
                        }
                        if ( $cli_cmd_str =~ /^KNOB_SERVER_PLEASE_QUIT\s*$/ ) {
                            my_log("Server quitting on Knob's command\n");
                            last SERVER;
                        }
                        if ($sleep_after_recv) {
                            my_log("Sleeping for $sleep_after_recv seconds...\n");
                            sleep($sleep_after_recv);
                        }
                    }
                }
            }

            my_log("Closing server\n");
            close($listener) or die "error: close server: $!";
            my_log("End do_syslog_server\n");
        }

    Update: see also Re: Multiple consecutive connections to a socket - example event-driven server using IO::Select

Re: Perl Best Practices - Loop Labels
by 1nickt (Canon) on Apr 16, 2020 at 11:27 UTC

    Hi Ken,

    Your usage of loop labels seems completely correct to me. I use them for exactly the same control, usually to jump out early of an inner or maybe the outer loop. It's a standard technique to keep track of where you are and move around.

    Having said that, I try to avoid deeply nested loops -- i.e. no more than two levels, and as swl said, if you find yourself needing more than one level of nesting, it's likely time to refactor and make some subroutines.

    I also agree that people who don't understand Perl's loop control labels and use of goto to dispatch to another method, are often laboring under a misconception or two when they give their opinion on the matter.

    Hope this helps!


    The way forward always starts with a minimal test.
Re: Perl Best Practices - Loop Labels
by kcott (Archbishop) on Apr 17, 2020 at 06:01 UTC

    Firstly, a huge thank you to everyone who replied. I am somewhat overwhelmed by the volume of responses. It is very much appreciated.

    After posting the OP yesterday, I replied to the first responses and then logged out. Logging back in today, I was presented with about a dozen direct replies; many of those had spawned their own little sub-threads. Much as I might like to reply to everyone, it really isn't practical; so, please take this as a general response to all. Another reason for a general reply is that I think I'd probably be repeating myself in many individual replies; this would tend to bloat the thread.

    The extent to which people use labels varies quite a lot. Some use them always, which is in line with PBP; others use them to a greater or lesser degree depending on context, which aligns more with my usage; and, some either generally don't like them or have never found a need to use them. I expect, in general, I'll probably continue with my current usage; although, some replies were thought provoking and that may cause me to modify usage in certain situations.

    Refactoring was mentioned in quite a few places and I agree with this. I inwardly groan whenever I encounter programs with monolithic tracts of code; these are hard to read, comprehend, maintain, extend and debug. I generally tend to have more _helper()-type functions than direct_interface()-type functions.

    Another point raised was the depth of nesting and, again, I concur. Wherever possible, I generally try to avoid nesting at all; there are occasions when that's unavoidable, in which case I aim for shallower rather than deeper; if I've reached a fourth level, I've probably done something wrong and will rethink the solution.

    A number of people spoke about PBP not being a set of rules that should be slavishly followed but rather a series of suggestions and recommendations to be adapted as appropriate. Definitely no argument from me on that one.

    In conclusion, I asked for your thoughts and have received them in profusion. I won't be making any drastic changes to the way I work but I may modify some behaviours in a small way. Again, thank you very much.

    — Ken

Re: Perl Best Practices - Loop Labels
by BillKSmith (Monsignor) on Apr 16, 2020 at 12:44 UTC
    Do not forget "Perl Best Practices" Chapter 1. What is really important is that you choose a style that works for you (or your organization) and use it consistently. Consider the advice given in the book, in this forum, or anywhere else, make your own decision, and stick to it. I do admit that even this advice is "do as I say, not as I do". I try to use unnecessary loop labels as documentation, but often forget, or just do not like the added clutter.
    Bill
Re: Perl Best Practices - Loop Labels
by haj (Vicar) on Apr 16, 2020 at 22:24 UTC

    Short version: I'm strictly following PBP on this one, and I agree with your current usage.

    I'm following my own shortcut rule to this: Always use labels after next, redo, and last. That includes: Use labels even if the loop isn't nested. Maybe it will become nested when some other guy adds some feature years later, and for that case the label adds robustness. When refactoring complex loops it helps to define the scope: Extracting an inner loop into a subroutine needs extra care if the inner loop contains a last OUTER;.
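A short sketch of the robustness point, using made-up data: once an inner loop has been added, an unlabelled `last` silently binds to the innermost loop, while a labelled one keeps its original target. The `@rows` data and the `ROW` label are assumptions for the illustration.

```perl
use strict;
use warnings;

my @rows = ( [ 1, 2 ], [ 3, -1 ], [ 5, 6 ] );

# Unlabelled: after nesting was added, "last" only leaves the inner loop.
my @seen_unlabelled;
for my $row (@rows) {
    for my $n (@$row) {
        last if $n < 0;
        push @seen_unlabelled, $n;
    }
}

# Labelled: the target is explicit, so the whole scan stops.
my @seen_labelled;
ROW:
for my $row (@rows) {
    for my $n (@$row) {
        last ROW if $n < 0;
        push @seen_labelled, $n;
    }
}

print "unlabelled: @seen_unlabelled\n";    # unlabelled: 1 2 3 5 6
print "labelled:   @seen_labelled\n";      # labelled:   1 2 3
```

If the intent was "stop scanning at the first negative number", only the labelled version still does that after the loop became nested.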

    It has already been noted by others that good names for labels also make good documentation - next FILE; or last TRY; tell pretty well what the line is trying to achieve, and I don't need to scroll even if that line happens to be at the top of my screen.

    I don't see these loop controls related to goto, but rather to two other mechanisms of execution control: return and die. A return is implicitly labeled with the surrounding sub (I know, special cases exist where it isn't), and die is followed by a description why you bail out. With a label, loop control keywords achieve the same level of self-explanation. Loop control, returning from a subroutine and exceptions are part of every modern programming language, and they have in common that they go strictly "upward" in the call stack or loop hierarchy. A goto LABEL; or even, horrors, goto EXPRESSION is indeed scary, not only for maintenance of the code, but also for those who write compilers and interpreters.

Re: Perl Best Practices - Loop Labels
by swl (Parson) on Apr 16, 2020 at 08:07 UTC
      Mentioned in passing more like, not covered :)
Re: Perl Best Practices - Loop Labels
by roho (Bishop) on Apr 16, 2020 at 14:52 UTC
    I also use loop labels for control flow in nested loops where appropriate. When I coded in COBOL in the 1970's using structured programming, I made use of the GO TO statement to transfer control to a label at the end of the subroutine to avoid unnecessarily complex nested IF statements. I remember the agony of trying to debug code with IF statements nested to an ungodly number of levels, when a few well placed GO TO's to the end of the subroutine would have made life much easier. There is definitely a place for loop labels.

    "It's not how hard you work, it's how much you get done."

Re: Perl Best Practices - Loop Labels
by Marshall (Canon) on Apr 16, 2020 at 07:52 UTC
    I am surprised at your use of loop labels. In my experience this is a very rare thing. I have used a Perl loop label maybe once in the past few years. Unfortunately, I haven't found that example yet in my code base - perhaps my grep kung fu is failing?

    The normal way (in my opinion) to exit completely from an inner and outer loop in C or Perl is to use a return statement. You put the loops in a subroutine and use an embedded return statement. Yes, there are some folks who advocate for adding a conditional flag like: while(...and !$end_flag){}, where inside the loop the code sets $end_flag=1 to end the loop. The theory behind that is that the code should have only one way in and only one way out. However, I believe that if the code is short (<1/3-1/2 page), having an intermediate "return" is no big deal. This is often an ERROR return and will have some sort of #ERROR comment.

        sub XXX
        {
            #sub setup params...
            for (...)
            {
                next if $cond1;
                for (...)
                {
                    ...
                    last if $cond2;     # next OUTER if $cond2
                    ...
                    last if $cond3;     # next OUTER if $cond3;
                                        # WHAT?
                                        # INNER vars do not remain the same
                    ...
                    return() if $cond4; # same as last OUTER
                    ...
                    next if $cond5;     # redundant all cndx are next
                    ...
                }
                return if $condx6;      #early return
            }
            return
        }
    In your pseudo code, there appears to be some assumption that going back to the outer loop somehow maintains the inner loop vars. I said "WHAT?". Perhaps you have a relatively short example that you could post and the Monks could have a go at it? I didn't understand completely the intent of your pseudo code.

    In general, the loop conditional should express the conditions upon which the loop normally terminates.

      G'day Marshall,

      "The normal way (in my opinion) to exit completely from an inner and outer loop in C or Perl is to use a return statement."

      Sorry, but that's completely wrong. The return function is for exiting a subroutine, not a loop. The first sentence of the last documentation starts (my emphasis):

      "The last command is like the break statement in C ..."

      Consider this code which uses last:

          $ perl -E 'X(); sub X { for (0..2) { last if $_ > 1; say; } say 42; }'
          0
          1
          42

      Now this code, which is identical in all respects, except last has been replaced by return:

          $ perl -E 'X(); sub X { for (0..2) { return if $_ > 1; say; } say 42; }'
          0
          1

      Note how the say 42; is not executed in that second example.

      "Yes, there are some folks who advocate for adding a conditional flag like: while(...and !$end_flag){}, where inside the loop the code sets $end_flag=1 to end the loop. The theory behind that is that the code should only one way in and only one way out."

      I addressed "structured programming techniques" in my OP.

      "In your pseudo code, there appears to be some assumption that going back to the outer loop somehow maintains the inner loop vars."

      No, I have made no such assumption.

      Although I do have some other issues with what you've written, I'll leave it there for now.

      — Ken

        You wrote: "Sorry, but that's completely wrong. The return function is for exiting a subroutine, not a loop. The first sentence of the last documentation starts (my emphasis):"

        If you look at my code, I put both loops within a subroutine, XXX. If the inner loop needs to abort the outer loop, a return statement is appropriate. Of course you have to refactor the code into a subroutine so that return from the inner loop aborts the outer loop.

        If you don't do that, then you end up with the situation where the inner loop has to set a flag that causes the outer loop to finish. Put both loops in a sub and just return from the inner loop when no more processing is necessary.

        As far as return() goes, this would be more like I describe based upon your code:

            use strict;
            use warnings;

            x($_) for (0..5);    # sub x won't print any num > 1
            print "42\n";

            sub x {
                my $num = shift;
                return if $num > 1;
                print "$num\n";
            }
            __END__
            Prints:
            0
            1
            42
        This can be expanded to deal with 2 or more dimensions.
Re: Perl Best Practices - Loop Labels
by Anonymous Monk on Apr 16, 2020 at 18:36 UTC

    FWIW (since I no longer write Perl for a living) I consider loop labels to be the exception rather than the rule. This means I use labels only if they are necessary. My take on your code example would be:

        OUTER:
        for (...) {
            next if $cond1;

            for (...) {
                ...
                last if $cond2;
                ...
                next OUTER if $cond3;
                ...
                last OUTER if $cond4;
                ...
                next if $cond5;
                ...
            }

            last if $cond6;
        }

    I find that this makes it stand out when something unusual is going on, in a way that labeling everything obscures.

    One counter-argument to my position is that labeling everything and using the labels everywhere makes it explicit where all the control transfers go.

    I believe the arguments against ever using labels have already been adequately covered (deeply-nested control structures, flag variables that must be carried around and tested, etc, etc ...)

    The bottom line is that there is no magic bullet, despite the touching faith of certain people in the $work environment to the contrary.

Node Type: perlquestion [id://11115604]
Approved by thomas895
Front-paged by thomas895