Perl Best Practices - Loop Labels

kcott has asked for the wisdom of the Perl Monks concerning the following question:

Re: Perl Best Practices - Loop Labels by tobyink (Canon) on Apr 16, 2020 at 06:51 UTC
The $work questions generally centred around whether jumping out of nested loops, starting the next iteration of a loop early, and so on, was a good practice.
So say you need to search through a bunch of files to see if Joe Bloggs is mentioned in any of them. You don't care which files he's mentioned in, or how many times. You just want a boolean: is he mentioned at all?
`my $mentioned = 0; FILE: for my $file ( @files ) { LINE: for my $line ( @lines ) { $line =~ /Joe Bloggs/ and ++$mentioned and last FILE; } } return $mentioned;`
The question of whether it's good practice to jump out of nested loops becomes "after I've found the answer to my question, should I keep searching through the rest of the files?" Or another way of thinking about it: "after I found my lost car keys, should I keep looking for them?"
I'm sure there are good times to jump out of loops and bad times to jump out of loops, and there are many subtle nuances. But in the general case, if you know a loop has served its purpose, jump out of it.
(Oh, and another thing. You'll notice I labelled my inner loop too, even though I never used that label. I find labelling loops, especially nested loops, can be a form of documentation.)
Update: please, please don't do this though:
`my $mentioned = 0; FILE: for my $file ( @files ) { check_file($file, \$mentioned); } return $mentioned; ...; sub check_file { my ($file, $mentioned) = @_; LINE: for my $line ( @lines ) { $line =~ /Joe Bloggs/ and ++$$mentioned and last FILE; } }`
Yes, Perl does let `last` be in a subroutine called from the loop. Don't do that. It's really hard to grok. Only use `next`, `last`, and `redo` lexically within the loop block they affect.
toby döt ink
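A minimal runnable sketch of the search described above. It assumes the "files" are already in memory as a hash of line arrays; the sub name `is_mentioned` and the sample data are made up for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical in-memory "files": name => array ref of lines.
# Returns 1 as soon as any line matches the pattern, otherwise 0.
sub is_mentioned {
    my ($pattern, $files) = @_;
    my $mentioned = 0;

    FILE: for my $name (sort keys %$files) {
        LINE: for my $line (@{ $files->{$name} }) {
            if ($line =~ $pattern) {
                $mentioned = 1;
                last FILE;    # found: stop scanning all remaining files
            }
        }
    }
    return $mentioned;
}

my %files = (
    'a.txt' => ['nothing here', 'still nothing'],
    'b.txt' => ['Joe Bloggs was here', 'more text'],
);
print is_mentioned(qr/Joe Bloggs/, \%files) ? "mentioned\n" : "not mentioned\n";
```

As a side note, under `use warnings` the discouraged variant, with `last FILE` inside a called subroutine, also makes Perl emit an "Exiting subroutine via last" warning at run time, which is one more hint that the jump belongs lexically inside the loop.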
Re^2: Perl Best Practices - Loop Labels by kcott (Archbishop) on Apr 16, 2020 at 07:57 UTC
G'day tobyink,
Your usage seems to align with mine. For single loops, I don't generally supply a label. For nested loops, I'll generally give all loops a label; although, if I'm not using `next`, `last`, etc. I'll probably omit the labels, e.g. for processing all cells in a grid:
`for my $row (1 .. $row_count) { for my $col (1 .. $col_count) { # process cell at row $row and column $col } }`
Not really related to loops but years ago, when working with junior programmers who didn't fully understand lexically scoped pragmata, I'd use labels to document anonymous blocks; usually something overt, such as:
`SUBROUTINE_REDEFINITION_BLOCK: { no warnings 'redefine'; sub subname { ... } }`
This followed several instances where braces delimiting anonymous blocks had been removed because "they looked like superfluous code".
-- Ken
Re^2: Perl Best Practices - Loop Labels by talexb (Chancellor) on Apr 16, 2020 at 13:20 UTC
"(Oh, and another thing. You'll notice I labelled my inner loop too, even though I never used that label. I find labelling loops, especially nested loops can be a form of documentation.)"
Interesting -- my preference would be to not do that, because then my brain would be searching for where the `LINE:` label is used. If a label's not used, I wouldn't put it in. That's a matter of personal taste, I guess -- but also, this is a simplified example.
Alex / talexb / Toronto
Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
Re: Perl Best Practices - Loop Labels by GrandFather (Saint) on Apr 16, 2020 at 09:06 UTC
I agree with Marshall (Re: Perl Best Practices - Loop Labels), although I'd clarify somewhat: I generally avoid manifest nested loops that require jumping between levels by refactoring the code, often by putting the inner loop in a sub and using an early exit to bail. That has the advantage that it completely avoids "spaghetti" code and allows a descriptive name to be used for the sub.
Identifiers, be they labels or sub names, can make understanding the intent of the code much easier without needing to introduce comments. Putting the inner loop code in a sub generally cleans up the outer loop wonderfully so the logic is easier to see. The result is code that is easier to grok and thus easier to write and maintain.
Having said that, there are no hard rules. Either approach may be more suitable in different situations. But, like Marshall, I don't remember when I might have used a loop label. Maybe never. I have a few workmates who use them as error exits from loops (in C++ as it happens), but very rarely (a handful of times in a few tens of millions of lines of code).
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
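A rough sketch of the refactoring described above: the inner loop moves into a named sub that bails out with an early `return`, so the outer loop needs no label. The sub names and sample data are hypothetical, chosen only to illustrate the shape.

```perl
use strict;
use warnings;

# Inner loop as a named sub: returns true at the first matching line.
sub file_mentions {
    my ($pattern, $lines) = @_;
    for my $line (@$lines) {
        return 1 if $line =~ $pattern;    # early exit replaces "last"
    }
    return 0;
}

# Outer loop: reads almost like the problem statement.
sub any_file_mentions {
    my ($pattern, @files) = @_;
    for my $lines (@files) {
        return 1 if file_mentions($pattern, $lines);
    }
    return 0;
}

print any_file_mentions(qr/Joe Bloggs/, ['abc'], ['Joe Bloggs']), "\n";
```

The trade-off is the one the thread keeps circling: the sub name documents the intent where a label would otherwise do so, at the cost of one more call level.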
Re: Perl Best Practices - Loop Labels by davido (Cardinal) on Apr 16, 2020 at 17:37 UTC
Another contrived and incomplete example:
`# Validate a data set ITEM: foreach my $item (@data) { foreach my $key (qw(foo bar baz)) { if (!exists $item->{$key}) { warn "Couldn't grok item. $key missing. Skipping.\n", Dumper($item); next ITEM; } } # Do something useful with this item. }`
A perfectly sane approach. Of course the inner loop could have been a grep, and you could warn after the grep if the number of matching keys doesn't reach the expectation:
`if (3 != grep {exists $item->{$_}} qw(foo bar baz)) { warn ....; # but your warning can't be as specific. next; }`
But either way you're dealing with nested loops, just in different forms. Sometimes the use-case doesn't lend itself well to a grep or map, and sometimes, even, there's advantage to bailing out at the earliest opportunity. And sometimes not bailing out early makes it harder to keep track of what condition led to the need to bail out at all.
My suggestion is this: If labels make a particular section of code easier to understand, and jumping out of nested depths is the appropriate thing to do, don't let a Perl Critic policy dissuade you. Just be sure that you really have chosen the clearest code expression that can solve the problem at the appropriate level of efficiency.
If you make your code more complex in an effort to avoid jumping out of a nested loop, everyone loses. If you make it more complex by jumping out, everyone loses. If there is a better construct that avoids the issue entirely, use it. If there is not, use the construct that achieves the needs, but with code clarity high on the list of needs. This will mean sometimes jumping out of a nested loop, or skipping to the next outer iteration is the right thing to do.
Dave
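For anyone who wants to run the first variant, here is a small self-contained sketch. The sub name `valid_items`, the shortened warning text and the sample records are assumptions made for the example; only the `next ITEM` structure comes from the post above.

```perl
use strict;
use warnings;

# Returns the items that contain every required key. An item is
# abandoned (with a warning naming the key) at the first missing key.
sub valid_items {
    my ($required, @data) = @_;
    my @ok;

    ITEM: for my $item (@data) {
        for my $key (@$required) {
            if (!exists $item->{$key}) {
                warn "Skipping item: '$key' missing\n";
                next ITEM;    # skip the rest of this item entirely
            }
        }
        push @ok, $item;      # only reached when all keys exist
    }
    return @ok;
}

my @good = valid_items(
    [qw(foo bar baz)],
    { foo => 1, bar => 2, baz => 3 },
    { foo => 1, baz => 3 },           # 'bar' is missing
);
print scalar(@good), " valid item(s)\n";
```

The warning can name the exact missing key because the loop stops at the point of failure, which is the specificity the `grep` version gives up.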
Re: Perl Best Practices - Loop Labels by haukex (Archbishop) on Apr 16, 2020 at 21:29 UTC
Just to add something I don't see mentioned yet:
"I believe the similarity to "goto LABEL" in some languages, which generates spaghetti code which is hard to read and maintain, was possibly behind this line of questioning."
Perl's `next`, `last`, and `redo` are much more akin to C's `continue` and `break` than they are to `goto`. So unless your colleagues have issues with the former two, tell them not to worry `:-)`
IMHO, labeled blocks, including loops, are basically the much, much better version of `goto`, in that they allow complicated flow control to be implemented in a much cleaner way. Also, IIRC, their behavior in regards to the stack is much cleaner than with `goto`. There are three cases that I can think of right now where people try to justify `goto`s:
1. Flow control in loops - that can be much better implemented with the aforementioned `next`, `last`, and `redo`.
2. Control constructs where we have much better solutions nowadays, I'm thinking of `On Error Goto ...` or everyone's favorite, `On Error Resume Next`. (Nowadays of course `try/catch`.)
3. Really low-level stuff, like IIRC I once `goto`'d into an assembly routine from C, which of course doesn't apply in Perl.
And then there's the spaghetti code artists that are the reason that `goto` has such a deservedly bad reputation.
Anyway, as for the general question, I use labeled loops much like tobyink showed: in nested loops, and with sensible names (e.g. "`last LINE`" and "`next FILE`" are great to understand). I almost never have more than two nested loops, at a max three (everything else is in `sub`s or methods), and the other thing is that I try to keep my loops short, under a page if possible, so that one doesn't lose an overview of the control flow. When used like this, including in the example you showed, I think labels are a Good Thing.
Re^2: Perl Best Practices - Loop Labels by jcb (Parson) on Apr 17, 2020 at 03:20 UTC
There is at least one very good use of `goto` in C: error handling. In a function that allocates and initializes complex structures, an earlier allocation can succeed but a later allocation fail. When this happens, the earlier allocation must be released before returning `NULL` to avoid leaking memory.
`something * alloc_something(void) { something * ret = malloc(sizeof(something)); if (ret == NULL) goto out; ret->another_thing = alloc_another_thing(); if (ret->another_thing == NULL) goto out_free_ret; return ret; /* error exits */ out_free_ret: free(ret); out: return NULL; }`
I learned this style from reading Linux kernel sources and it makes error handling much more readable and maintainable by keeping the successful path and the error path separate. While this example was very simple, this pattern especially shines when more than one allocation must be backed out before returning failure because it avoids duplicating the code to release the earlier allocations.
Re^3: Perl Best Practices - Loop Labels by talexb (Chancellor) on Apr 19, 2020 at 19:41 UTC
Interesting .. yet I can see a fairly simple way to restructure this C code, without either of the `goto` statements ..
`something * alloc_something(void) { /* Make two malloc requests. Ensure both succeed; return allocated memory, if any. Three possible logic paths: 1. First malloc fails, and we are done. 2. First malloc succeeds, second malloc fails: free the first allocated block, and we are done. 3. First and second mallocs succeed, and we are done. */ something * ret = malloc(sizeof(something)); /* Did the first request succeed? */ if (ret != NULL) { ret->another_thing = alloc_another_thing(); /* Did the second request fail? */ if (ret->another_thing == NULL) { free(ret); ret = NULL; } } return ret; }`
Alex / talexb / Toronto
Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
Re^4: Perl Best Practices - Loop Labels by jcb (Parson) on Apr 20, 2020 at 03:34 UTC
Re: Perl Best Practices - Loop Labels by eyepopslikeamosquito (Archbishop) on Apr 17, 2020 at 07:09 UTC
Like others mentioned in this thread, I rarely use loop labels. One real-world example I remember is a mock Syslog server I wrote a while back for automated testing. I embed the whole function below as a real-world (not contrived) example of using loop labels in Perl. I remember at the time being a bit surprised about using labels in Perl (because I do it so rarely) but after careful consideration felt it was the clearest way to write this particular code. Also the function itself was quite a bit longer than I usually write, but again felt it was warranted here.
`sub do_syslog_server { my $host = shift; my $port = shift; my $sleep_after_accept = shift; my $sleep_after_recv = shift; my_log( "Start on host '$host' at " . get_datetime_stamp() . "\n" ); my_log(" pid=$$\n"); my_log(" port=$port\n"); my_log(" sleep_after_accept=$sleep_after_accept\n"); my_log(" sleep_after_recv=$sleep_after_recv\n"); # This socket is used to listen for connections. my $listener = IO::Socket::INET->new( LocalPort => $port, Proto => 'tcp', Listen => 5, ReuseAddr => 1, ) or die "error: IO::Socket::INET new: $@"; my $selector = IO::Select->new($listener); SERVER: while ( my @ready = $selector->can_read() ) { CLIENT: for my $client (@ready) { if ( $client == $listener ) { my $new_conn = $listener->accept(); $selector->add($new_conn); my $fh_hex = sprintf '0x%x', $new_conn; my $peerhost = $new_conn->peerhost(); my $peerport = $new_conn->peerport(); my $peeraddr = $new_conn->peeraddr(); my $peerhostfull = gethostbyaddr( $peeraddr, AF_INET ) || "Cannot resolve"; my $fromstr = "from $peerhost:$peerport (host=$peerhostfull)"; my_log("Accepted new connection $fromstr\n"); if ($sleep_after_accept) { my_log("Sleeping for $sleep_after_accept seconds...\n"); sleep($sleep_after_accept); } } else { my $cli_cmd_str = recv_tcp_client($client); if ( !defined($cli_cmd_str) ) { my $peerhost = $client->peerhost(); my $peerport = $client->peerport(); my_log("Client $peerhost:$peerport closed socket\n"); $selector->remove($client); $client->close(); next CLIENT; } if ( $cli_cmd_str =~ /^KNOB_SERVER_PLEASE_QUIT\s*$/ ) { my_log("Server quitting on Knob's command\n"); last SERVER; } if ($sleep_after_recv) { my_log("Sleeping for $sleep_after_recv seconds...\n"); sleep($sleep_after_recv); } } } } my_log("Closing server\n"); close($listener) or die "error: close server: $!"; my_log("End do_syslog_server\n"); }`
Update: see also Re: Multiple consecutive connections to a socket - example event-driven server using IO::Select
Re: Perl Best Practices - Loop Labels by 1nickt (Canon) on Apr 16, 2020 at 11:27 UTC
Hi Ken,
Your usage of loop labels seems completely correct to me. I use them for exactly the same control, usually to jump out of an inner or maybe the outer loop early. It's a standard technique to keep track of where you are and move around.
Having said that, I try to avoid deeply nested loops -- i.e. no more than two levels, and as swl said, if you find yourself needing more than one level of nesting, it's likely time to refactor and make some subroutines.
I also agree that people who don't understand Perl's loop control labels, or the use of `goto` to dispatch to another method, are often laboring under a misconception or two when they give their opinion on the matter.
Hope this helps!
The way forward always starts with a minimal test.
Re: Perl Best Practices - Loop Labels by kcott (Archbishop) on Apr 17, 2020 at 06:01 UTC
Firstly, a huge thank you to everyone who replied. I am somewhat overwhelmed by the volume of responses. It is very much appreciated.
After posting the OP yesterday, I replied to the first responses and then logged out. Logging back in today, I was presented with about a dozen direct replies; many of those had spawned their own little sub-threads. Much as I might like to reply to everyone, it really isn't practical; so, please take this as a general response to all. Another reason for a general reply is that I think I'd probably be repeating myself in many individual replies; this would tend to bloat the thread.
The extent to which people use labels varies quite a lot. Some use them always, which is in line with PBP; others use them to a greater or lesser degree depending on context, which aligns more with my usage; and, some either generally don't like them or have never found a need to use them. I expect, in general, I'll probably continue with my current usage; although, some replies were thought provoking and that may cause me to modify usage in certain situations.
Refactoring was mentioned in quite a few places and I agree with this. I inwardly groan whenever I encounter programs with monolithic tracts of code; these are hard to read, comprehend, maintain, extend and debug. I generally tend to have more `_helper()`-type functions than `direct_interface()`-type functions.
Another point raised was the depth of nesting and, again, I concur. Wherever possible, I generally try to avoid nesting at all; there are occasions when that's unavoidable, in which case I aim for shallower rather than deeper; if I've reached a fourth level, I've probably done something wrong and will rethink the solution.
A number of people spoke about PBP not being a set of rules that should be slavishly followed but rather a series of suggestions and recommendations to be adapted as appropriate. Definitely no argument from me on that one.
In conclusion, I asked for your thoughts and have received them in profusion. I won't be making any drastic changes to the way I work but I may modify some behaviours in a small way. Again, thank you very much.
-- Ken
Re: Perl Best Practices - Loop Labels by BillKSmith (Monsignor) on Apr 16, 2020 at 12:44 UTC
Do not forget "Perl Best Practices" Chapter 1. What is really important is that you choose a style that works for you (or your organization) and use it consistently. Consider the advice given in the book, in this forum, or anywhere else, make your own decision, and stick to it. I do admit that even this advice is "do as I say, not as I do". I try to use unnecessary loop labels as documentation, but often forget, or just do not like the added clutter.
Bill
Re: Perl Best Practices - Loop Labels by haj (Vicar) on Apr 16, 2020 at 22:24 UTC
Short version: I'm strictly following PBP on this one, and I agree with your current usage.
I'm following my own shortcut rule to this: Always use labels after `next`, `redo`, and `last`. That includes: use labels even if the loop isn't nested. Maybe it will become nested when some other guy adds some feature years later, and for that case the label adds robustness.
When refactoring complex loops it helps to define the scope: extracting an inner loop into a subroutine needs extra care if the inner loop contains a `last OUTER;`.
It has already been noted by others that good names for labels also make good documentation - `next FILE;` or `last TRY;` tell pretty well what the line is trying to achieve, and I don't need to scroll even if that line happens to be at the top of my screen.
I don't see these loop controls related to `goto`, but rather to two other mechanisms of execution control: `return` and `die`. A `return` is implicitly labeled with the surrounding `sub` (I know, special cases exist where it isn't), and `die` is followed by a description why you bail out. With a label, loop control keywords achieve the same level of self-explanation.
Loop control, returning from a subroutine and exceptions are part of every modern programming language, and they have in common that they go strictly "upward" in the call stack or loop hierarchy. A `goto LABEL;` or even, horrors, `goto EXPRESSION` is indeed scary, not only for maintenance of the code, but also for those who write compilers and interpreters.
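To make the robustness point concrete, here is a small sketch under invented data: both subs count how many cells they examine before the scan is supposed to stop at the first negative number. The sub names and the sample rows are hypothetical.

```perl
use strict;
use warnings;

# Unlabelled version: "last" was written when there was only one loop.
# After an inner loop is added, it quietly exits only that inner loop.
sub scan_unlabelled {
    my @rows = @_;
    my $examined = 0;
    for my $row (@rows) {
        for my $n (@$row) {
            $examined++;
            last if $n < 0;        # leaves only the innermost loop
        }
    }
    return $examined;
}

# Labelled version: the target of "last" stays the same no matter
# how many loops are later nested inside the ROW loop.
sub scan_labelled {
    my @rows = @_;
    my $examined = 0;
    ROW: for my $row (@rows) {
        for my $n (@$row) {
            $examined++;
            last ROW if $n < 0;    # always leaves the row loop
        }
    }
    return $examined;
}

my @rows = ([1, -1, 2], [3, 4]);
print scan_unlabelled(@rows), " ", scan_labelled(@rows), "\n";    # prints "4 2"
```

The unlabelled scan carries on into the second row after the negative value, while the labelled one stops where the author intended.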
Re: Perl Best Practices - Loop Labels by swl (Parson) on Apr 16, 2020 at 08:07 UTC
I find loop labels quite useful for comprehension, particularly for next/last calls if the loop gets long or nested (although both indicate some other refactoring is probably needed). The example provided by tobyink in 11115607 sums it up nicely. The laufeyjarson blog series on PBP might also be of general interest. Loop labels are covered in http://blog.laufeyjarson.com/2015/02/pbp-084-loop-labels/.
Re^2: Perl Best Practices - Loop Labels by Anonymous Monk on Apr 16, 2020 at 10:03 UTC
Mentioned in passing more like, not covered :)
Re: Perl Best Practices - Loop Labels by roho (Bishop) on Apr 16, 2020 at 14:52 UTC
I also use loop labels for control flow in nested loops where appropriate. When I coded in COBOL in the 1970s using structured programming, I made use of the GO TO statement to transfer control to a label at the end of the subroutine to avoid unnecessarily complex nested IF statements. I remember the agony of trying to debug code with IF statements nested to an ungodly number of levels, when a few well-placed GO TOs to the end of the subroutine would have made life much easier. There is definitely a place for loop labels.
"It's not how hard you work, it's how much you get done."
Re: Perl Best Practices - Loop Labels by Marshall (Canon) on Apr 16, 2020 at 07:52 UTC
I am surprised at your use of loop labels. In my experience this is a very rare thing. I have used a Perl loop label maybe once in the past few years. Unfortunately, I haven't found that example yet in my code base - perhaps my grep kung fu is failing?
The normal way (in my opinion) to exit completely from an inner and outer loop in C or Perl is to use a return statement. You put the loops in a subroutine and use an embedded return statement.
Yes, there are some folks who advocate for adding a conditional flag like: `while(...and !$end_flag){}`, where inside the loop the code sets `$end_flag=1` to end the loop. The theory behind that is that the code should have only one way in and only one way out. However, I believe that if the code is short (<1/3-1/2 page), having an intermediate "return" is no big deal. This is often an ERROR return and will have some sort of #ERROR comment.
`sub XXX { #sub setup params... for (...) { next if $cond1; for (...) { ... last if $cond2; # next OUTER if $cond2 ... last if $cond3; # next OUTER if $cond3; # WHAT? # INNER vars do not remain the same ... return() if $cond4; # same as last OUTER ... next if $cond5; # redundant all cndx are next ... } return if $condx6; #early return } return }`
In your pseudo code, there appears to be some assumption that going back to the outer loop somehow maintains the inner loop vars. I said "WHAT?". Perhaps you have a relatively short example that you could post and the Monks could have a go at it? I didn't understand completely the intent of your pseudo code.
In general, the loop conditional should express the conditions upon which the loop normally terminates.
Re^2: Perl Best Practices - Loop Labels by kcott (Archbishop) on Apr 16, 2020 at 08:47 UTC
G'day Marshall,
"The normal way (in my opinion) to exit completely from an inner and outer loop in C or Perl is to use a return statement."
Sorry, but that's completely wrong. The `return` function is for exiting a subroutine, not a loop. The first sentence of the `last` documentation starts (my emphasis): "The `last` command is like the `break` statement in C ..."
Consider this code which uses `last`:
`$ perl -E 'X(); sub X { for (0..2) { last if $_ > 1; say; } say 42; }'`
Output: `0 1 42` (one value per line).
Now this code, which is identical in all respects, except `last` has been replaced by `return`:
`$ perl -E 'X(); sub X { for (0..2) { return if $_ > 1; say; } say 42; }'`
Output: `0 1` (one value per line).
Note how the `say 42;` is not executed in that second example.
"Yes, there are some folks who advocate for adding a conditional flag like: while(...and !$end_flag){}, where inside the loop the code sets $end_flag=1 to end the loop. The theory behind that is that the code should only one way in and only one way out."
I addressed "structured programming techniques" in my OP.
"In your pseudo code, there appears to be some assumption that going back to the outer loop somehow maintains the inner loop vars."
No, I have made no such assumption.
Although I do have some other issues with what you've written, I'll leave it there for now.
-- Ken
Re^3: Perl Best Practices - Loop Labels by Marshall (Canon) on Apr 18, 2020 at 04:52 UTC
You wrote: "Sorry, but that's completely wrong. The return function is for exiting a subroutine, not a loop. The first sentence of the last documentation starts (my emphasis):"
If you look at my code, I put both loops within a subroutine, XXX. If the inner loop needs to abort the outer loop, a return statement is appropriate. Of course you have to refactor the code into a subroutine so that a return from the inner loop aborts the outer loop. If you don't do that, then you get to this stuff where the inner loop has to set a flag that causes the outer loop to finish. Put both loops in a sub and just return from the inner loop when no more processing is necessary.
As far as return() goes, this would be more like what I describe, based upon your code:
`use strict; use warnings; x($_) for (0..5); #sub x won't print any num >1 print "42\n"; sub x { my $num = shift; return if $num >1; print "$num\n"; } __END__ Prints: 0 1 42`
This can be expanded to deal with 2 or more dimensions.
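A sketch of what "put both loops in a sub and just return from the inner loop" could look like in two dimensions. The sub name `find_cell` and the sample grid are made up for illustration; they do not come from the thread.

```perl
use strict;
use warnings;

# Searches a grid (array ref of array refs) for the first cell equal
# to $want. Returns (row, col), or an empty list when nothing matches.
sub find_cell {
    my ($grid, $want) = @_;
    for my $r (0 .. $#$grid) {
        for my $c (0 .. $#{ $grid->[$r] }) {
            # One return ends both loops, so no label is needed.
            return ($r, $c) if $grid->[$r][$c] eq $want;
        }
    }
    return;    # not found
}

my @pos = find_cell([ [qw(a b)], [qw(c d)] ], 'c');
print "@pos\n";    # prints "1 0"
```

This only works when leaving both loops also means leaving the sub; any code that must run after the loops, like the `say 42;` in the earlier example, has to live in the caller.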
Re: Perl Best Practices - Loop Labels by Anonymous Monk on Apr 16, 2020 at 18:36 UTC
FWIW (since I no longer write Perl for a living) I consider loop labels to be the exception rather than the rule. This means I use labels only if they are necessary. My take on your code example would be:
`OUTER: for (...) { next if $cond1; for (...) { ... last if $cond2; ... next OUTER if $cond3; ... last OUTER if $cond4; ... next if $cond5; ... } last if $cond6; }`
I find that this makes it stand out when something unusual is going on, in a way that labeling everything obscures.
One counter-argument to my position is that labeling everything and using the labels everywhere makes it explicit where all the control transfers go.
I believe the arguments against ever using labels have already been adequately covered (deeply-nested control structures, flag variables that must be carried around and tested, etc, etc ...).
The bottom line is that there is no magic bullet, despite the touching faith of certain people in the `$work` environment to the contrary.

