PerlMonks  

Re^8: Summing numbers in a file

by jcb (Parson)
on Jun 03, 2020 at 02:04 UTC [id://11117621]


in reply to Re^7: Summing numbers in a file
in thread Summing numbers in a file

a global filehandle is not necessary - a lexical declared in the module will do exactly the same job

My original argument was (intended to be) that global file handles and file-scope lexical file handles are equivalent, and the choice between them is a neutral matter of style.

this seems to be an argument for lexical filehandles rather than global ones

This comes back to a matter of style — for some programmers it may be such an argument, for others ALLCAPS tokens may be in a different "five, plus or minus two" from regular variables.

I think the usage of globals can be avoided 99.9% of the time

I agree that the use of globals generally should be minimized, but I consider file-scope lexicals (especially in the main script) a "sneaky" form of global variable. My concern is opposing unthinking "never use globals!" policies that then result in using file-scope lexicals in exactly the same way as globals — except that lexicals are harder to inspect.

On a side note, a file-scope lexical can possibly have wider scope than a global if the same file defines multiple packages. Each global is contained in its package, but the lexical will persist across package statements to the end of the file.
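To illustrate that side note, here is a small sketch; the package names First and Second and the subs bump and peek are hypothetical. The `my $counter` declared after `package First;` is still the very same variable after `package Second;`, because a package statement switches the current namespace but does not open a new lexical scope.

```perl
use strict;
use warnings;

package First;
my $counter = 0;                  # file-scope lexical, declared "under" First
sub bump { return ++$counter }

package Second;
# Same lexical: package statements do not end lexical scopes.
sub peek { return $counter }

package main;
First::bump() for 1 .. 3;
print Second::peek(), "\n";       # 3: Second sees the lexical declared in First
```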

I disagree with it being harder to debug; issues arising from incorrectly used globals are IMHO much more annoying to debug

File-scope lexicals effectively are globals, except that they are harder to inspect in the debugger. The "even harder to debug" remark was in reference to a "never use globals!" approach where the callback is wrapped in a closure that also carries references to lexicals in the block that will use it. The effect is global variables that are not actually in the symbol table, and (as far as I know) are almost impossible to debug because there is no way to look inside a closure.

Re^9: Summing numbers in a file
by haukex (Archbishop) on Jun 03, 2020 at 09:58 UTC
    My original argument was (intended to be) that global file handles and file-scope lexical file handles are equivalent, and the choice between them is a neutral matter of style.

    But they're not equivalent, I've already named two disadvantages (no typo protection and potential clashes with other globals like package names), and I have yet to hear an advantage to bareword filehandles.

    Even in the limited case that you name (file-scope lexical file handles), there is yet another difference: if there ever is a clash in names, with lexicals it's incredibly easy to limit the scope of the issue: simply place the statements in a block, and you've limited the scope of the lexical, including a visual scope that allows one to glance at the code and know with certainty that this filehandle is contained within that scope. If one were using bareword filehandles instead, their scope would be the entire package, beyond the boundaries of any blocks, so either you'd have to go through the entire package, renaming the filehandles to eliminate any name clashes, or you'd have to use bare blocks and the local *FH "hack", which has its own disadvantages. I see this as yet another disadvantage to bareword filehandles, again with no advantage.
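A sketch of the two approaches, using an in-memory file (the string `$log` is a hypothetical stand-in) so no disk access is needed. The bare block confines the lexical `$fh`; the bareword version needs `local *FH`, and one likely reason it is called a hack is that it temporarily replaces every slot of the `*FH` glob (`$FH`, `@FH`, `%FH`, `&FH` as well as the handle) for the dynamic extent of the call.

```perl
use strict;
use warnings;

my $log = "alpha\nbeta\n";   # in-memory "file": open() accepts a scalar ref
my @first_lines;

# Lexical: the bare block is a visible, enforced limit on $fh's scope.
{
    open my $fh, '<', \$log or die "open: $!";
    push @first_lines, scalar <$fh>;
}   # $fh is gone here; no other $fh elsewhere in the file is affected

# Bareword: local *FH keeps a package-wide FH from being clobbered,
# but it localizes the whole glob, not just the filehandle.
sub read_first_line {
    my ($text_ref) = @_;
    local *FH;
    open FH, '<', $text_ref or die "open: $!";
    my $line = <FH>;
    close FH;
    return $line;
}
push @first_lines, read_first_line(\$log);

print @first_lines;   # "alpha" twice
```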

    ... a file-scope lexical can possibly have wider scope than a global if the same file defines multiple packages ...

    That's true, but again easily solved by placing the package in a block, and Perl 5.14 introduced the package NAMESPACE BLOCK syntax to make this look even nicer. Some people even argue against having multiple packages in one file. And even with the (admittedly sometimes confusing) issue of lexicals potentially crossing package boundaries, the scope of the lexical is still "visually" limited to the file; I would argue that a name clash caused by a global of the same name being used in a different file is a much trickier issue.
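A minimal sketch of the Perl 5.14+ `package NAME BLOCK` form; `Counter`, `Reporter`, and their subs are hypothetical names. A lexical declared inside the block ends with the block, so it can no longer leak into the next package in the same file.

```perl
use v5.14;     # package NAME BLOCK needs 5.14+; this also enables strict
use warnings;

package Counter {
    my $count = 0;             # lexical: its scope ends with this block
    sub next_value { return ++$count }
}

package Reporter {
    # $count is not visible here; mentioning it would be a compile-time
    # error under strict ('Global symbol "$count" requires explicit package name').
    sub report { return 'value: ' . Counter::next_value() }
}

print Reporter::report(), "\n";   # value: 1
```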

    My concern is opposing unthinking "never use globals!" policies that then result in using file-scope lexicals in exactly the same way as globals ...

    Yes, a valid concern, but like you, I would argue that file-scope lexicals used in exactly the same way as globals are globals too. But that kind of dogmatic policy is not what I meant (or said). Instead, IMHO "globals can and should nearly always be avoided" is intended to cause people to think about what would be a better solution, which is usually a change in design.

    To someone who knows what they're doing, it may be acceptable to sometimes use globals, but again, this thread is in the context of giving advice to a beginner.

    ... (as far as I know) are almost impossible to debug because there is no way to look inside a closure.

    I'm far from an expert with the debugger, but that doesn't appear to be correct.

    $ cat x.pl
    use warnings;
    use strict;
    sub x {
        my $y = 123;
        sub {
            $y += shift;
            print "$y\n";
            print shift->(), "\n";
        }
    }
    my $z = x;
    my $foo = "abc";
    $DB::single=1;
    $z->(111, sub {
        return $foo."def";
    });
    $ perl -d x.pl
    ...
    main::(x.pl:11):    my $z = x;
      DB<1> c
    main::(x.pl:16):    });
      DB<1> s
    main::CODE(0x543ed97f20a0)(x.pl:6):
    6:          $y += shift;
      DB<1> y 0
    $y = 123
      DB<2> s
    main::CODE(0x543ed97f20a0)(x.pl:7):
    7:          print "$y\n";
      DB<2> y 0
    $y = 234
      DB<3> s
    234
    main::CODE(0x543ed97f20a0)(x.pl:8):
    8:          print shift->(), "\n";
      DB<3> s
    main::CODE(0x55d4d6a31da8)(x.pl:15):
    15:     return $foo."def";
      DB<3> y 0
    $foo = 'abc'
    $z = CODE(0x543ed97f20a0)
       -> &main::__ANON__[x.pl:9] in x.pl:5-9
      no typo protection

      This can be a legitimate concern in that you will get only a run-time error instead of a compiler error, but I have yet to write code that had enough bareword file handles for this to be a problem for me.

      potential clashes with other globals like package names

      Is this the origin of the convention in Perl of writing file handle names in ALL UPPERCASE? I had picked that up without really knowing why, but preventing that clash (who tries to name a file handle UNIVERSAL?) seems like a good reason for the convention.

      solved by placing the package in a block

      Which I do in my code:

      package Outer;
      # ...
      {
          package Outer::Inner;
          # ...
      }

      I also limit this to OOP code where Outer::Inner is an internal implementation detail of Outer. Using multi-package files too much is a good way to go insane, even with Emacs' CPerl mode and speedbar for navigation.

      think about what would be a better solution, which is usually a change in design

      I agree with this in the case of a proliferation of file handles, but bareword file handles remain a useful tool in producing concise I/O code in a main script, while also being visually distinctive in their own way, with both ALLCAPS and different syntax highlighting from variables. This is probably why I have not had typo problems with them, now that I notice it.

      this thread is in the context of giving advice to a beginner

      The concern I have is the possibility of advice that works well for a beginner, but could unintentionally limit their future growth. I am unsure exactly how "always use lexical file handles" would do that, but I am suspicious of "always" and "never" in general.

      almost impossible to debug because there is no way to look inside a closure
      that doesn't appear to be correct

      Yes, you can examine locals if you have execution stopped inside the closure, but given a CODE ref in a structure somewhere, how do you get the closed over values out of it?

        given a CODE ref in a structure somewhere, how do you get the closed over values out of it?

        Ok, I see what you're asking now. Whenever you want to inspect lexical variables for debugging, think PadWalker (that's also what the Perl debugger uses, as per its docs). If you wanted to get fancy, combine it with Data::Dump::Filtered and B::Deparse:
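A minimal sketch along those lines, assuming the CPAN module PadWalker is installed (B::Deparse ships with Perl). `closed_over()` takes a code reference and returns a hash reference mapping each captured variable's name to a reference to that variable; plain `print` stands in for Data::Dump::Filtered here, and `make_accumulator` and `%callbacks` are hypothetical.

```perl
use strict;
use warnings;
use PadWalker qw(closed_over);   # CPAN module, not part of core Perl
use B::Deparse;                  # core module

# Hypothetical factory: $total is reachable only through the returned closure.
sub make_accumulator {
    my $total = 100;
    return sub { $total += shift; return $total };
}

# A CODE ref tucked away in a data structure, as in the question.
my %callbacks = (acc => make_accumulator());
$callbacks{acc}->(23);

# closed_over() maps each captured variable's name to a reference to it.
my $vars = closed_over($callbacks{acc});
for my $name (sort keys %$vars) {
    # This sketch assumes scalar captures; arrays/hashes would need @{} / %{}.
    print "$name = ${ $vars->{$name} }\n";   # $total = 123
}

# B::Deparse shows what the anonymous sub does with those variables.
print B::Deparse->new->coderef2text($callbacks{acc}), "\n";
```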

        Is this the origin of the convention in Perl of writing file handle names in ALL UPPERCASE?

        I don't know, but that sounds plausible. All I've found in the Camel 2nd Ed. so far is:

        Since reserved words are always entirely lowercase, we recommend that you pick label and filehandle names that do not appear all in lowercase. ... Using uppercase filehandles also improves readability and protects you from conflict with future reserved words. ... if you have a package called m, s, y, or tr, then you can't use the qualified form of an identifier as a filehandle because it will be interpreted instead as a pattern match, a substitution, or a translation. Using uppercase package names avoids this problem.
        This can be a legitimate concern in that you will get only a run-time error instead of a compiler error

        No, you don't even get a run-time error, just a warning, which can easily be swallowed or go unnoticed if the script is run from a web server, a daemon, a cron job that can't send mail, a script that generates a lot of other output, and so on.
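A small sketch of that difference, using a misspelled filehandle name (MAINFEST for MANIFEST) and an in-memory file so nothing touches the disk. Under `use strict` the bareword typo still compiles and the read quietly returns undef plus a warning, while the equivalent lexical typo (a hypothetical `$hf` for `$fh`) is rejected before it can run.

```perl
use strict;
use warnings;

my $text = "line one\n";      # in-memory file
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };   # collect run-time warnings

# Bareword filehandle, misspelled on the read: this compiles under strict,
# and at run time the read just returns undef and emits a warning.
open MANIFEST, '<', \$text or die "open: $!";
my $line = <MAINFEST>;        # typo: MAINFEST instead of MANIFEST
close MANIFEST;

print defined($line) ? "read: $line" : "nothing read\n";
print "warning: $_" for @warnings;

# The same kind of typo with a lexical filehandle never gets to run:
# strict rejects the undeclared $hf at compile time.
my $ok = eval q{
    open my $fh, '<', \$text or die "open: $!";
    my $l = <$hf>;            # typo: $hf instead of $fh
    1;
};
print "compile error: $@" unless $ok;
```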

        I have yet to write code that had enough bareword file handles for this to be a problem for me. ... This is probably why I have not had typo problems with them, now that I notice it.

        Once again, this may apply to you, but I still don't see any good reason to suggest them to newcomers. Lexical filehandles have become the new best practice for lots of good reasons, and I think I've named pretty much all of them by now. One more might be that lexical filehandles are closed automatically when they go out of scope, but bareword filehandles are not (only when the script ends, which may be a lot later, depending on the code).
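A sketch of the automatic close, assuming a scratch file from the core File::Temp module. When the block ends, the lexical `$out` has no remaining references, so Perl closes it and flushes its buffer; a bareword handle opened the same way would stay open (and its output possibly still buffered) until an explicit `close` or program exit.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my (undef, $path) = tempfile(UNLINK => 1);   # scratch file, removed at exit

{
    open my $out, '>', $path or die "open: $!";
    print $out "buffered text\n";    # may still sit in Perl's output buffer
}   # last reference to $out is gone: the handle is closed and flushed here

# Because the handle was closed at the end of the block, the data is on disk.
open my $in, '<', $path or die "open: $!";
my $content = do { local $/; <$in> };
close $in;
print $content;
```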

        bareword file handles remain a useful tool in producing concise I/O code in a main script ...

        But is that really the case? You mentioned using a filehandle named MANIFEST, and you said "Global file handles should have meaningful names." That makes sense since they're package-global, but for lexical filehandles that have limited scope, using a name like $fh is perfectly fine IMHO. open my $fh, '<', 'Manifest' or die $!; my $data = do { local $/; <$fh> }; close $fh; is 12 characters shorter than open MANIFEST, '<', 'Manifest' or die $!; my $data = do { local $/; <MAINFEST> }; close MANIFEST;, and the latter suffers from all the issues I've already named.

        ... while also being visually distinctive in their own way, with both ALLCAPS and different syntax highlighting from variables.

        If that's really the only remaining advantage, which is strongly dependent on the editor and highlighting scheme, then I don't really see how that stacks up against all the disadvantages I named (and that you haven't responded to).

        but I am suspicious of "always" and "never" in general

        I know what you mean, and I feel the same way, but I don't think I said "never" anywhere in this thread. I'm a big fan of TIMTOWTDI, but also Tim Toady Bicarbonate. Don't confuse strict coding policies with best practices - the name of the latter already implies that there are other ways to accomplish the same thing, but best practices usually exist because they have advantages over the other practices, solve issues that other methods have, and so on.

        In this case, bareword filehandles are no longer the generally recommended practice in part because of the horrible debugging messes that globals have caused people over the years. Lexicals are simply better, which is far from saying "never use globals". But design decisions made to limit globals can also have positive effects on application architecture.

        Although traditionally, Perl has often been used for scripts (I've written plenty myself, and yes, I've used "globals" in a lot of them), something that really opened my eyes is when I spent a few years coding more Java than Perl. In large applications, and especially multi-threaded ones, you just can't use globals, and for example, context objects (when properly implemented) make a ton more sense - they're thread-safe, serializable, etc. and replace the functionality that people often use globals for.
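A minimal sketch of the contrast, with a hypothetical `RequestContext` class standing in for whatever a real framework provides: the global version has one `$Current_User` shared by every caller, while the context version passes the state explicitly, so each request carries its own data instead of relying on shared package state.

```perl
use v5.14;     # package NAME BLOCK needs 5.14+; also enables strict
use warnings;

# Global style: every caller shares (and can clobber) one package variable.
our $Current_User = 'alice';
sub greet_global { return "hello, $Current_User" }

# Context-object style (hypothetical minimal class): state travels with the call.
package RequestContext {
    sub new  { my ($class, %args) = @_; return bless { %args }, $class }
    sub user { return $_[0]{user} }
}

sub greet { my ($ctx) = @_; return 'hello, ' . $ctx->user }

my @contexts = map { RequestContext->new(user => $_) } qw(alice bob);
print greet_global(), "\n";
print greet($_), "\n" for @contexts;
```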

        Just for example, someone coding in Mojolicious, where the production-mode server is multiprocess, may try to use globals to share information, only to find that it simply doesn't work, but someone already used to limiting the scope of their variables as much as possible will likely find it much easier to deal with - at least that's been my experience.

        The concern I have is the possibility of advice that works well for a beginner, but could unintentionally limit their future growth. I am unsure exactly how "always use lexical file handles" would do that

        That might be an important thing to think about, then. Given that they have so many disadvantages, other than making people aware that bareword filehandles exist, what else is there? I think it'd be enough to say something like "by the way, bareword filehandles exist and you may see them in legacy code, but nowadays the best practice is to use lexical filehandles because they have many advantages".
