PerlMonks  

On Backwards Compatibility and Bareword Filehandles

by jcb (Vicar)
on Jul 17, 2020 at 04:33 UTC (#11119439=perlmeditation)

While participating in some of the recent discussions about plans for Perl 7 and the longer-term future of Perl, I found a solution that I wish to offer here.

A Ground Rule

First, I want to establish what should be an obvious ground rule: removing features from Perl requires significant justification, and style is never enough to remove a feature. The rationale for this rule is simple: Perl has long held to TIMTOWTDI and style varies. If we allow the precedent of removing features from the language because the pumpking thinks they are ugly, the next pumpking will have different tastes, and the one after that different tastes still; the result will be a disaster reminiscent of Fahrenheit 451. (In that story, banning all books grew out of lots of little bits of censorship.)

"There are some things you should learn to live without, even in Perl 5 land." is not an appropriate attitude to take, and that quote is from the Perl 7 announcement.

What is significant justification?

This is a good question. I believe that reasonable people can agree that style is not enough, particularly with a language that touts TIMTOWTDI as Perl does, but what is good enough?

I will argue that significant improvements to the interpreter can justify at least some changes. Significant improvements in compatibility can justify broader use of UTF-8 (as long as there is some pragma for treating strings as uninterpreted octet strings; handling binary data is one of Perl's strengths). Rolling pragmas like use strict; into defaults is reasonable, as long as no strict continues to exist. (At least some useful metaprogramming requires no strict 'refs'; to install subs from templates.)
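
As a minimal sketch of the kind of metaprogramming meant here, the following installs accessor subs from one template closure. The Temperature package and its field names are hypothetical, chosen only for illustration; the point is that the symbolic glob assignment is rejected under strict refs, so the pragma has to be switched off for that one loop body.

```perl
use strict;
use warnings;

package Temperature;

# Hypothetical field names, chosen only for this illustration.
my @fields = qw(celsius label);

for my $field (@fields) {
    # The symbolic glob assignment below is what strict refs forbids;
    # "no strict" is lexically scoped, so it ends with this loop body.
    no strict 'refs';
    *{"Temperature::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;
        return $self->{$field};
    };
}

sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

package main;

my $reading = Temperature->new(celsius => 21, label => 'office');
$reading->celsius(22);
printf "%s: %d\n", $reading->label, $reading->celsius;
```

Each generated sub closes over its own $field, so one template serves every accessor.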

Indirect Object syntax and Bareword Filehandles

The proposal to remove these has caused much rancor, with justifications that support removing only one of them offered as grounds for removing both, and with flames producing far more heat than light.

The indirect object (IO) syntax seems to be a generalization of an older Input/Output (I/O) syntax that allowed print FILEHANDLE EXPR instead of requiring the use of select to change the default output handle. This was generalized with the introduction of IO::Handle and can also be used to write code (particularly object constructors) that reads much like English: new Foo::Object (ARGS); kill $object with => 'fire'.
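
For readers who have not used both forms, here is a small sketch in current Perl 5 of the three ways to direct output mentioned above: select, print FILEHANDLE EXPR, and the IO::Handle method call. The in-memory file and the variable names are assumptions made to keep the example self-contained.

```perl
use strict;
use warnings;
use IO::Handle;

# An in-memory file keeps the sketch self-contained.
my $log = '';
open(my $out, '>', \$log) or die "cannot open in-memory file: $!";

# Older style: switch the default output handle, print, switch back.
my $previous = select($out);
print "written via select\n";
select($previous);

# Handle in the indirect-object slot: the default handle is untouched.
print $out "written via print FILEHANDLE\n";

# The IO::Handle method form of the same call.
$out->print("written via method call\n");

close($out) or die "cannot close: $!";
print $log;
```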

The IO syntax is not without its problems, however. The historical similarity to the I/O syntax creates some parse conflicts and it is not possible to call a constructor named open in IO syntax because of these parse conflicts, but I offer a solution in three parts:

Regularized I/O

The solution starts by regularizing all I/O handles into =IO objects and eliminating the *foo{IO} GV slot. This is entirely reasonable in a major release and can be done while breaking relatively little code. This allows resolving the IO|I/O parse conflict at last — it is always an IO method call, with a small amount of new magic for open:

Lexical Bareword Filehandles

Perl 5 allows open my $foo, ... (lexical) and open FOO, ... (traditional) for opening files. Notably, neither of these uses the IO syntax; they parse as open( my $foo, ...) and open( FOO, ...). I propose generalizing this to also allow open our $foo, ... (to explicitly open a global filehandle; remember TIMTOWTDI) and open state $foo, ... (a conditional open iff the lexical state variable $foo is currently undef or a closed handle). Update: As haukex points out, this generalization was so obvious that it has already been done.

The parser has enough information to take one step farther, as the title of this section suggests, and make bareword filehandles lexical variables. The open keyword (when parsed as the builtin; distinguishable by the absence of "::" in the bareword and the presence of a comma following the bareword) functions as a lexical filehandle declaration. The parser raises an error if the new I/O variable would shadow any other bareword known to the parser, so these filehandles cannot conflict with sub names or package names. The new I/O variable carries an invisible sigil, so:

    open FILE, '<', $file or die "...";
    while (<FILE>) {
        print if m/interesting/;
        handle(FILE, 'that') if m/that/;
    }
    close FILE;

parses as if: (with the invisible I/O sigil represented as {I/O})

    open my {I/O}FILE, '<', $file or die "...";
    while (<{I/O}FILE>) {
        print if m/interesting/;
        handle({I/O}FILE, 'that') if m/that/;
    }
    close {I/O}FILE;

which in turn is equivalent to:

    open my {I/O}FILE, '<', $file or die "...";
    while (defined($_ = {I/O}FILE->readline)) {
        print $_ if $_ =~ m/interesting/;
        handle({I/O}FILE, 'that') if $_ =~ m/that/;
    }
    {I/O}FILE->close;

with sub handle like:

    sub handle {
        my $fh = shift;
        my $what = shift;
        ...
    }

As you can see, this provides an elegant solution for passing filehandles to subroutines while also bringing the typo protection afforded by declaring variables to bareword filehandles and solving most of the other problems. The {I/O}FILE variable only exists within the block where it was declared, so there is no risk of action at a distance and programs that relied on that action will fail to compile. The {I/O}FILE variable simply contains an =IO object like any other, but the interpreter may be able to optimize with the knowledge that it will always contain an =IO object; there is no way to assign to an I/O variable other than open.
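
For comparison, the same loop can be written in current Perl 5 with an ordinary lexical scalar holding the handle, which is roughly the behavior the proposed {I/O} variable is meant to give barewords. This is only a sketch: the body of handle is a hypothetical stand-in, and the in-memory input replaces $file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the handle() sub above: it reads one
# further line from the handle it was passed.
sub handle {
    my ($fh, $what) = @_;
    my $next = readline($fh);
    return defined $next
        ? "$what followed by: $next"
        : "$what at end of input\n";
}

my $input = "dull\ninteresting\nthat\nlast\n";
open(my $file, '<', \$input) or die "cannot open: $!";

my @report;
while (defined(my $line = readline($file))) {
    push @report, $line if $line =~ m/interesting/;
    push @report, handle($file, 'that') if $line =~ m/that/;
}
close($file) or die "cannot close: $!";

print @report;
```

Because $file is lexical, it goes out of scope with its block and cannot be reached by name from other code, which is the property the proposal wants bareword handles to share.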

One more small cleanup

All this leaves an edge case that can finally be fixed: using open as a class method. The parser can recognize barewords containing "::" as PACKAGE tokens; a PACKAGE token in the slot for a named operator or sub name is interpreted as a fully qualified sub name, but BAREWORD PACKAGE EXPR is a class method call on PACKAGE. Always; even if BAREWORD is "open".
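
To illustrate the edge case in current Perl 5: a class method named open is already callable with arrow notation, and it is only the indirect object form that collides with the builtin. The sketch below uses a hypothetical My::Archive class and shows the arrow form and the builtin coexisting.

```perl
use strict;
use warnings;

# Hypothetical class whose constructor happens to be called "open".
package My::Archive;

sub open {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}

sub path { return $_[0]{path} }

package main;

# Arrow notation always dispatches to My::Archive::open.
my $archive = My::Archive->open(path => 'backup.tar');
print ref($archive), " ", $archive->path, "\n";

# The builtin is untouched and still opens files (here, an in-memory one).
my $text = "hello\n";
open(my $fh, '<', \$text) or die "cannot open: $!";
my $line = <$fh>;
close($fh);
print $line;
```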

Thanks and Discussion

I would like to thank my fellow monks with whom I have had much discussion on related topics, particularly haukex, LanX, chromatic, and WaywardCode, along with any others I have forgotten to mention here. Lastly, I would like to thank you for reading this and invite you to discuss below.

Replies are listed 'Best First'.
Re: On Backwards Compatibility and Bareword Filehandles
by chromatic (Archbishop) on Jul 17, 2020 at 19:34 UTC
    The parser raises an error if the new I/O variable would shadow any other bareword known to the parser, so these filehandles cannot conflict with sub names or package names.

    There are plenty of ways to get sub names or package names not known to the parser at the point where the parser encounters one of these barewords.

    BAREWORD PACKAGE EXPR is a class method call on PACKAGE. Always; even if BAREWORD is "open".

    I think the same problem applies here. How is the parser to know that BAREWORD will eventually become PACKAGE, when the package is defined later?

    If we allow the precedent of removing features from the language because the pumpking thinks they are ugly...

    I think you're being unfair with this characterization. The desire to reduce indirect object notation is not primarily aesthetic; it's primarily because it's too often statically undecidable and it's fragile with regard to the action at a distance of what other code has been parsed and when.

    Put another way, if I name a bareword filehandle JSON or YAML because I'm going to read or write some structured data, I don't want my program to behave unpredictably depending on when the modules of the same name get loaded.

      There are plenty of ways to get sub names or package names not known to the parser at the point where the parser encounters one of these barewords.

      If it is not known to the parser, it cannot cause a parse conflict, and the lexical name prevails for the scope of the block in which it is defined. Existing convention discourages naming subroutines or packages in ALL UPPERCASE anyway, and I do not suggest changing that.

      How is the parser to know that BAREWORD will eventually become PACKAGE, when the package is defined later?

      I had just explained that the parser could recognize PACKAGE tokens because they contain at least one "::". An explicit "::" can be prefixed to top-level packages when needed or assumed when no parse conflict exists.

      The desire to reduce indirect object notation is not primarily aesthetic; it's primarily because it's too often statically undecidable and it's fragile with regard to the action at a distance of what other code has been parsed and when.

      That is a reason to adjust it to make it decidable, not to remove it entirely. Lexical bareword filehandles preserve (most) compatibility while removing (at least some) parse ambiguity: a lexical bareword filehandle parses as VARIABLE rather than BAREWORD. Similarly, recognizing barewords containing "::" as PACKAGE tokens disambiguates class method calls.

      if I name a bareword filehandle JSON or YAML because I'm going to read or write some structured data, I don't want my program to behave unpredictably depending on when the modules of the same name get loaded.

      This proposal does not have that problem — within the scope of a JSON or YAML filehandle, those tokens are variable references rather than package names, but could still be disambiguated as ::JSON or ::YAML to get a PACKAGE token if you wanted to call a class method while inside such a scope. (Note that in the scope of open JSON,..., JSON->new would parse as a method call on the {I/O}JSON variable because part of the proposal is for indirect object and arrow notations to be equivalent.)

        Existing convention discourages naming subroutines or packages in ALL UPPERCASE anyway, and I do not suggest changing that.

        Sure, but I provided two widely-used examples that violate this convention. I could also mention LWP, DBI, CGI....

        An explicit "::" can be prefixed to top-level packages when needed or assumed when no parse conflict exists.

        The problem is I don't know when a parse conflict will exist.

        If I write code with a bareword filehandle in a module, I don't know how that module will be used. I don't know when it will be loaded. I don't know what will be loaded before it and I don't know what will be loaded after it.

        If the parser parses this construct in different ways depending on what's loaded, it's fragile and undecidable.

        An explicit "::" can be prefixed to top-level packages when needed or assumed when no parse conflict exists.

        I don't see how this solves my problem here. I don't want to refer to a top-level package. I want a bareword filehandle (for the sake of argument, at least—I don't want bareword filehandles at all, because of this problem).

        Even if this did solve the problem, I don't think it's worth the tradeoff. Now you've introduced another implicit rule, which is that we must prefix all single-word package names to avoid any potential conflict with a bareword filehandle somewhere.

        I use a lot more package names than I do filehandles.

Re: On Backwards Compatibility and Bareword Filehandles
by haukex (Bishop) on Jul 17, 2020 at 07:24 UTC
    I propose generalizing this to also allow open our $foo, ... (to explicitly open a global filehandle; remember TIMTOWTDI) and open state $foo, ...

        use warnings;
        use strict;
        use feature 'state';

        sub xyz {
            open state $foo, '<', '/tmp/foo' or die $!;
            chomp( my $bar = <$foo> );
            close $foo;
            return $bar;
        }

        open our $foo, '>', '/tmp/foo' or die $!;
        print $foo "Hello\n";
        close $foo;

        use Test::More tests=>1;
        is xyz(), "Hello"; # => PASS

    the result will be a disaster reminiscent of Fahrenheit 451.

    A variation on Godwin's law?

      Thanks for pointing that out; I have updated the text. Is open state also conditional currently or will it replace an already-open handle?

      Godwin's law applies to hyperbolic comparisons; the reference to Fahrenheit 451 was to show that we are considering the first steps down a path that can lead to disaster. Removing code from programs is generally beneficial, but removing features from languages is almost always impoverishing. I could have cited Newspeak in 1984 as another example, but 1984 featured censorship imposed from a dictatorial top, while the censorship in Fahrenheit 451 was explained to have been more of a "crowd-sourced" phenomenon.

        Is open state also conditional currently or will it replace an already-open handle?

        Sorry, but I have to say: You already didn't do your due diligence in the root node, and now you seriously expect me to do more of your work for you to support an argument I don't necessarily agree with? The internet being what it is, my troll-o-meter is going off.

        Godwin's law applies to hyperbolic comparisons

        It absolutely does not; and where it applies is extremely clear.

Re: On Backwards Compatibility and Bareword Filehandles
by syphilis (Bishop) on Jul 17, 2020 at 07:49 UTC
    Interestingly, the latest Perl 7 proposals make no mention of bareword filehandles.
    However, I think those proposals relate to 7.0 - and they don't discuss the additional changes that might come with 7.1.

    Cheers,
    Rob

      This proposal is an effort to find some way forward without needlessly breaking backwards compatibility.

Re: On Backwards Compatibility and Bareword Filehandles
by ikegami (Pope) on Jul 17, 2020 at 18:05 UTC

    I don't see the basis for these changes. What's the problem with using a scalar? Maybe you could specify what problem you are trying to solve? That would be the first step in providing the needed justification that appears completely missing.

    The historical similarity to the I/O syntax creates some parse conflicts and it is not possible to call a constructor named open in IO syntax because of these parse conflicts, but I offer a solution in three parts:

    But that problem is already being solved. That's only a problem when using indirect method notation, which has long been discouraged by some (incl myself), and which is being "removed".

      The problem with using scalars for all I/O handles is all of the existing code that uses barewords; this solution maintains backwards compatibility at least for the simple cases that I expect are the most common.

      Another problem is philosophical: removing features from the language because you do not like that style is not the Perl way. Perl advocates TIMTOWTDI; One True Right And Only Way is Python's niche and Python fills that niche very well.

        Another problem is philosophical: removing features from the language because you do not like that style is not the Perl way.

        I continue to fail to understand this argument. Who is saying "bareword filehandles should go away solely because I don't like the style"?

        Can you do Chesterton's Fence here? What technical arguments are there for discouraging bareword filehandles? Or, to make this easier, undeclared barewords in general?

        It would be far simpler to automatically convert

        open(FOO, ...)

        to

        open(local *FOO, ...)

        instead. While FOO is still technically global, the caller is protected.

        Note that automatically limiting the scope of variables used as file handles will break code.
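
A minimal sketch of what the open(local *FOO, ...) conversion suggested in the last reply buys the caller, in current Perl 5. The handle name FH, the helper read_inner, and the in-memory inputs are all assumptions for illustration. Both the caller and the helper use the same bareword, yet the caller's handle survives the call.

```perl
use strict;
use warnings;

my $outer_text = "outer line 1\nouter line 2\n";
my $inner_text = "inner line\n";

# Hypothetical helper that reuses the caller's handle name.
sub read_inner {
    # local gives this call its own *FH; the caller's glob is
    # restored when the sub returns.
    open(local *FH, '<', \$inner_text) or die "cannot open: $!";
    my $line = <FH>;
    close(FH);
    return $line;
}

open(FH, '<', \$outer_text) or die "cannot open: $!";
my $first  = <FH>;
my $inner  = read_inner();
my $second = <FH>;
close(FH);

print $first, $inner, $second;
```

Without the local, the helper's open and close would act on the caller's FH, and the second read in the caller would find a closed handle.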

Node Type: perlmeditation [id://11119439]
Approved by marto