perlmeditation
jcb
<p>While participating in some of the recent discussions about plans for Perl 7 and the longer-term future of Perl, I found a solution that I wish to offer here.</p>
<h4>A Ground Rule</h4>
<p>First, I want to establish what should be an obvious ground rule: <i>removing features from Perl requires significant justification, and style is <b>never</b> enough to remove a feature</i>. The rationale for this rule is simple: Perl has long held to TIMTOWTDI, and style varies. If we allow the precedent of removing features from the language because the pumpking thinks they are ugly, the next pumpking will have different tastes, and the one after that different tastes still; the result will be a disaster reminiscent of <i>Fahrenheit 451</i>. <small>(In that story, banning all books grew out of lots of little bits of censorship.)</small></p>
<p>"There are some things you should learn to live without, even in Perl 5 land." is <b>not</b> an appropriate attitude to take, and that quote is from the [https://www.perl.com/article/announcing-perl-7/|Perl 7 announcement].</p>
<h5>What is significant justification?</h5>
<p>This is a good question. I believe that reasonable people can agree that style is not enough, particularly with a language that touts TIMTOWTDI as Perl does, but what <b>is</b> good enough?</p>
<p>I will argue that significant improvements to the interpreter can justify at least some changes. Significant improvements in compatibility can justify broader use of UTF-8 (as long as there is some pragma for treating strings as uninterpreted octet strings; handling binary data is one of Perl's strengths). Rolling pragmas like <c>use strict;</c> into defaults is reasonable, as long as <c>no strict</c> continues to exist. (At least some useful metaprogramming requires <c>no strict 'refs';</c> to install <c>sub</c>s from templates.)</p>
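<p>As a minimal sketch of the kind of metaprogramming meant here (the <c>Point</c> class and its field names are made up for illustration), accessors can be installed from one closure template, with <c>no strict 'refs';</c> confined to the block that does the symbolic glob assignment:</p>
<code>
use strict;
use warnings;

package Point;

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Install one accessor per field name from a single closure template.
for my $field (qw(x y)) {
    no strict 'refs';    # needed only for the symbolic glob assignment below
    *{"Point::$field"} = sub { $_[0]{$field} };
}

package main;

my $p   = Point->new(x => 3, y => 4);
my $sum = $p->x() + $p->y();
print "$sum\n";
</code>
<p>Strictures stay on everywhere else, which is why a default <c>use strict</c> costs nothing as long as the opt-out remains available.</p>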
<h4>Indirect Object syntax and Bareword Filehandles</h4>
<p>The proposal to remove these has caused much rancor, with justifications that support removing only one of the two presented as grounds for removing both, and with flames producing far more heat than light.</p>
<p>The indirect object (IO) syntax seems to be a generalization of an older Input/Output (I/O) syntax that allowed <c>print FILEHANDLE EXPR</c> instead of requiring the use of [doc://select] to change the default output handle. This was generalized with the introduction of [module://IO::Handle] and can also be used to write code (particularly object constructors) that reads much like English: <c>new Foo::Object (ARGS)</c>; <c>kill $object with => 'fire'</c>.</p>
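<p>For readers who have not met both forms side by side, here is a small sketch (the body of <c>Foo::Object::new</c> is invented for illustration). It assumes a perl where the <c>indirect</c> feature is still enabled, which is the default unless the code says <c>no feature 'indirect';</c> or requests a recent feature bundle such as <c>use v5.36;</c>.</p>
<code>
use strict;
use warnings;

package Foo::Object;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

# Indirect object form and arrow form; both call Foo::Object::new.
my $indirect = new Foo::Object (colour => 'red');
my $arrow    = Foo::Object->new(colour => 'red');

# The older I/O form that the indirect object syntax resembles:
# the handle sits in the slot between the verb and the argument list.
print STDOUT ref($indirect), "\n";
print {*STDOUT} ref($arrow), "\n";
</code>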
<p>The IO syntax is not without its problems, however. The historical similarity to the I/O syntax creates some parse conflicts, and because of them it is not possible to call a constructor named <c>open</c> in IO syntax. I offer a solution in three parts:</p>
<h5>Regularized I/O</h5>
<p>The solution starts by regularizing all I/O handles into <c>=IO</c> objects and eliminating the <c>*foo{IO}</c> <c>GV</c> slot. This is entirely reasonable in a major release and can be done while breaking relatively little code. This allows resolving the <tt>IO|I/O</tt> parse conflict at last — it is always an IO method call, with a small amount of new magic for <c>open</c>:</p>
<h5>Lexical Bareword Filehandles</h5>
<p>Perl 5 allows <c>open my $foo, ...</c> (lexical) and <c>open FOO, ...</c> (traditional) for opening files. Notably, <i>neither</i> of these uses the IO syntax; they parse as <c>open( my $foo, ...)</c> and <c>open( FOO, ...)</c>. I propose generalizing this to also allow <c>open our $foo, ...</c> (to explicitly open a global filehandle; remember TIMTOWTDI) and <c>open state $foo, ...</c> (a conditional open iff the lexical state variable $foo is currently <c>undef</c> or a closed handle). <ins><small><b>Update:</b> As [haukex] [11119441|points out], this generalization was so obvious that it has already been done.</small></ins></p>
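<p>A short sketch of the two declaring forms that can be tried on a current perl, using in-memory files so that nothing on disk is needed (the variable names are arbitrary). The <c>state</c> form is left out because the conditional-open behavior described above is part of the proposal, not something this sketch can demonstrate.</p>
<code>
use strict;
use warnings;

my $text = "alpha\nbeta\n";

# Lexical handle: visible only inside the enclosing scope.
open my $lex, '<', \$text or die "cannot open in-memory file: $!";
my $first = <$lex>;
close $lex;

# Package (global) handle held in a scalar declared with our.
open our $GLOBAL, '<', \$text or die "cannot open in-memory file: $!";
my @all = <$GLOBAL>;
close $GLOBAL;

print "first line: $first";
print "line count: ", scalar(@all), "\n";
</code>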
<p>The parser has enough information to take one step farther, as the title of this section suggests, and <i>make bareword filehandles lexical variables</i>. The <c>open</c> keyword (when parsed as the builtin; distinguishable by the absence of "<tt>::</tt>" in the bareword and the presence of a comma following the bareword) functions as a lexical filehandle declaration. The parser raises an error if the new I/O variable would shadow any other bareword known to the parser, so these filehandles cannot conflict with <c>sub</c> names or package names. The new I/O variable carries an invisible sigil, so:</p>
<code>
open FILE, '<', $file or die "...";
while (<FILE>) {
    print if m/interesting/;
    handle(FILE, 'that') if m/that/;
}
close FILE;
</code>
<p>parses as if: (with the invisible I/O sigil represented as <c>{I/O}</c>)</p>
<code>
open my {I/O}FILE, '<', $file or die "...";
while (<{I/O}FILE>) {
    print if m/interesting/;
    handle({I/O}FILE, 'that') if m/that/;
}
close {I/O}FILE;
</code>
<p>which in turn is equivalent to:</p>
<code>
open my {I/O}FILE, '<', $file or die "...";
while (defined($_ = {I/O}FILE->readline)) {
    print $_ if $_ =~ m/interesting/;
    handle({I/O}FILE, 'that') if $_ =~ m/that/;
}
{I/O}FILE->close;
</code>
<p>with <c>sub handle</c> like:</p>
<code>
sub handle {
    my $fh = shift;
    my $what = shift;
    ...
}
</code>
<p>As you can see, this provides an elegant solution for passing filehandles to subroutines, while also giving bareword filehandles the typo protection that declared variables already enjoy and solving most of the other problems. The <c>{I/O}FILE</c> variable exists only within the block where it was declared, so there is no risk of action at a distance, and programs that relied on that action will fail to compile. The <c>{I/O}FILE</c> variable simply contains an <c>=IO</c> object like any other, but the interpreter may be able to optimize with the knowledge that it will <i>always</i> contain an <c>=IO</c> object; there is no way to assign to an I/O variable other than [doc://open].</p>
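<p>For comparison, the closest thing that runs on a current perl is the same loop written with an ordinary lexical handle and explicit method calls. This is only an approximation of the proposal: the body of <c>handle</c> is a made-up stand-in, the data is an in-memory file, and [module://IO::Handle]'s <c>getline</c> takes the place of the <c>readline</c> method shown above.</p>
<code>
use strict;
use warnings;
use IO::Handle;    # newer perls load this on demand; explicit here for clarity

my $data = "nothing here\nsomething interesting\nthat one\nfollow-up\n";
my @seen;

# Hypothetical stand-in for the handle() sub in the example above:
# it reads the next line from the handle it was passed.
sub handle {
    my ($fh, $what) = @_;
    my $next = $fh->getline;
    chomp $next if defined $next;
    push @seen, "$what: " . (defined $next ? $next : '(end of file)');
    return;
}

open my $file, '<', \$data or die "cannot open in-memory file: $!";
while (defined(my $line = $file->getline)) {
    print $line if $line =~ m/interesting/;
    handle($file, 'that') if $line =~ m/that/;
}
$file->close;
print "$_\n" for @seen;
</code>
<p>The difference under the proposal is that <c>FILE</c> would get this scoping and pass-by-value behavior without the <c>$</c> sigil or an explicit <c>my</c>.</p>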
<h5>One more small cleanup</h5>
<p>All this leaves an edge case that can finally be fixed: using <c>open</c> as a class method. The parser can recognize barewords containing "<c>::</c>" as <tt>PACKAGE</tt> tokens; a <tt>PACKAGE</tt> token in the slot for a named operator or <c>sub</c> name is interpreted as a fully qualified <c>sub</c> name, but <tt>BAREWORD PACKAGE EXPR</tt> is a class method call on <tt>PACKAGE</tt>. Always; even if <tt>BAREWORD</tt> is "<c>open</c>".</p>
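<p>To make the edge case concrete, here is a sketch with an invented <c>My::Log</c> class whose constructor is named <c>open</c>. On a current perl the arrow form is the dependable way to reach such a method; the rule above would make the <tt>BAREWORD PACKAGE EXPR</tt> form an equivalent spelling.</p>
<code>
use strict;
use warnings;

package My::Log;

# A constructor that happens to share its name with the open builtin.
sub open {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

package main;

# Arrow syntax is unambiguous: this is a class method call, not the builtin.
my $log = My::Log->open('app.log');
print ref($log), " for $log->{name}\n";
</code>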
<h4>Thanks and Discussion</h4>
<p>I would like to thank my fellow monks with whom I have had much discussion on related topics, particularly [haukex], [LanX], [chromatic], and [WaywardCode], along with any others I have forgotten to mention here. Lastly, I would like to thank <i>you</i> for reading this and invite you to discuss below.</p>