Hello Monks!
I've been learning Perl for some years now. Moving
from writing awk scripts to writing Perl scripts, I have found Perl to be an amazing resource for getting
things done.
Still, I have some minor issues with the language design that I have
not yet been able to understand/resolve. This is what I want to
discuss here.
Background
It sometimes bugs me that it is so difficult to write Perl code that is
readable (easy to follow) when working with references.
For example, if I see a variable $var in the middle of some
code, it can be a scalar variable, a scalar reference, an array reference, a
hash reference, and so on. Hence, I often end up guessing or having to
scan source code nearby
in order to determine the type of the variable. I find this workflow less than optimal.
Would it not be better if the variable could (optionally) be made
self-documenting with respect to reference type?
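To make the ambiguity concrete, here is a minimal sketch of how one can ask Perl at runtime what a `$` variable actually holds, using the built-in `ref`. The helper name `describe` is made up for illustration and is not part of any proposal in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: report what kind of value a scalar variable holds.
sub describe {
    my ($value) = @_;
    my $type = ref $value;    # empty string for a non-reference
    return $type eq '' ? 'plain scalar' : "$type reference";
}

print describe(42),          "\n";    # plain scalar
print describe(\'text'),     "\n";    # SCALAR reference
print describe([ 1, 2, 3 ]), "\n";    # ARRAY reference
print describe({ a => 1 }),  "\n";    # HASH reference
print describe(sub { 1 }),   "\n";    # CODE reference
```

All five values would be stored in a variable with the same `$` sigil, which is exactly why the name alone says nothing about the type.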
In the book Perl Best Practices, the problem is mentioned in another
setting, and the solution suggested is to add the suffix _ref to the
variable name. So one could write,
$var_href = { a => 1 };
to create a hash ref, and
$var_aref = [ 1, 2, 3];
to create an array reference.
However, a problem with this convention could be that the suffix is not
optional. You should not be forced to use the more verbose form of
the variable name. I think the programmer should be
free to decide whether it is advantageous to include the suffix at
a given place or not. For example, when declaring the variable as
$var = [ 1, 2, 3 ];
it is rather obvious that it is an array reference, and there is no
need to write:
$var_aref = [ 1, 2, 3 ];
The latter is in my opinion too verbose. However, if the reference is
just defined as
my $var;
it would often be better to include the suffix. If there is no
indication on the next lines or so whether $var will be used as an
array reference or not, it would be more readable to define it as
my $var_aref;
A new idea for reference variable naming syntax
So this led me to an idea: Could the postfix dereferencing syntax
be extended for this use case?
The Postfix Dereferencing Syntax (PDS)
was introduced as an experimental feature in 5.20, and starting from 5.24 it is
included in the Perl language by default.
Currently PDS is used for dereferencing:
my @array = $var->@*;
Notice that the PDS includes a star after the sigil. It is a syntax
error not to include the star. But let's say for the moment that if the star was
omitted, the dereferencing was to be simply ignored instead. So
my $var->@;
would mean the same as
my $var;
and produce no syntax error.
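For comparison, this is a small sketch of the postfix dereference forms that exist today (Perl 5.24 or later), next to their traditional equivalents. The variable names are only illustrative; nothing here implements the star-less form proposed above.

```perl
use v5.24;
use warnings;

my $aref = [ 1, 2, 3 ];
my $href = { a => 1, b => 2 };
my $sref = \'text';

my @items  = $aref->@*;           # same as @$aref
my %map    = $href->%*;           # same as %$href
my $string = $sref->$*;           # same as $$sref
my $last   = $aref->$#*;          # last index, same as $#$aref
my @pair   = $aref->@[ 0, 1 ];    # array slice, same as @$aref[0, 1]

say "items: @items";
say "keys: ", join ' ', sort keys %map;
say "string: $string";
say "last index: $last";
say "pair: @pair";
```

In every case the trailing part is an operation applied to `$aref`, `$href` or `$sref`; the variable name itself never changes.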
Let's denote this new syntax by Optional Postfix
Reference Declaration Syntax (OPRDS). When using OPRDS, should it be entirely up to
the user to ensure that he used the correct sigil? For example, if I
write
$var->@ = 12;
when I really meant
$var->@ = [ 12 ];
should it produce a compile time error? I think it would be very
helpful if the compiler could use OPRDS to check for consistency.
But it might be difficult to implement? I do not know. If it is
difficult to implement, some alternatives might be used instead?
I don't know much of Perl internals, so this is a point where I need help.
When I started out with this idea, compile time type-checking was
not on my mind at all. But I see now that OPRDS would offer the
opportunity for stricter type checking.
But type checking was not the main issue I wanted to discuss.
What I would like to discuss is how to deal with reference variable names.
Reading and understanding written Perl code can be difficult
since the $ sigil can be used for many data types. How could this situation
be improved?
Re: Improve readability of Perl code. Naming reference variables.
by stevieb (Canon) on Jan 19, 2017 at 20:24 UTC
|
This is one of the many benefits of keeping your code blocks (scope) as small as possible (one screen if possible), as it allows you to visually see what you're dealing with in regards to a variable.
Furthermore, variable naming is important, as is attempting to write your subroutines so they only do one thing, as opposed to a whole bunch of things. This:
my $var->@;
... is far too much typing just for visual purposes (imho), and it adds unnecessary noise. It's just as easy to scope properly and name things appropriately:
my $cat;
That's singular, so I'd put money on the fact it's a scalar (unless it's an object, but if it's an object, it'll be being used much differently than a simple scalar so I digress).
my $cats;
# or
my $cat_list;
...or:
my $cat_map;
Both of those are extremely easy to identify to even a low-intermediate Perl hacker as an array reference in the former case, and a hash ref in the latter (perhaps the author has a hash of cat names to cat colours ;).
What's more, if there is confusion, in decently laid out code, one may have to scroll up only a tiny bit (if at all) to see where the variable is being declared/defined. If you have to search all over the code for where variables are defined, the scope for that variable is not small enough.
Even if you get the type wrong, use strict; and use warnings; will always let you know the what/where of the problem.
So, in essence, I understand what you're desiring here, but good coding practices relieve us (for the most part) of needing such visual cues.
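As a rough illustration of the point about strict, here is a sketch of what happens today when the type is wrong. It reuses the `12` versus `[ 12 ]` mix-up from the original post; the variable names are illustrative. Note that both errors are raised when the statement runs, not when the code is compiled.

```perl
use strict;
use warnings;

# Case 1: a plain scalar used where an array reference was intended.
my $var = 12;    # meant to be [ 12 ]
my $ok  = eval { my @items = @$var; 1 };
my $strict_error = $ok ? '' : $@;
print "strict refs says: $strict_error";

# Case 2: a hash reference dereferenced as an array.
my $map = { a => 1 };
$ok = eval { my @items = @$map; 1 };
my $type_error = $ok ? '' : $@;
print "wrong reference type: $type_error";
```

The first message comes from `strict refs` ("Can't use string ... as an ARRAY ref"), the second ("Not an ARRAY reference") is reported by Perl regardless of strict. Both include the file and line number of the failing dereference.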
|
Yeah. I completely agree with you with the idea of keeping subroutines or code blocks
small. It is also my experience that this coding style (that you propose) has potential for
eliminating most readability issues. So the naming issue can usually
be circumvented using proper naming of variables and keeping the scope small.
But it's also my experience that in some cases it would still be beneficial to
have the option to further document the type of a reference variable.
|
What you're referring to in your last sentence is what some call "edge cases". These edge cases, where there may be ambiguity for the reader of the code, are where your extremely brief comments should go. Code should document itself, but if you feel the reader may scratch their head:
...
my $x = thing_list(); # href
...
Of course, that's a pretty trivial example, but you get the point.
Re: Improve readability of Perl code. Naming reference variables.
by ww (Archbishop) on Jan 19, 2017 at 20:32 UTC
|
You wrote: "You should not be forced to use the more verbose form of the variable name."
Just in case there's a misunderstanding here, NOTHING in PBP is mandatory. Some of the recommendations do, in fact, reflect a consensus among some Perl programmers. Others are held up to criticism as 'one author's preferences.'
And here's another 'one person's opinion' about your observation that it "should it be entirely up to the user to ensure that he used the correct sigil."
Frankly, that idea is anathema to me; IMO, it's just another way to write code that you will have trouble deciphering sometime down the road, and that some future maintainer will almost certainly find problematic.
|
Yeah, I agree that the suggested reference syntax also could introduce
new issues. For example consider function calls:
func( $var->@ )
Here it is of course possible that the programmer introduces a
typo. First, assume he wrote @ when $var is a scalar (i.e. $var is not a
reference). This typo will of
course confuse a human reader. But the compiler would probably be
quite happy. It would just ignore the optional postfix syntax
(OPRDS). Hence, there will be no runtime issues with this typo either.
Then consider a different typo. The user types @* when he rather meant
to type @:
func( $var->@* )
Now, this is a more serious mistake. The compiler will assume that
the array reference should be dereferenced. Hence, the function will
receive $var->[0] instead of the reference $var, likely to cause some
sort of runtime malfunction that may be difficult to debug.
Re: Improve readability of Perl code. Naming reference variables.
by hippo (Bishop) on Jan 19, 2017 at 22:30 UTC
|
#!/usr/bin/env perl
use strict;
use warnings;
my $var = [1, 3, 5];
my $var_aref = $var;
print "var has values: @$var\n";
print "var_aref has values: @$var_aref\n";
So you, the programmer, can pick and choose which name to use at any point (if you so desire).
|
So you would prefer to introduce two reference variables? Sorry, I do not
like this idea. I would prefer to keep the number of variables to a
minimum. Introducing two variables just for the sake of solving a
readability issue seems like a bad idea to me. What if one reference
was changed later in the code? Then you must remember to update the
other reference at the same place also. This clearly becomes a
maintenance problem. And if you forget to update the other variable, there is potential
for more confusion.
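The concern can be sketched as follows, reusing the names from hippo's example. As long as both names hold the same reference, changes to the array are visible through either name; once one of them is assigned a new reference, the two silently diverge.

```perl
use strict;
use warnings;

my $var      = [ 1, 3, 5 ];
my $var_aref = $var;          # second name for the same array

push @$var, 7;                # modifies the shared array
print "var_aref sees: @$var_aref\n";    # 1 3 5 7

$var = [ 2, 4, 6 ];           # $var now points at a different array
print "var is now: @$var\n";            # 2 4 6
print "var_aref still: @$var_aref\n";   # 1 3 5 7
```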
Re: Improve readability of Perl code. Naming reference variables.
by kcott (Archbishop) on Jan 20, 2017 at 11:36 UTC
|
G'day hakonhagland,
"It sometimes bugs me that it is so difficult to write Perl code that is readable (easy to follow) when working with references."
About a dozen or so years ago, I supervised a number of junior programmers who also seemed to have this problem.
I won't go into details beyond saying this caused no end of problems and time spent on debugging exceeded
time spent coding.
I introduced a local coding standard, that required a prefix on all variable names whose values were references. There may have been some special cases, but these were the main ones:
- $rs_name : scalarref
- $ra_name : arrayref
- $rh_name : hashref
- $rc_name : coderef
- $rg_name : globref
- $ro_name : object reference
The concept was simple and mostly fixed the problem.
Using the wrong operation on a variable was usually easy to identify
(e.g. $ra_name->{...}, $rs_name->method(...), $rh_name->(...), and so on).
Subsequent reading of the code, for maintenance or debugging, was made easier.
I should also point out another policy that the 'name' part had to be meaningful and,
as far as possible, self-documenting.
This typically meant that, if a wrong letter (identifying the reference type) was used,
it would be picked up by strict (whose use was also mandatory).
While this was fine for that situation and environment, it doesn't really suit my personal style
and I no longer use it:
I much prefer to use the smallest possible, lexical scopes
where these sorts of problems generally don't occur.
However, if you think this, or something like it, will help to improve your coding,
perhaps give it a try and see if it works for you.
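A minimal sketch of the "picked up by strict" point, assuming a variable declared as `$ra_cats` and then mistakenly referred to as `$rh_cats` (both names are invented for this illustration). The snippet is compiled with a string `eval` only so that the compile-time error can be captured and printed.

```perl
use strict;
use warnings;

# Code under test: declared with the "ra" prefix, used with the "rh" prefix.
my $code = <<'END';
use strict;
my $ra_cats = [ qw(Tiddles Desmo Felix) ];
my $count   = keys %$rh_cats;    # wrong prefix letter: no such variable
END

my $ok    = eval "$code; 1";
my $error = $ok ? '' : $@;
print "compile error: $error";
```

Because the prefix is part of the name, a wrong letter names an undeclared variable, and `strict vars` rejects it before any of the code runs.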
"Let's denote this new syntax by Optional Postfix Reference Declaration Syntax (OPRDS)."
I didn't like this idea at all.
With postfix dereferencing, $var remains the variable and ->@* is an operation on that variable.
With your OPRDS, $var->@ seems to be a separate variable and operation (in your OP);
subsequently, in one of your responses, you use func( $var->@ ),
where $var->@ now apparently represents the entire variable.
You also seemed to get confused with
"func( $var->@* ) ... the function will receive $var->[0] ...":
in fact, the function will receive @$var.
You may have had typo(s) in that response, but I found myself scrolling back and forth to understand what was going on: the very problem you're attempting to avoid: "... scan source code nearby in order to determine ...".
|
> $rs_name : scalarref
$ra_name : arrayref
$rh_name : hashref
$rc_name : coderef
$rg_name : globref
$ro_name : object reference
I'm using something very similar but without the redundant r in front, e.g. $c_block.
Now I'm wondering why you use them... :)
|
Hello kcott!
Interesting to hear about your experience teaching students. I am sure
the style
you introduced might indeed help improve the situation you described.
But once you have declared a variable with a prefix, it is no longer
optional to remove the prefix. This is why I don't like the idea of
a prefix that is part of the variable name. A prefix as a part of the
sigil would seem like a better idea. Then it could be made optional.
For example, consider a function called with three references: a
scalar (string) reference, a hash reference, and an array reference:
sub func {
my ( $rs_str, $hr_desktop_info, $ha_files ) = @_;
$$rs_str = update_string_ref();
for ( keys %$hr_desktop_info ) {
...
push @$ha_files, $file;
}
....
}
It seems to me like the prefixes will introduce too much noise in the
source code. In this case, it might be better if only the first line in the function
documented the type of the reference, and then subsequent lines could
omit the variable name prefix:
sub func {
my ( $rs_str, $hr_desktop_info, $ha_files ) = @_;
$$str = update_string_ref();
for ( keys %$desktop_info ) {
...
push @$files, $file;
}
....
}
Of course, the above code is not yet possible. And further it could
not easily be made part
of Perl in the future. But maybe a new type of prefix could be used, for example
$>$, $>%, and $>@ ?
sub func {
my ( $>$str, $>%desktop_info, $>@files ) = @_;
....
}
On the other hand, I can see the clash here with the Perl special variable $> (The
effective uid of this process). So this syntax might be difficult to
implement.
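To show the clash concretely: `$>` already parses as a complete variable today. The sketch below reads it directly and through its long name from the core `English` module.

```perl
use strict;
use warnings;
use English qw(-no_match_vars);

my $euid = $>;    # effective user id of the current process
print "effective uid via \$>: $euid\n";
print "effective uid via English: $EFFECTIVE_USER_ID\n";
```

A declaration such as `$>$str` would therefore begin with an existing variable, so the parser would need some way to tell the two meanings apart.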
Regarding the last point of your reply. Yes, I agree that if I call func( $var->@* ), the
function will indeed receive @$var. But I assumed a function definition
of the form
sub func {
my ( $var ) = @_;
...
}
Now, the function would "receive" $var->[0] (in the sense that $var in the function will be equal to $var->[0] of the caller). But I think this
(minor) issue of whether the function receives the whole array or only
its first item is just a distraction from the main topic of the
discussion. So I will not go further into the issue.
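For readers following the exchange, both statements can be checked with a short sketch (Perl 5.24 or later). The subroutine names `arg_count` and `first_arg` are invented for the illustration.

```perl
use v5.24;
use warnings;

sub arg_count { return scalar @_ }                  # how many arguments arrived
sub first_arg { my ($var) = @_; return $var }       # the definition assumed above

my $var = [ 10, 20, 30 ];

say arg_count($var);           # 1  (the reference itself)
say arg_count($var->@*);       # 3  (the flattened list, i.e. @$var)
say first_arg($var->@*);       # 10 (only the first element lands in $var)
say ref first_arg($var);       # ARRAY
```

So the call passes the whole list, and it is the `my ( $var ) = @_;` line inside the function that keeps only the first element.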
|
sub func {
my ( $rs_str, $hr_desktop_info, $ha_files ) = @_;
$$str = update_string_ref();
for ( keys %$desktop_info ) {
...
push @$files, $file;
}
....
}
I rather feel that $$str, %$desktop_info and @$files
make it pretty clear, not only that you're dealing with references,
but also what type of references they are.
If you're having problems reading that, I suggest you do what ++stevieb
has already alluded to and put the prefixes in a comment.
Something like:
my ($str, $desktop_info, $files) = @_; # rs, rh, ra
Update (minor typo fix): s{deck}{desk} in ..., %$decktop_info and ....
Re: Improve readability of Perl code. Naming reference variables. [New Perl Feature]
by kcott (Archbishop) on Jan 22, 2017 at 22:33 UTC
|
#!/usr/bin/env perl
use 5.025003;
use strict;
use warnings;
no warnings 'experimental::refaliasing';
use experimental qw{refaliasing declared_refs};
use feature 'declared_refs';
{
my $str = \'string';
my $list = [qw{a b c}];
my $map = {x => 24, y => 25, z => 26};
func($str, $list, $map);
}
sub func {
my (\$string, \@array, \%hash) = @_;
say $string;
say "@array";
say "$_ => $hash{$_}" for sort keys %hash;
return;
}
Sample run:
$ perl -v | head -2 | tail -1
This is perl 5, version 25, subversion 9 (v5.25.9) built for darwin-thread-multi-2level
$ pm_1179933_test_exp_declared_refs.pl
string
a b c
x => 24
y => 25
z => 26
Important:
Do note that this feature is experimental;
subject to change;
and, as such, not suitable for production code.
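One consequence worth spelling out: the declared names are aliases of the caller's data, not copies. The following is a small sketch under the same experimental features, assuming Perl 5.26 or later; the subroutine `add_cat` and its data are made up for the illustration.

```perl
use v5.26;
use warnings;
use experimental qw(refaliasing declared_refs);

# Hypothetical example sub: @cats and %colour_of alias the caller's data.
sub add_cat {
    my (\@cats, \%colour_of) = @_;
    push @cats, 'Felix';
    $colour_of{Felix} = 'black';
    return;
}

my $cats    = [ qw(Tiddles Desmo) ];
my $colours = { Tiddles => 'grey' };

add_cat($cats, $colours);

say "@$cats";    # Tiddles Desmo Felix
say join ', ', map { "$_ => $colours->{$_}" } sort keys %$colours;
```

Inside the sub the variables read like ordinary arrays and hashes, yet every change is visible to the caller, just as it would be through `@$cats` or `%$colours`.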
Re: Improve readability of Perl code. Naming reference variables.
by johngg (Canon) on Jan 20, 2017 at 10:45 UTC
|
my $rsSomeValue = \ do { my $val = 42 }; # scalar ref
my $raCats = [ qw{ Tiddles Desmo Felix } ]; # array ref
my $rhAges = { John => 23, Bill => 35 }; # hash ref
my $rcDoIt = sub { return $_[ 0 ] * 3 }; # code ref
my $rxPat = qr{abc}; # regexp ref
my $roObj = Some::Pkg->new(); # object ref
If I see something matching m{\$r[sahcxo][A-Z]} I know I'm dealing with a reference.
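As a quick sketch of how that pattern behaves, here it is applied to a handful of variable names held as plain strings. The list of names is made up for the example; the pattern only checks the spelling of a name, not what the variable actually contains.

```perl
use strict;
use warnings;

# The naming pattern from above, precompiled.
my $ref_name = qr{\$r[sahcxo][A-Z]};

# Single-quoted so the names are literal text, not variables.
my @names   = ('$raCats', '$rhAges', '$rxPat', '$count', '$result');
my @flagged = grep { /$ref_name/ } @names;

print "looks like a reference: @flagged\n";
# prints: looks like a reference: $raCats $rhAges $rxPat
```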
|
Hello johngg.
Yes, it is shorter, but is it more readable? This also extends to
the discussion of whether to use snake_case or camelCase. In my
opinion camelCase is more succinct (easier to type), whereas
snake_case is more readable (but more difficult to type).
I also once used camelCase, so I can understand your choice.
My main objection though is that the prefix syntax (that you propose) is not optional.
See also comment to kcott for more information.
|
Camel case makes my skin crawl. Hate it. Give me underscores or give me death.
Re: Improve readability of Perl code. Naming reference variables.
by stevieb (Canon) on Jan 22, 2017 at 00:43 UTC
|
With all that was said here on this thread, I want to say that while learning, knowing what is what is definitely important. Knowing this thread is so relevant still, I thought I'd share an example of where I'm learning something (converting Perl variables to C), that indeed, it's useful to be able to identify your vars so one understands what's happening. See this thread. That naming convention won't last but a day, but it can be useful in code while still trying to grasp what's going on.