PerlMonks  

Use of uninitialized variables?

by Zadeh (Beadle)
on Jun 11, 2008 at 16:31 UTC ( [id://691500]=perlquestion )

Zadeh has asked for the wisdom of the Perl Monks concerning the following question:

As a matter of practice, I've always seen to it that any variables I declare are initialized. That includes scalars, arrays, hashes, and whatever else:
my $retval = 0; my @list = (); my %hash = ();
In Perl Best Practices (PBP) section 5.2, there is talk of initialization, but only in the context of locals:
"Even if you specifically did want that variable to be undefined, it's better to say so explicitly: ... That way any readers of the code can immediately see that the lack of definition is intentional, rather than wondering whether it's an oversight."
What about the general case? Arguably, it's not of much significance in perl compared to other languages since your data will just get the value undef, but I'd rather not have to deal with undef or all the defined() checks on variables. I'd rather just assume, in my own code at least, that something undef indicates a problem somewhere. During a recent code review this issue came up, and a fellow co-worker argued against this practice because:
Outside subroutines, there's a big difference. As package variables, the assignments happen at certain times. For example, you have a package variable that gets assigned in a subroutine. You have no idea when that subroutine is called; it can be called from a BEGIN {} block. Then later, if you have that package variable assigned during a regular block, the values get overridden by my @var = ();. The = () happens during normal execution time, but the my @var exists before the BEGIN block, so you end up wiping out everything you put in your package variables. This is really bad if you're doing inside-out classes.
Putting aside the issue of whether one ought to be using inside-out classes, what do you perl monks find to be the best practice?

Replies are listed 'Best First'.
Re: Use of uninitialized variables?
by ikegami (Patriarch) on Jun 11, 2008 at 18:30 UTC

    First of all, the reasoning used in the quote doesn't really apply to hashes and arrays since they start out empty more often than not. That's probably why the quoted practice refers to undefined variables (which can only be scalars) instead of uninitialized variables. For example,

    my @results;
    while (...) {
        ...
        push @results, ...;
    }

    Secondly, I don't agree with the quoted practice for scalars either.

    If it's that hard to tell if a variable was left undefined by accident, the solution isn't to add a useless initializer, it's to fix the documentation (maybe by adding a comment) or clean up the code (maybe by localizing your variable better). For example,

    my $found = 0;
    my $match;  # Only meaningful when $found is true.
    ...

    If your goal is to avoid accidental omission of an initializer (as the quoted practice states), conditioning yourself to always add an initializer isn't going to help. You'll just substitute another problem (initializing your variable incorrectly) for the one you are trying to fix (forgetting to initialize your variables).

    And honestly, how big of a problem is forgetting to initialize your variables? Warnings do an excellent job of finding those instances. The benefit doesn't warrant the cost (clutter).

    That said, I don't buy your coworker's argument either. I feel that initializing a variable using a BEGIN block on the next line is well within the sentiment of the quoted practice, so I think his counter-argument doesn't disprove anything. Tell me how anyone could think anything but "the lack of definition is intentional" in the following:

    my $var1;
    my $var2;
    BEGIN {
        $var1 = ...;
        $var2 = ...;
    }
Re: Use of uninitialized variables?
by kyle (Abbot) on Jun 11, 2008 at 16:47 UTC

    The comment from your co-worker may be confusing package variables with lexical my variables, which are completely different. In a nutshell, "my @var = ()" can't overwrite a package variable named @var.
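    A minimal sketch of that last point, assuming a script running in package main. The variable contents are made up for illustration. The lexical declared in the inner block only shares a name with the package variable; it is separate storage:

```perl
use strict;
use warnings;

our @var = (1, 2, 3);            # package variable, i.e. @main::var

{
    my @var = ();                # a new lexical that merely shares the name
    push @var, 'lexical only';
    print "lexical: @var\n";
}

# The package variable was never touched by the lexical's initializer.
print "package: @main::var\n";
```

    Inside the block the bare name @var refers to the lexical, so the package variable is hidden by name but keeps its contents.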

    I don't see any problem with initializing everything upon creation. I prefer to use exceptions to indicate there's a problem rather than magic values, undef or otherwise. I initialize things a lot myself, now that I think about it. This is a habit I've carried over from C.

    I'd ask your co-worker to come up with some code that demonstrates the problem.

      I agree with kyle about the probable confusion on the part of the OP's co-worker concerning the purpose and behavior of package and lexical variables.

      Furthermore, if you have a Perl 5 'class', it is probably based on the package mechanism. If in this class (i.e., package) there exist package and lexical (i.e., my) variables with the same name, this probably represents a problem in and of itself: someone is going to get terribly confused and the whole affair will end in tears.

        And yet furthermore, the existence of a package (hence global) variable is, in and of itself, cause for suspicion during code review or at any other time.
Re: Use of uninitialized variables?
by Herkum (Parson) on Jun 11, 2008 at 17:57 UTC

    I prefer to never initialize a variable as empty. Why? It is the best way to find out if you are actually using the variable in your code. Quick sample,

    use warnings;
    my $retval = 0;
    warn "Return Value $retval\n";

    You have not done anything with this value, and nothing gets reported as being a problem. If you do this,

    use warnings;
    my $retval;
    warn "Return Value $retval\n";

    You are going to get an uninitialized value warning. You will probably be saying WTF, until you look and realize the code that you wanted was this,

    use warnings;
    my $retval;
    $retval = get_value_from_subroutine();
    warn "Return Value $retval\n";

    It also reduces the amount of code that you have. I admit the saving is small, but do it a couple of hundred times and you get the idea.

      Hadn't thought of this point -- finding dead code. I think there's gotta be a better way for finding that info though. Next question that pops into my head is if I could write a unit test, run it and get coverage info -- which would also provide enough info to deduce where dead/unused code was.

        You should take a look at Devel::Cover then. I have used it for unit testing and I found it fairly useful.

        I have not used it for program execution but it looks like it supports it. The only reason I can think of for not using it is that you cannot execute all branches of your code by running a program. Maybe it has a way of dealing with it but I am not sure; someone else may have the answer.

Re: Use of uninitialized variables?
by GrandFather (Saint) on Jun 11, 2008 at 21:42 UTC

    Best practice is to always use strictures, declare variables to minimize their scope and initialize scalars with a useful value at declaration time (except where conditional initialization is required).

    Never initialize a scalar variable with a 'junk' value just to shut up warnings! The warnings give you a heads up about bugs in your code.

    undef is a vitally important variable state that allows a lot of sanity checking that otherwise is difficult to achieve. Strictures are your friend, perhaps even a best friend.

    Note that arrays and hashes don't need to be "initialized". They start empty in any case so something of the form my @array = (); is redundant.
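    A small sketch of that point, with made-up variable names. Under strict and warnings, freshly declared arrays and hashes can be used straight away without producing any warnings:

```perl
use strict;
use warnings;

my @array;                       # starts empty; no "= ()" needed
my %hash;                        # likewise

print "array elements: ", scalar(@array), "\n";
print "hash keys: ", scalar(keys %hash), "\n";

push @array, 'first';            # no warning: the array was empty, not undef
$hash{count}++;                  # ++ on a missing value is exempt from the warning

print "array elements: ", scalar(@array), "\n";
print "hash keys: ", scalar(keys %hash), "\n";
```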


    Perl is environmentally friendly - it saves trees
      OK, so for arrays and hashes there is no need. This is just about scalars then.
      I certainly don't advocate initialization for its own sake simply to shut up warnings about uninitialized variables.

      An idiom I like to use in subs is something like:
      sub foo {
          my $retval = 0;
          # ... lots of code here ...
          return $retval;  # This is the only return in the sub
      }
      I like to use this for several reasons

      1) It's guaranteed that the sub will always return a value.
      2) There is a single entry/exit point. Apart from it being simple, I know that this makes it easier for compilers/interpreters to prove certain things about your code. Can't say I know if it will make any difference whatsoever in Perl, but it often does in C-like languages.
      3) The initial value of 0 is assumed to be the failure case, putting the burden of 'proof' to set $retval properly. Hope to see a failure case quicker this way.

      When possible, I also prefer to see code more like:
      ... my $whatever = some_sub();
      I find the 'classic C' style of declaring lots of variables at the top of your routine, then using them in the body, to be harder to follow. e.g.:
      my $var1; my $var2; my $var3; my @list; my %hash; ...
      Worse if they're all global! I don't like jumping back and forth, or wondering where variables got their values (or what they were initialized as/from). Arguably if the code was done well enough in the first place it would already be pretty short and there would be no problem, but unfortunately I see this all the time. IMO it's more error-prone. If that's what you mean when you said "declare variables to minimize their scope" then I agree.

        I like to use early exits because they often save a lot of obfuscating indentation and state management. Test simple cases first and exit early. Complicated cases then tend not to be indented and the conditions that get you to the complicated case are generally much easier to see. Note that this early exit technique works in the context of loops just as well as it does for subs.

        To me a single point of exit seems a rather artificial constraint that often requires extra testing and nesting to achieve. Removing that constraint very often allows much clearer logic flow. It's easier for people to understand and prove, which is much more important than making it easier for compilers.

        Using undef as the failure case allows 0 to be used as a valid return value. That was part of what I was alluding to in the paragraph about "sanity checking".
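        As a rough sketch of both ideas (early exit, and undef as the failure value), here is a hypothetical helper, index_of(), which is not from this thread. Because 0 is a legitimate index, only undef can signal "not found":

```perl
use strict;
use warnings;

# index_of() is a hypothetical helper used only for illustration.
sub index_of {
    my ($wanted, @items) = @_;
    for my $i (0 .. $#items) {
        return $i if $items[$i] eq $wanted;   # early exit on success
    }
    return;                                   # undef in scalar context: not found
}

my @fruits = qw(apple pear plum);
for my $name (qw(apple plum kiwi)) {
    my $pos = index_of($name, @fruits);
    print defined $pos ? "$name is at index $pos\n" : "$name was not found\n";
}
```

        The caller tests with defined rather than for truth, so a match at index 0 is not mistaken for a failure.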

        C required that all variables be declared at the start of a block. C++ and Perl (to name but two) allow variables to be declared wherever they are needed. So yes, I did mean to take advantage of that feature of Perl (and other languages) to declare variables as close as possible to their first use.


        Perl is environmentally friendly - it saves trees
        There is a single entry/exit point. Apart from it being simple, I know that this makes it easier for compilers/interpreters to prove certain things about your code.

        Maybe for a really stupid compiler, but I have trouble believing any modern optimizing compiler has trouble with escape analysis of this sort these days.

Re: Use of uninitialized variables?
by Glav (Novice) on Jun 11, 2008 at 23:01 UTC
    Hey folks, co-worker here. :) Here's a particular, extremely simplified example of what I'm talking about. The example uses scalars, but applies equally to hashes and arrays. I'm specifically talking about lexical package variables, usually used in the case of inside-out classes. We start by setting up a few variables, in this case, scalars, and then initialize them in various places. We're using undef, but it could be other values. Then, a user of the module or class creates/initializes these values in a begin block of their own, and we see what happens.
    use strict;
    use warnings;

    my $stuff1;
    my $stuff2 = undef;
    my $stuff3;
    my $stuff4;

    BEGIN { $stuff3 = undef; }
    INIT  { $stuff4 = undef; }

    BEGIN {
        # hypothetical initialization, creation going on.
        $stuff1 = 'stuff1 stuff';
        $stuff2 = 'stuff2 stuff';
        $stuff3 = 'stuff3 stuff';
        $stuff4 = 'stuff4 stuff';
    }

    print "stuff 1: $stuff1\n";
    print "stuff 2: $stuff2\n";
    print "stuff 3: $stuff3\n";
    print "stuff 4: $stuff4\n";
    And the results are:
    stuff 1: stuff1 stuff
    Use of uninitialized value in concatenation (.) or string at beginproblem1.pl line 47.
    stuff 2:
    stuff 3: stuff3 stuff
    Use of uninitialized value in concatenation (.) or string at beginproblem1.pl line 49.
    stuff 4:
    
    By assigning 'empty' values to these internal package variables, you end up overriding them if your package or class is used in a BEGIN or INIT block. We're using perl v5.8.8.

      First, let's get the incorrect terminology out of the way. Lexical and package variables are two different kinds of variables, and they are mutually exclusive. There's no such thing as lexical package variables.

      >perl -le"package PkgA; my $foo = 'abc'; package PkgB; print $foo"
      abc

      I usually use the term global variable for what you call lexical package variables. It's not perfect, but it's not outright wrong.

      Now back to the subject. Why would you initialize a variable twice? That's bad, with or without BEGIN. If I saw code that looked like the following, I'd have a talk with you. And yet, that's exactly what you are doing.

      my $stuff1;
      my $stuff2 = undef;
      my $stuff3 = undef;
      my $stuff4 = undef;

      $stuff1 = 'stuff1 stuff';
      $stuff2 = 'stuff2 stuff';
      $stuff3 = 'stuff3 stuff';
      $stuff4 = 'stuff4 stuff';

      print "stuff 1: $stuff1\n";
      print "stuff 2: $stuff2\n";
      print "stuff 3: $stuff3\n";
      print "stuff 4: $stuff4\n";

      Your code without BEGINs should be

      my $stuff1 = 'stuff1 stuff';
      my $stuff2 = 'stuff2 stuff';
      my $stuff3 = 'stuff3 stuff';
      my $stuff4 = 'stuff4 stuff';

      print "stuff 1: $stuff1\n";
      print "stuff 2: $stuff2\n";
      print "stuff 3: $stuff3\n";
      print "stuff 4: $stuff4\n";

      So your code should be

      my $stuff1;
      my $stuff2;
      my $stuff3;
      BEGIN {
          $stuff1 = 'stuff1 stuff';
          $stuff2 = 'stuff2 stuff';
          $stuff3 = 'stuff3 stuff';
      }

      my $stuff4;
      INIT {
          $stuff4 = 'stuff4 stuff';
      }

      print "stuff 1: $stuff1\n";
      print "stuff 2: $stuff2\n";
      print "stuff 3: $stuff3\n";
      print "stuff 4: $stuff4\n";

      Or if you had anything complex, an adherent to the quoted practice would do

      my $stuff;
      BEGIN {
          $stuff = undef;
          ... ... ...
          ... Some complex code to initialize $stuff.
          ... ... ...
      }

      instead of

      my $stuff;
      BEGIN {
          ... ... ...
          ... Some complex code to initialize $stuff.
          ... ... ...
      }
        Wups, mybad on the terminology. I didn't know what to call them but global doesn't seem right either. *shrug* As far as these variables go, my limits the scope in such a way that in a different module the variable isn't available. Try separating the packages into their own modules, like in this example:

        Mytest.pm:

        package Mytest;
        use strict;
        use warnings;
        my $var = 1;
        1;
        And then a user of the module:
        use strict;
        use warnings;
        use Mytest;

        print $Mytest::var . "\n";
        You'll get this:
        Name "Mytest::var" used only once: possible typo at usemytest.pl line 5.
        Use of uninitialized value in concatenation (.) or string at usemytest.pl line 5.
        
        Or this:
        use strict;
        use warnings;
        use Mytest;

        print $var;
        You'll get this:
        Global symbol "$var" requires explicit package name at usemytest.pl line 5.
        Execution of usemytest.pl aborted due to compilation errors.
        
        Without strict and warnings:
        use Mytest; print $var;
        or
        use Mytest; print $Mytest::var;
        You get nothing; $var is completely invisible outside its module...or am I missing something? :/ Is there another way to access that scalar that I'm missing?
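        For what it's worth, here is one possible sketch of how such a value could be reached from outside, squeezed into a single file for brevity. The accessor get_var() and the package variable $shared are hypothetical additions, not part of the Mytest.pm shown above. A lexical can only be exposed through code that closes over it, whereas a variable declared with our lives in the package's symbol table and can be named from anywhere:

```perl
use strict;
use warnings;

{
    package Mytest;

    my $var = 1;                   # lexical: visible only inside this block
    our $shared = 2;               # package variable: reachable as $Mytest::shared

    # Hypothetical accessor; the closure is the only route to the lexical.
    sub get_var { return $var }
}

print "via accessor: ", Mytest::get_var(), "\n";
print "via package name: $Mytest::shared\n";
```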

        As far as the example goes, it is an overly simplified demonstration of some interactions between many modules in a medium-sized tool I'm working on. Here's a slightly more concrete usage, and two instances where the issue happens. p1 shows the same kind of issue as before: by assigning undef to the variable on the same line it's created, if it is used in a BEGIN block, the undef overrides anything done during the BEGIN block. p2 shows the issue in a much more real-world scenario: you have a simple inside-out class, and decide to 'initialize' the "private variable" hashes. For that example, I'm trying it three different ways: initializing inline, using the default 'my' behavior, and initializing in a BEGIN block, to show what happens in each of these scenarios:

        #!/usr/bin/perl -w
        package Problem;
        use strict;
        use warnings;
        use Scalar::Util qw(refaddr);

        {
            my %key1_of;        # leave blank test
            my %key2_of = ();   # initialize test
            my %key3_of;
            BEGIN {
                %key3_of = ();  # initialize with a BEGIN
            }

            sub new {
                my ($class, $ar1, $ar2, $ar3) = @_;
                my $new_object = bless \do{my $anon_scalar}, $class;
                $key1_of{refaddr($new_object)} = $ar1;
                $key2_of{refaddr($new_object)} = $ar2;
                $key3_of{refaddr($new_object)} = $ar3;
                return $new_object;
            }

            sub get_key1 {
                my $self = shift;
                return $key1_of{refaddr($self)};
            }

            sub get_key2 {
                my $self = shift;
                return $key2_of{refaddr($self)};
            }

            sub get_key3 {
                my $self = shift;
                return $key3_of{refaddr($self)};
            }
        }
        1;

        # Here's a user of the above object.
        package Main;
        use strict;
        use warnings;

        # Below: Creating values for the sake of showing the issue at hand...
        # First variable: assigning undef (declaring its initial value),
        # but due to a subroutine or other complex series of processing, it
        # gets assigned a value during a BEGIN block
        my $p1 = undef;  # using the 'always assign a value' paradigm here
        my $p2;          # leaving it blank for now
        my $p3 = undef;  # same 'always assign' paradigm here, but not used in 'begin'

        # some processing happens in here; for example, let's say these variables are
        # dependent on user input or automatic config files, so $p1, $p2, $p3 might
        # not ever be used. It just turns out that this time, they all are.
        # However, the user of the package creates these instances of the object in a
        # begin block, and for whatever reason, the user can't avoid doing this.
        BEGIN {
            $p1 = Problem->new('P11', 'P12', 'P13');
            $p2 = Problem->new('P21', 'P22', 'P23');
        }
        $p3 = Problem->new('P31', 'P32', 'P33');

        sub printit  # for display purposes
        {
            my $p = shift;
            if (defined $p) {
                print "   k1: " . $p->get_key1() . "\n";
                my $k2 = $p->get_key2();
                $k2 = "(undef)" if ! defined $k2;
                print "   k2: $k2\n";
                print "   k3: " . $p->get_key3() . "\n";
            }
            else {
                print "   Error: Object is undefined.\n";
            }
        };

        print "P1:\n";
        printit($p1);
        print "   (Due to scoping of the undef assignment, the object itself is lost.)\n";
        print "P2:\n";
        printit($p2);
        print "   (Note loss of key 2's value; it's lost because %key2_of\n"
            . "    is assigned to () after the object's creation).\n";
        print "P3:\n";
        printit($p3);
        print "   (As expected, no issues at all).\n";
        And then, the output is:
        P1:
           Error: Object is undefined.
           (Due to scoping of the undef assignment, the object itself is lost.)
        P2:
           k1: P21
        
           k2: (undef)
           k3: P23
           (Note loss of key 2's value; it's lost because %key2_of
            is assigned to () after the object's creation).
        P3:
           k1: P31
           k2: P32
           k3: P33
           (As expected, no issues at all).
        
        As you can see, p1 is completely wiped out by the initial assignment, and p2 loses its member variable due to the assignment of () to it after the object is created in the BEGIN block. Setting up the variable in a BEGIN block, i.e., initializing it to 'empty' there, doesn't cause any problems for users, no matter where they use it.
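        Boiled down to the bare minimum, the p2 effect can be sketched like this. The hash names are made up for illustration. The declaration takes effect at compile time, the BEGIN block runs as soon as it has been parsed, and the = () assignment only runs later, during normal execution:

```perl
use strict;
use warnings;

my %wiped = ();     # the "= ()" runs at run time, after every BEGIN block
my %kept;           # no run-time assignment at all

BEGIN {
    # Runs during compilation, before the "= ()" above has executed.
    $wiped{a} = 1;
    $kept{a}  = 1;
}

print "wiped has ", scalar(keys %wiped), " key(s)\n";
print "kept has ",  scalar(keys %kept),  " key(s)\n";
```

        Only the hash with the explicit initializer loses what the BEGIN block stored in it.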

        The issue here is: if we're initializing a value, and the initial value is empty or undef, why do it twice? It simply makes the initialization process even more complex. If we're not using it, or not using it right away, we might as well take advantage of Perl automatically setting scalars to undef and arrays/hashes to empty.

        Or am I just completely off about all this?

Node Type: perlquestion [id://691500]
Approved by citromatik
Front-paged by ikegami