PerlMonks  

How to differentiate an empty array from an uninitialized one?

by iatros (Novice)
on Jul 08, 2018 at 21:41 UTC ( [id://1218126]=perlquestion )

iatros has asked for the wisdom of the Perl Monks concerning the following question:

I need to differentiate an empty array from an uninitialized one. Consider this example:

# possible strings come in 4 versions
my $str_one   = "0,1,.5,0,1" ;
my $str_two   = ".5,,0,1," ;
my $str_three = ",,,," ;
my $str_four  = "" ;

Converting each $str to an array with

 my @array =  split ( "," , $_ )

yields a surprising result because the undefined elements are suppressed. So $str_one correctly gives @array = ( 0 1 .5 0 1) , but for $str_two the undefined element is skipped and the number of elements is 4 (instead of 5) . Even worse, with str_three and str_four the resulting array is empty (or may I say "undefined") in either case. By the way, the conversion is destructive, i.e. it's not possible to regain $str_three and $str_four by applying join ( "," , @array ). Is this a Perl 5 design flaw? Is there a proper way to work around this problem with Perl 5?

Replies are listed 'Best First'.
Re: How to differentiate an empty array from an uninitialized one?
by tinita (Parson) on Jul 08, 2018 at 21:49 UTC
    See split. You can give an extra argument called LIMIT:
    my @array = split m/,/ , $_, -1;
    From perldoc -f split:
    If LIMIT is negative, it is treated as if it were instead arbitrarily large; as many fields as possible are produced.
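The effect of LIMIT on the four strings from the question can be checked with a short script. This is only a sketch; `field_counts` is a hypothetical helper name, not something from perldoc.

```perl
use strict;
use warnings;

# Hypothetical helper: count the fields split produces, without and with LIMIT.
sub field_counts {
    my ($str) = @_;
    my @default = split /,/, $str;        # trailing empty fields are dropped
    my @all     = split /,/, $str, -1;    # a negative LIMIT keeps them
    return (scalar @default, scalar @all);
}

for my $str ('0,1,.5,0,1', '.5,,0,1,', ',,,,', '') {
    my ($default, $all) = field_counts($str);
    printf "%-14s default=%d  limit(-1)=%d\n", "'$str'", $default, $all;
}
```

Note that the empty string yields zero fields either way, so LIMIT only matters once the input contains at least one separator.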
Re: How to differentiate an empty array from an uninitialized one?
by AnomalousMonk (Archbishop) on Jul 08, 2018 at 22:32 UTC

    Further to tinita's post:   Note that when all empty fields are kept, join exactly reverses split (the two are "orthogonal"):

    c:\@Work\Perl\monks>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     for my $str ('0,1,.5,0,1', '.5,,0,1,', ',,,,', '123', '',) {
     ;;
        my @ra = split ( ',' , $str, -1 );
        my $spstr = join ',', @ra;
        printf qq{'$str' -> %s -> >$spstr< \n}, pp(\@ra);
        die qq{split/join not orthogonal: '$str' '$spstr'} unless $str eq $spstr;
        }
    "
    '0,1,.5,0,1' -> [0, 1, ".5", 0, 1] -> >0,1,.5,0,1<
    '.5,,0,1,' -> [".5", "", 0, 1, ""] -> >.5,,0,1,<
    ',,,,' -> ["", "", "", "", ""] -> >,,,,<
    '123' -> [123] -> >123<
    '' -> [] -> ><

    Update: Added the  die qq{ ... } unless $str eq $spstr; statement just to emphasize the point.


    Give a man a fish:  <%-{-{-{-<

Re: How to differentiate an empty array from an uninitialized one?
by hippo (Bishop) on Jul 09, 2018 at 08:29 UTC
    it's not possible to regain $str_three and $str_four by applying join ( "," , @array ). Is this a Perl 5 design flaw?

    With a limit of -1, as tinita has demonstrated, it is indeed possible, so there is no design flaw.

    Is there a proper way to work around this problem with Perl 5?

    The best practice for parsing and manipulating CSV data is to use one of the many modules available for that very purpose: Text::CSV, Text::CSV_XS, Text::xSV, etc.

Re: How to differentiate an empty array from an uninitialized one?
by haukex (Archbishop) on Jul 09, 2018 at 09:08 UTC

    You've already got some good responses, I just wanted to clarify terminology.

    I need to differentiate an empty array from an uninitialized one.

    There is no difference: an uninitialized array is empty. There is a difference between an empty array and an array of one or more undef values, but:

    my $str_two =  ".5,,0,1," ; ... for $str_two the undefined element is skipped

    As already pointed out by jwkrahn, the elements returned by split in this case are not undefined (undef) but empty strings. Both evaluate as "false" in Perl (Truth and Falsehood), and undef evaluates to an empty string, but using undef as a string normally triggers a warning, and testing with defined shows the difference between the two.
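To make the distinction concrete, here is a minimal sketch that labels each field as undef, an empty string, or a real value. `classify` is a hypothetical helper introduced only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper that names the three cases a field can fall into.
sub classify {
    my ($value) = @_;
    return 'undef'        unless defined $value;   # only defined() tells undef apart
    return 'empty string' if $value eq '';
    return 'value';                                # includes the string "0"
}

# Every field produced here is a defined string, some of them empty.
my @fields = split /,/, '.5,,0,1,', -1;
print join(', ', map { classify($_) } @fields), "\n";

# An explicit undef is reported differently.
print classify(undef), "\n";
```

The check with defined comes first on purpose: comparing undef with eq would itself trigger the "uninitialized" warning.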

    the resulting array is empty (or may I say "undefined")

    I would recommend not calling an empty array "undefined", because that term is usually reserved for undef, which only applies to scalars like the elements of an array - but an empty array doesn't have any!

    Is there a proper way to work around this problem with Perl 5?

    In your last post from over a year ago, you asked about how to handle CSV data, and I recommended Text::CSV. Let me repeat that recommendation, since it will properly distinguish between the cases of "" vs. ",,,,", and in the latter case it can even return an array of undef values, if you enable its blank_is_undef option (there is even empty_is_undef).

    Update 2019-08-17: Updated the link to "Truth and Falsehood".

Re: How to differentiate an empty array from an uninitialized one?
by jwkrahn (Abbot) on Jul 08, 2018 at 23:54 UTC

    Note also that split only returns strings, it does not return "undefined elements".

      IIRC, split can return undef, depending on regex capture groups.
        split can return undef, depending on regex capture groups

        Yes, that's correct, for example split( /(x)|(y)/, "axbyc" ) yields ("a", "x", undef, "b", undef, "y", "c"). It's also documented at the bottom of split.
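The capture-group case can be reproduced with a few lines. This is a sketch; `show` is a hypothetical helper whose only job is to make undef visible in the printed list.

```perl
use strict;
use warnings;

# Hypothetical helper: render a list with undef spelled out.
sub show {
    return join ' ', map { defined $_ ? "'$_'" : 'undef' } @_;
}

# With capture groups, split also returns what each group matched;
# a group that did not take part in the match contributes undef.
print show(split /(x)|(y)/, 'axbyc'), "\n";

# Without capture groups only the fields themselves are returned.
print show(split /x|y/, 'axbyc'), "\n";
```

So "split only returns strings" holds as long as the pattern contains no capturing parentheses.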

      Indeed, it would be incorrect for Perl to consider an uninitialized value to be an empty string, for much the same reason why, in SQL, NULL is not an empty string.
        it would be incorrect for Perl to consider an uninitialized value to be an empty string

        That is quite incorrect. Perl's undef (the default value for uninitialized scalars) evaluates to an empty string (or zero in numeric context). If warnings are enabled, this will usually warn, but only to indicate that the programmer may have made a mistake, not because it's inherently wrong.
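A small sketch of that behaviour follows. The warnings are collected through a local `$SIG{__WARN__}` handler so they can be counted instead of printed; `use_undef` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: evaluate undef as a string and as a number,
# collecting any warnings instead of letting them reach STDERR.
sub use_undef {
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };

    my $undef;                    # never assigned, so its value is undef
    my $as_string = "$undef";     # evaluates to the empty string
    my $as_number = $undef + 0;   # evaluates to zero

    return ($as_string, $as_number, scalar @warnings);
}

my ($str, $num, $warned) = use_undef();
print "as string: '$str'\n";
print "as number: $num\n";
print "warnings raised: $warned\n";
```

The results are well defined in both contexts; the warnings only flag that the programmer may not have intended to use an uninitialized value.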

        Perl does exactly that... The question isn't about arrays
Re: How to differentiate an empty array from an uninitialized one?
by dsheroh (Monsignor) on Jul 09, 2018 at 09:08 UTC
    with str_three and str_four the resulting array is empty (or may I say "undefined")
    No, you may not.

    Once you execute my @array =  split ( "," , $_ );, @array has a defined value, period. A variable with an undefined (or uninitialized, for that matter) value is a variable which has either never had a value assigned to it or has had its value explicitly removed. Your split assigns a value to @array, therefore @array is not undefined. (If split could return undef, then @array could potentially end up undefined, but split doesn't do that. Even split ",", undef; returns an empty list, which is a defined value.)

    If you want to preserve the empty (not undefined!) values in your input strings, give split a -1 (or any negative integer) as an additional parameter to tell it to keep all values, including leading/trailing empty values:

    $ perl -w -E '@values = split ",", "1,2,3,4", -1; say join ",", @values'
    1,2,3,4
    $ perl -w -E '@values = split ",", ",,3,", -1; say join ",", @values'
    ,,3,
    $ perl -w -E '@values = split ",", ",,,", -1; say join ",", @values'
    ,,,
      @array has a defined value, period. ... therefore @array is not undefined. ... Even split ",", undef; returns an empty list, which is a defined value.

      Sorry, this is incorrect: arrays don't have a concept of "(un)defined", only the elements of an array do. An array is either empty or not empty. Whether the elements of an array are undef is a different question; even an array of one undef element is considered "true" by Perl because the array is non-empty.

      $ perl -wMstrict -Mdiagnostics -le 'my @array; print defined(@array)'
      Can't use 'defined(@array)' (Maybe you should just omit the defined()?) at -e line 1 (#1)
          (F) defined() is not useful on arrays because it checks for an
          undefined scalar value. If you want to see if the array is empty,
          just use if (@array) { # not empty } for example.

      Uncaught exception from user code:
          Can't use 'defined(@array)' (Maybe you should just omit the defined()?) at -e line 1.
      $ perl -wMstrict -le 'my @array=(undef); print @array?"true":"false"'
      true
      Once you execute my @array =  split ( "," , $_ );, @array has a defined value, period.
      $ perl -wMstrict -MData::Dump -e 'my @array; dd @array; $_=""; @array = split(",",$_ ); dd @array'
      ()
      ()
      If split could return undef ... but split doesn't do that.
      $ perl -wMstrict -MData::Dump -e 'dd split /-|(x)/, "-", -1'
      ("", undef, "")
        arrays don't have a concept of "(un)defined"
        Yeah, that's an odd design decision which I personally disagree with. I see no logical reason to distinguish between "empty" and "undefined" for scalars, but not for arrays.

        In any case, I considered going into that in my previous comment, but obviously decided against it. I probably should have, given that the literal answer to "How to differentiate an empty array from an uninitialized one?" is "You don't, because Perl doesn't."
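If a program really needs a "not set yet" state next to "empty", one common approach is to hold the fields in a scalar array reference, because a scalar can be undef while an array cannot. This is a sketch of that idea under the assumption that the calling code can work with a reference; `describe` is a hypothetical helper and nothing here was proposed in the thread itself.

```perl
use strict;
use warnings;

# Hypothetical helper: describe an array *reference* that may not exist yet.
sub describe {
    my ($aref) = @_;
    return 'not set' unless defined $aref;   # the scalar holding the reference is undef
    return 'empty'   unless @$aref;          # the referenced array has no elements
    return scalar(@$aref) . ' element(s)';
}

my $fields;                                  # nothing assigned yet
print describe($fields), "\n";

$fields = [ split /,/, '', -1 ];             # assigned, but split produced nothing
print describe($fields), "\n";

$fields = [ split /,/, ',,,,', -1 ];         # five empty strings
print describe($fields), "\n";
```

For a plain `my @array`, the only available test remains `if (@array)`, which cannot tell "never assigned" from "assigned an empty list".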

        $ perl -wMstrict -MData::Dump -e 'dd split /-|(x)/, "-", -1'
        ("", undef, "")
        Interesting. I wasn't aware of that behavior in split. Thanks for pointing it out!
Re: How to differentiate an empty array from an uninitialized one?
by sundialsvc4 (Abbot) on Jul 09, 2018 at 14:05 UTC

    Three examples should suffice.   (Note that in each case these are Perl one-liners intended for the command-line, so $ characters are escaped for the shell, thusly:   \$.   Only $ is seen by Perl.)

    Without LIMIT, split omits trailing empty fields:

    $ perl -e "use strict; use warnings; my \$a='a:b::c:d::'; my @b = split(/\:/, \$a); use Data::Dumper; print Data::Dumper->Dump([\$a, \@b],['a','b']);"
    $a = 'a:b::c:d::';
    $b = [
           'a',
           'b',
           '',
           'c',
           'd'
         ];

    LIMIT=-1 causes them to be included:

    $ perl -e "use strict; use warnings; my \$a='a:b::c:d::'; my @b = split(/\:/, \$a, -1); use Data::Dumper; print Data::Dumper->Dump([\$a, \@b],['a','b']);"
    $a = 'a:b::c:d::';
    $b = [
           'a',
           'b',
           '',
           'c',
           'd',
           '',
           ''
         ];

    Perl 5 ignores repeated commas in a list constructor, so the resulting list is shorter than it looks. join then processes that list, interpreting the remaining undef as an empty string:

    $ perl -e "my @a=('a','q',,undef,,'b','c',,); my \$b = join(':', @a); use Data::Dumper; print Data::Dumper->Dump([\@a, \$b], ['a','b']);"
    $a = [
           'a',
           'q',
           undef,
           'b',
           'c'
         ];
    $b = 'a:q::b:c';
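The same point as a script with warnings enabled, avoiding `$a` and `$b` as variable names. This is only a sketch; mapping undef to an empty string before join is an addition made here so that no "uninitialized" warning is raised.

```perl
use strict;
use warnings;

# Repeated commas in a list constructor add no elements,
# so this list has five elements, one of which is undef.
my @list = ('a', 'q', , undef, , 'b', 'c', , );
print scalar(@list), "\n";    # 5

# join would warn about the undef element under "use warnings",
# so turn it into an empty string explicitly first.
my $joined = join ':', map { defined $_ ? $_ : '' } @list;
print "$joined\n";            # a:q::b:c
```

This concerns list literals in Perl source only; it does not describe how split treats consecutive separators in a string.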

      I congratulate you for posting some actual Perl code!
      I do have some quibbles with it. For example, you are experienced enough to know never to use $a or $b as a user variable name since these variable names have special meaning within Perl.
      I don't want to be overly critical lest I discourage you from posting code.

      Update: Three examples should suffice. Ha!

      You did not consider blank fields at the beginning of the line.
      I don't think that any beginning Perl'er needs to know this, but:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dumper;

      my $str =" abc xyz \n";
      my @tokens = split(/\s+/,$str);
      print Dumper \@tokens;

      =PRINTS:
      $VAR1 = [
                '',
                'abc',
                'xyz'
              ];
      =cut

      @tokens = split (' ',$str);
      print Dumper \@tokens;
      # note: leading blank field is not there!
      # no need to remove spaces at beginning or end of
      # line with this special situation.

      =PRINTS:
      $VAR1 = [
                'abc',
                'xyz'
              ];
      =cut

      I, too, must commend your efforts to post working code examples. Long may they multiply and be fruitful! (It would have been better, however, if most of the examples given had not simply repeated examples previously given by others. The one example that does not repeat a previous example in the thread does not seem to address any point previously raised in the thread.)

      ... these are Perl one-liners intended for the command-line, so $ characters are escaped for the shell ...

      The examples you give appear to be for a *nix shell (bash?). Another way, and by far the most common for PerlMonks example code postings (or so it seems to me), is to use a non-interpolating shell quote character; see here and here for what seem to be *nix examples. Using non-interpolating quotation would seem to be a much simpler, hence preferable, way to present such Perl code.


      Give a man a fish:  <%-{-{-{-<
