PerlMonks  

Re: Avoiding repeated undefs

by Eily (Monsignor)
on Feb 26, 2019 at 09:55 UTC ([id://1230552])


in reply to [Solved] Avoiding repeated undefs

my ($key, $val) = (split(" ", $_))[0,6]; would work, although you might just as well write:

my @values = split(" ", $_);
my ($key, $value) = @values[0,6];
Personally, I find the second one more elegant.

Edit: this is not an exact equivalent of your original code, though, as davies demonstrated below.

Re^2: Avoiding repeated undefs
by davies (Prior) on Feb 26, 2019 at 14:07 UTC

    When I tried the first one, which I thought extremely elegant, I ran into some unexpected behaviour.

    dr@dns:~$ cat sscce.pl
    use strict;
    use warnings;
    use feature 'say';

    my $str = 'a b c d e f ';
    my ($key, undef, undef, undef, undef, undef, $val) = split(" ", $str);
    say "Val = <$val>";
    my ($key2, $val2) = (split(" ", $str))[0,6];
    say "Val2 = <$val2>";
    dr@dns:~$ perl sscce.pl
    Val = <>
    Use of uninitialized value $val2 in concatenation (.) or string at sscce.pl line 9.
    Val2 = <>

    Some of the data I am hacking have only six data points, but with a trailing space. In my original version I was getting a zero-length string as the last value, as shown in the SSCCE above. But with the more elegant version, it becomes undef and I get the warning shown. Actually, I find it more logical to get undef, and changing my code would be no problem, but I can't find any indication of why split should behave so apparently inconsistently. I have tried using the regex from my OP, but I get the same results as in the SSCCE.

    As I said in my OP, I have working code. This isn't important, but it is a gap in my knowledge that I'd like to close.

    Regards,

    John Davies

      Yes, split isn't exactly consistent there. There is a third parameter, LIMIT, which limits the number of fields the string is split into. E.g. split '_', 'a_b_c_d', 2 returns the list ('a', 'b_c_d'), because the string is split into at most two fields.
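      A minimal sketch of LIMIT on the same string; the LIMIT 3 case is an extra illustration, not from the thread. The last field always keeps the unsplit remainder:

```perl
use strict;
use warnings;

# LIMIT caps the number of fields; the last field holds whatever is left unsplit.
my @two   = split '_', 'a_b_c_d', 2;   # ('a', 'b_c_d')
my @three = split '_', 'a_b_c_d', 3;   # ('a', 'b', 'c_d')
my @all   = split '_', 'a_b_c_d';      # no LIMIT: ('a', 'b', 'c', 'd')

print join('|', @two), "\n";     # prints a|b_c_d
print join('|', @three), "\n";   # prints a|b|c_d
print join('|', @all), "\n";     # prints a|b|c|d
```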

      The catch is that if LIMIT is 0 or omitted, trailing empty fields are removed. So split '_', 'a_b___', 0 returns the list ('a', 'b'), which is why you get an undefined value in your second case.
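      To see the difference side by side, here is a small sketch comparing a zero, a negative and a large positive LIMIT on the string from the example (the LIMIT 10 case is just an arbitrary value larger than the field count):

```perl
use strict;
use warnings;

my $s = 'a_b___';

my @zero = split '_', $s, 0;    # LIMIT 0: trailing empty fields stripped -> ('a', 'b')
my @neg  = split '_', $s, -1;   # negative LIMIT: all fields kept -> ('a', 'b', '', '', '')
my @ten  = split '_', $s, 10;   # positive LIMIT: trailing empty fields kept as well

printf "%d %d %d\n", scalar @zero, scalar @neg, scalar @ten;   # prints 2 5 5
```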

      Now, the tricky bit: as an optimization, when perl knows how many variables you are assigning to, it sets LIMIT to the number of elements plus one (one field per element, plus one for the remainder, which is ignored). So

      my ($key, $first, $second, undef, $value) = split " ", $string;
      is actually interpreted as
      my ($key, $first, $second, undef, $value) = split " ", $string, 6; # split five times, ignore the sixth field
      In that case, LIMIT is not 0 so the empty fields at the end are not removed.
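      A sketch of this using the string from davies' SSCCE, assuming the implicit LIMIT of 8 for seven variables (as the B::Deparse output later in this thread shows). The explicit LIMIT and the list-slice variant are added for comparison; `show` is just a hypothetical helper for printing:

```perl
use strict;
use warnings;

my $string = 'a b c d e f ';   # six fields plus a trailing space

# Seven scalars on the left: the omitted LIMIT is treated as 8,
# so the trailing empty field survives and $val is ''.
my ($key, undef, undef, undef, undef, undef, $val) = split ' ', $string;

# Writing that LIMIT explicitly gives the same result.
my ($key2, undef, undef, undef, undef, undef, $val2) = split ' ', $string, 8;

# A list slice is not a list of variables, so LIMIT stays 0,
# the trailing empty field is stripped, and index 6 does not exist.
my ($key3, $val3) = (split ' ', $string)[0, 6];

# Hypothetical helper: format a value, making undef visible.
sub show { my ($name, $v) = @_; return defined $v ? "$name='$v'" : "$name=undef" }

print show('val',  $val),  "\n";   # prints val=''
print show('val2', $val2), "\n";   # prints val2=''
print show('val3', $val3), "\n";   # prints val3=undef
```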

      You could write ($key, $value) = (split(" ", $string, 8))[0,6]; (there are seven values, indexes 0 to 6, so the remainder is the 8th), but the 8 seems to come out of nowhere, and this just invites mistakes. Luckily, a negative LIMIT is treated as arbitrarily large, i.e. split keeps splitting until the end of the input and does not remove trailing empty fields:

      my ($key, $value) = (split(" ", $string, -1))[0,6];
      Do note that split " " is a special case of split: it behaves like split /\s+/, except that leading whitespace is skipped, so no empty leading field is produced.
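      A small sketch contrasting split ' ' with split /\s+/ on a made-up string with leading and trailing whitespace, plus the negative LIMIT from the line above:

```perl
use strict;
use warnings;

my $line = '  x y z ';   # leading and trailing whitespace

my @awk   = split ' ', $line;       # leading whitespace skipped -> ('x', 'y', 'z')
my @regex = split /\s+/, $line;     # empty leading field kept   -> ('', 'x', 'y', 'z')
my @keep  = split ' ', $line, -1;   # trailing empty field kept  -> ('x', 'y', 'z', '')

print scalar(@awk), ' ', scalar(@regex), ' ', scalar(@keep), "\n";   # prints 3 4 4
```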

      Edit: for what it's worth, hippo's solution doesn't have that problem.

        The split behavior here really looks like a bug. Limit "inheritance" is not documented, so it comes as a surprise to the user.

        UPD.: Thanks, choroba, somehow I missed that line :( . Now it seems like normal behaviour. But in the '(split ... )[ ... ]' case it doesn't feel like DWIM when split generates too few elements, so the higher indexes used in the slice can't be reached.

      Could you explain where/how you think split() is inconsistent? . . . Wait, this is where ...

      In my original version I was getting a zero length string as the last value as shown in the SSCCE above. But using the more elegant version, it becomes undef and I get the warning shown.

      ... in which case, sorry to bother you. I see that in perl 5.24.0.

      So in your own version, split() is given the number of fields (limit) to generate; in Eily's version, split() behaves as expected, as the B::Deparse output shows ...

      # Inserted newlines for legibility of one-liner run under Windows.
      perl -MO=Deparse -e "
        use strict; use warnings;
        my $x = q[a b c d e f ];
        my ( $one , undef , undef, undef , undef, undef , $other ) = split q[ ] , $x;
        my @all = split q[ ] , $x;
      "

      use warnings;
      use strict;
      my $x = 'a b c d e f ';
      my($one, undef, undef, undef, undef, undef, $other) = split(' ', $x, 8);
      my @all = split(' ', $x, 0);
      -e syntax OK

      At least in perl 5.24.0, supplying 0 or -1 as the limit to split() in the OP's version has no effect on the outcome, i.e. $other gets the empty string, not undef. $other becomes undef only if the limit is set to 6.
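      The experiment described above can be reproduced with a sketch like this, using the same string as the one-liner and literal LIMITs only (as in that experiment). With LIMIT 6 the sixth field swallows "f " and nothing is left for the seventh variable:

```perl
use strict;
use warnings;

my $x = 'a b c d e f ';   # same string as in the one-liner

# Literal LIMITs, as in the experiment: 0, -1 and 6.
my (undef, undef, undef, undef, undef, undef, $with0)  = split ' ', $x, 0;
my (undef, undef, undef, undef, undef, undef, $withm1) = split ' ', $x, -1;
my (undef, undef, undef, undef, undef, undef, $with6)  = split ' ', $x, 6;

for my $pair (['0', $with0], ['-1', $withm1], ['6', $with6]) {
    my ($limit, $v) = @$pair;
    # Print the seventh field, making undef visible.
    print "LIMIT $limit: ", (defined $v ? "'$v'" : 'undef'), "\n";
}
```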

      Then force the issue by splitting in list context: my ( $x , ... , $y ) = () = split( ... ).

      After all that, I am unable to answer your question: Why?
