PerlMonks  

Re^2: Avoiding repeated undefs

by davies (Prior)
on Feb 26, 2019 at 14:07 UTC ( [id://1230565]=note )


in reply to Re: Avoiding repeated undefs
in thread [Solved] Avoiding repeated undefs

Trying the first one, which I thought extremely elegant, I ran into some unexpected behaviour.

dr@dns:~$ cat sscce.pl
use strict;
use warnings;
use feature 'say';

my $str = 'a b c d e f ';
my ($key, undef, undef, undef, undef, undef, $val) = split(" ", $str);
say "Val = <$val>";
my ($key2, $val2) = (split(" ", $str))[0,6];
say "Val2 = <$val2>";
dr@dns:~$ perl sscce.pl
Val = <>
Use of uninitialized value $val2 in concatenation (.) or string at sscce.pl line 9.
Val2 = <>

Some of the data I am hacking have only six data points, but with a trailing space. In my original version I was getting a zero length string as the last value as shown in the SSCCE above. But using the more elegant version, it becomes undef and I get the warning shown. Actually, I find it more logical to get undef and changing my code would be no problem, but I can't find any indication why split should behave so apparently inconsistently. I have tried using the regex in my OP, but get the same results as in the SSCCE.

As I said in my OP, I have working code. This isn't important, but it is a gap in my knowledge that I'd like to close.

Regards,

John Davies

Replies are listed 'Best First'.
Re^3: Avoiding repeated undefs
by Eily (Monsignor) on Feb 26, 2019 at 14:57 UTC

    Yes, split isn't exactly consistent there. There is a third parameter, LIMIT, which caps the number of fields the string is split into. E.g. split '_', 'a_b_c_d', 2 returns the list ('a', 'b_c_d'), because the string has been split into two fields.

    The thing is if LIMIT is 0 or omitted, all empty fields at the end are removed. So split '_', 'a_b___', 0; will return the list ('a', 'b'). Which is why you get an undefined value in your second case.
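    The three LIMIT cases described above can be checked with a minimal sketch like the following. The variable names are made up for illustration; the results are assigned to arrays, so no implicit limit is involved.

```perl
use strict;
use warnings;

# Explicit LIMIT of 2: at most two fields, the second keeps the rest.
my @two  = split '_', 'a_b_c_d', 2;    # ('a', 'b_c_d')

# LIMIT 0 (same as omitted): trailing empty fields are dropped.
my @zero = split '_', 'a_b___', 0;     # ('a', 'b')

# Negative LIMIT: no cap, and trailing empty fields are kept.
my @neg  = split '_', 'a_b___', -1;    # ('a', 'b', '', '', '')

print scalar(@two), " ", scalar(@zero), " ", scalar(@neg), "\n";   # prints "2 2 5"
```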

    Now, the tricky bit: as an optimization, when perl knows how many values you are assigning to, it sets LIMIT to the number of elements + 1 (one field per element, with the remainder ignored). So

    my ($key, @array[0,1], undef, $value) = split " ", $string;
    is actually interpreted as
    my ($key, @array[0,1], undef, $value) = split " ", $string, 6; # Split five times, ignore the sixth value
    In that case, LIMIT is not 0 so the empty fields at the end are not removed.
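    A small sketch of the difference, reusing the string from the SSCCE and assigning to plain scalars only. It assumes a perl that applies the implicit limit as the split documentation describes.

```perl
use strict;
use warnings;

my $str = 'a b c d e f ';   # six fields plus a trailing space

# List assignment to seven scalars: perl supplies an implicit LIMIT of 8,
# so the trailing empty field survives and $val is ''.
my ($key, undef, undef, undef, undef, undef, $val) = split ' ', $str;

# List slice: no implicit LIMIT, so the trailing empty field is dropped,
# index 6 does not exist, and $val2 is undef.
my ($key2, $val2) = (split ' ', $str)[0, 6];

print defined $val  ? "val is an empty string\n" : "val is undef\n";
print defined $val2 ? "val2 is defined\n"        : "val2 is undef\n";
```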

    You could write ($key, $value) = (split(" ", $string, 8))[0,6]; (there are seven values from 0 to 6, so the remainder is the 8th), but the 8 seems to come out of nowhere, and that just invites mistakes. Luckily, if LIMIT is negative, it is treated as infinite, i.e. split keeps splitting until the end of the input and does not remove empty fields at the end:

    my ($key, $value) = (split(" ", $string, -1))[0,6];
    Do note that split " " is a special case of split which is the same as split /\s+/ except empty fields at the start are removed.
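    Both points can be illustrated with a short sketch. The input strings and variable names here are only examples.

```perl
use strict;
use warnings;

my $str = 'a b c d e f ';

# Negative LIMIT keeps the trailing empty field, so index 6 is '' and
# interpolating $val does not warn.
my ($key, $val) = (split ' ', $str, -1)[0, 6];

# ' ' skips leading whitespace; /\s+/ yields a leading empty field.
my @awk   = split ' ',   '  x y';   # ('x', 'y')
my @regex = split /\s+/, '  x y';   # ('', 'x', 'y')

print "Val = <$val>\n";                               # prints "Val = <>"
print scalar(@awk), " vs ", scalar(@regex), "\n";     # prints "2 vs 3"
```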

    Edit: for what it's worth, hippo's solution doesn't have that problem.

      The split behavior here really looks like a bug. Limit "inheritance" is not documented, so it comes as a surprise to the user.

      UPD.: Thanks, choroba, somehow I missed that line :( . Now it seems to be normal behaviour. But in the '(split ... )[ ... ]' case it doesn't look like DWIM when split generates too few elements, so that the higher indexes used in the slice can't be reached.
        If what you mean by "inheritance" is

        > when assigning to a list, if LIMIT is omitted (or zero), then LIMIT is treated as though it were one larger than the number of variables in the list;

        then note that the quote was taken directly from the documentation of split.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
Re^3: Avoiding repeated undefs
by parv (Parson) on Feb 26, 2019 at 14:17 UTC

    Could you explain where/how do you think split() is inconsistent? . . . Wait, this is where ...

    In my original version I was getting a zero length string as the last value as shown in the SSCCE above. But using the more elegant version, it becomes undef and I get the warning shown.

    ... in which case, sorry to bother you. I see that in perl 5.24.0.

    So in your own version, split() is given the number of fields to generate (the limit); in Eily's version it is not, and split() behaves as expected (see the B::Deparse output) ...

    # Inserted newlines for legibility of one-liner run under Windows.
    perl -MO=Deparse -e "
      use strict; use warnings;
      my $x = q[a b c d e f ];
      my ( $one , undef , undef, undef , undef, undef , $other ) = split q[ ] , $x;
      my @all = split q[ ] , $x;
    "
    use warnings;
    use strict;
    my $x = 'a b c d e f ';
    my($one, undef, undef, undef, undef, undef, $other) = split(' ', $x, 8);
    my @all = split(' ', $x, 0);
    -e syntax OK

    At least in perl 5.24.0, supplying 0 or -1 as the limit to split() in the OP's version has no effect on the outcome, i.e. $other gets the value of empty string not undef. $other becomes undef only if one sets the limit to 6.
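    A sketch of that check, with the limit written as a literal each time. The observation above was made on perl 5.24.0; that other versions behave the same way is an assumption here, based on the split documentation's statement that a zero LIMIT in a list assignment is treated like an omitted one.

```perl
use strict;
use warnings;

my $x = 'a b c d e f ';

# Literal 0: treated like an omitted LIMIT, so the implicit limit of 8 applies.
my (undef, undef, undef, undef, undef, undef, $zero) = split ' ', $x, 0;

# Literal -1: unlimited, trailing empty field kept.
my (undef, undef, undef, undef, undef, undef, $neg)  = split ' ', $x, -1;

# Literal 6: only six fields are produced (the last is 'f '), so the
# seventh variable receives nothing and stays undef.
my (undef, undef, undef, undef, undef, undef, $six)  = split ' ', $x, 6;

for my $pair ([0 => $zero], [-1 => $neg], [6 => $six]) {
    my ($limit, $value) = @$pair;
    printf "limit %2d: %s\n", $limit, defined $value ? "'$value'" : 'undef';
}
```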

    Then force the issue by splitting in list context: my ( $x , ... , $y ) = () = split( ... ).

    After all that, I am unable to answer your question: Why?
