Re: Avoiding repeated undefs

Re^2: Avoiding repeated undefs by davies (Prior) on Feb 26, 2019 at 14:07 UTC
Trying the first one, which I thought extremely elegant, I ran into some unexpected behaviour.

```
dr@dns:~$ cat sscce.pl
use strict;
use warnings;
use feature 'say';

my $str = 'a b c d e f ';
my ($key, undef, undef, undef, undef, undef, $val) = split(" ", $str);
say "Val = <$val>";
my ($key2, $val2) = (split(" ", $str))[0,6];
say "Val2 = <$val2>";
dr@dns:~$ perl sscce.pl
Val = <>
Use of uninitialized value $val2 in concatenation (.) or string at sscce.pl line 9.
Val2 = <>
```

Some of the data I am hacking have only six data points, but with a trailing space. In my original version I was getting a zero length string as the last value as shown in the SSCCE above. But using the more elegant version, it becomes `undef` and I get the warning shown. Actually, I find it more logical to get `undef` and changing my code would be no problem, but I can't find any indication why `split` should behave so apparently inconsistently. I have tried using the regex in my OP, but get the same results as in the SSCCE.

As I said in my OP, I have working code. This isn't important, but it is a gap in my knowledge that I'd like to close.

Regards,

John Davies
Re^3: Avoiding repeated undefs by Eily (Monsignor) on Feb 26, 2019 at 14:57 UTC
Yes, split isn't exactly consistent there. There is a third parameter, LIMIT, which limits the number of pieces the string is split into. E.g. `split '_', 'a_b_c_d', 2` will actually return the list ('a', 'b_c_d') because the string has been split in two. The thing is, if LIMIT is 0 or omitted, all empty fields at the end are removed. So `split '_', 'a_b___', 0;` will return the list ('a', 'b'). Which is why you get an undefined value in your second case.

Now, the tricky bit is, for optimization, when perl knows how many values you are trying to write to, it will actually set LIMIT to the number of elements + 1 (split off each element, and ignore the remainder). So

```
my ($key, @array[0,1], undef, $value) = split " ", $string;
```

is actually interpreted as

```
my ($key, @array[0,1], undef, $value) = split " ", $string, 6; # Split five times, ignore the sixth value
```

In that case, LIMIT is not 0 so the empty fields at the end are not removed.

You could write `($key, $value) = (split(" ", $string, 8))[0,6];` (there are seven values from 0 to 6, so the remainder is the 8th), but the 8 seems to come out of nowhere, and this just calls for a mistake. Luckily, if LIMIT is negative, it will be treated as an infinite value, i.e. split will continue splitting until the end of the input, and won't remove empty fields at the end:

```
my ($key, $value) = (split(" ", $string, -1))[0,6];
```

Do note that `split " "` is a special case of split which is the same as `split /\s+/` except that empty fields at the start are removed.

Edit: for what it's worth, hippo's solution doesn't have that problem.
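The four cases discussed above can be put side by side in a minimal sketch. It reuses the string from the SSCCE; the variable names `@plain`, `@keep`, `$key2` and `$val2` are only illustrative, and the field counts in the comments follow from the documented LIMIT rules rather than from any particular perl build.

```perl
use strict;
use warnings;

my $str = 'a b c d e f ';

# LIMIT omitted, result stored in an array: trailing empty fields are dropped.
my @plain = split ' ', $str;      # 6 fields

# Negative LIMIT: split as often as possible and keep trailing empty fields.
my @keep = split ' ', $str, -1;   # 7 fields, the last one is ''

# Seven scalars on the left: perl supplies LIMIT = 8 at compile time,
# so the trailing empty field survives and $val is '' rather than undef.
my ($key, undef, undef, undef, undef, undef, $val) = split ' ', $str;

# List slice: no implicit LIMIT, so index 6 does not exist and $val2 is undef.
my ($key2, $val2) = (split ' ', $str)[0, 6];

printf "plain=%d keep=%d val=%s val2=%s\n",
    scalar @plain, scalar @keep,
    defined $val  ? "'$val'"  : 'undef',
    defined $val2 ? "'$val2'" : 'undef';
# prints: plain=6 keep=7 val='' val2=undef
```

Passing -1 explicitly, as in the `@keep` line, is the variant that does not depend on how many variables happen to be on the left-hand side.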
Re^4: Avoiding repeated undefs by rsFalse (Chaplain) on Feb 27, 2019 at 22:41 UTC
The split behavior here really looks like a bug. Limit "inheritance" is not documented, so it comes as a surprise to the user.

UPD.: Thanks, choroba, somehow I missed that line :( . Now it seems like normal behaviour. But in the case of `(split ... )[ ... ]` it doesn't look like DWIM when split generates too few elements, so that the higher indexes used in the slice can't be reached.
Re^5: Avoiding repeated undefs by choroba (Cardinal) on Feb 28, 2019 at 06:42 UTC
Re^3: Avoiding repeated undefs by parv (Parson) on Feb 26, 2019 at 14:17 UTC
~~Could you explain where/how do you think split() is inconsistent? . . . Wait, this is where ...~~

~~In my original version I was getting a zero length string as the last value as shown in the SSCCE above. But using the more elegant version, it becomes undef and I get the warning shown.~~

~~... in which case, sorry to bother you.~~

I see that in perl 5.24.0. So in your own version, split() gets the number of fields (limit) to generate; in Eily's version, split() behaves as expected (just see the B::Deparse output) ...

```
# Inserted newlines for legibility of one-liner run under Windows.
perl -MO=Deparse -e "
  use strict; use warnings;
  my $x = q[a b c d e f ];
  my ( $one , undef , undef, undef , undef, undef , $other ) = split q[ ] , $x;
  my @all = split q[ ] , $x;
"
use warnings;
use strict;
my $x = 'a b c d e f ';
my($one, undef, undef, undef, undef, undef, $other) = split(' ', $x, 8);
my @all = split(' ', $x, 0);
-e syntax OK
```

At least in perl 5.24.0, supplying 0 or -1 as the limit to split() in the OP's version has no effect on the outcome, i.e. $other gets the value of empty string, not undef. $other becomes undef only if one sets the limit to 6. Then force the issue by splitting in list context: `my ( $x , ... , $y ) = () = split( ... )`.

After all that, I am unable to answer your question: Why?
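The effect of an explicit limit on the seven-variable assignment can be checked with a short sketch like the one below. The helper `show` and the variable names are hypothetical, added only so that '' and undef print differently. The explicit-0 case is printed without a stated result, since the report above covers perl 5.24.0 only.

```perl
use strict;
use warnings;

my $x = 'a b c d e f ';

# Hypothetical helper: render a value so that '' and undef are distinguishable.
sub show { my ($v) = @_; return defined $v ? "'$v'" : 'undef' }

# Explicit LIMIT of -1: the trailing empty field is kept, so the 7th value is ''.
my (undef, undef, undef, undef, undef, undef, $neg) = split ' ', $x, -1;

# Explicit LIMIT of 6: at most six fields are produced, the last being 'f ',
# so nothing is left for a 7th variable and it stays undef.
my (undef, undef, undef, undef, undef, $sixth, $seventh) = split ' ', $x, 6;

# Explicit LIMIT of 0: reported above to behave like the omitted limit
# on perl 5.24.0; the result is printed, not assumed.
my (undef, undef, undef, undef, undef, undef, $zero) = split ' ', $x, 0;

print "limit -1: ", show($neg), "\n";
print "limit  6: ", show($sixth), " ", show($seventh), "\n";
print "limit  0: ", show($zero), "\n";
```

Running the one-liner through `perl -MO=Deparse`, as done above, remains the direct way to see which limit a given perl actually compiled in.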

