[Solved] Avoiding repeated undefs

davies has asked for the wisdom of the Perl Monks concerning the following question:

Re: Avoiding repeated undefs by Eily (Monsignor) on Feb 26, 2019 at 09:55 UTC
`my ($key, $val) = (split(" ", $_))[0,6];` would work, although you might just as well write: `my @values = split(" ", $_); my ($key, $value) = @values[0,6];` I find the second one more elegant, personally. Edit: this is not an exact equivalent though, as davies demonstrated below.
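A minimal sketch of the two forms above, run against a made-up sample line (the string and the variable names are assumptions for illustration only). With seven non-empty fields, both forms should yield the same pair:

```perl
use strict;
use warnings;

# Hypothetical sample line with seven whitespace-separated fields.
my $line = 'k1 b c d e f v7';

# Form 1: slice the list returned by split directly.
my ($key, $val) = (split(" ", $line))[0, 6];

# Form 2: keep the whole list in an array, then slice the array.
my @values = split(" ", $line);
my ($key2, $val2) = @values[0, 6];

print "$key $val\n";     # prints "k1 v7"
print "$key2 $val2\n";   # prints "k1 v7"
```

The second form keeps the intermediate array around, which may help if other fields are needed later.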
Re^2: Avoiding repeated undefs by davies (Prior) on Feb 26, 2019 at 14:07 UTC
Trying the first one, which I thought extremely elegant, I ran into some unexpected behaviour. `dr@dns:~$ cat sscce.pl use strict; use warnings; use feature 'say'; my $str = 'a b c d e f '; my ($key, undef, undef, undef, undef, undef, $val) = split(" ", $str); say "Val = <$val>"; my ($key2, $val2) = (split(" ", $str))[0,6]; say "Val2 = <$val2>"; dr@dns:~$ perl sscce.pl Val = <> Use of uninitialized value $val2 in concatenation (.) or string at sscce.pl line 9. Val2 = <>` Some of the data I am hacking have only six data points, but with a trailing space. In my original version I was getting a zero-length string as the last value, as shown in the SSCCE above. But using the more elegant version, it becomes `undef` and I get the warning shown. Actually, I find it more logical to get `undef`, and changing my code would be no problem, but I can't find any indication why `split` should behave so apparently inconsistently. I have tried using the regex in my OP, but get the same results as in the SSCCE. As I said in my OP, I have working code. This isn't important, but it is a gap in my knowledge that I'd like to close. Regards, John Davies
Re^3: Avoiding repeated undefs by Eily (Monsignor) on Feb 26, 2019 at 14:57 UTC
Yes, split isn't exactly consistent there. There is a third parameter, LIMIT, which limits the number of fields the string is split into. E.g. `split '_', 'a_b_c_d', 2` will actually return the list ('a', 'b_c_d') because the string has been split in two. The thing is, if LIMIT is 0 or omitted, all empty fields at the end are removed. So `split '_', 'a_b___', 0;` will return the list ('a', 'b'). Which is why you get an undefined value in your second case. Now, the tricky bit is that, as an optimization, when perl knows how many values you are assigning to, it will actually set LIMIT to the number of elements + 1 (split into each element, and ignore the remainder). So `my ($key, undef, undef, undef, $value) = split " ", $string;` is actually interpreted as `my ($key, undef, undef, undef, $value) = split " ", $string, 6; # Split five times, ignore the sixth value` In that case, LIMIT is not 0, so the empty fields at the end are not removed. You could write `($key, $value) = (split(" ", $string, 8))[0,6];` (there are seven values from 0 to 6, so the remainder is the 8th), but the 8 seems to come out of nowhere, and that just invites a mistake. Luckily, if LIMIT is negative, it will be treated as an infinite value, i.e. split will continue splitting until the end of the input, and won't remove empty fields at the end: `my ($key, $value) = (split(" ", $string, -1))[0,6];` Do note that `split " "` is a special case of split which is the same as `split /\s+/` except that leading whitespace does not produce an empty first field. Edit: for what it's worth, hippo's solution doesn't have that problem.
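The LIMIT rules described above can be sketched against the string from davies's SSCCE. This is only an illustration under that one input; the variable names are made up here:

```perl
use strict;
use warnings;

my $str = 'a b c d e f ';   # six fields plus a trailing space, as in the SSCCE

# Assigning to an array with LIMIT omitted: trailing empty fields are dropped.
my @default = split(" ", $str);
print scalar(@default), "\n";                 # prints 6

# Negative LIMIT: split as often as possible and keep trailing empty fields.
my @keep = split(" ", $str, -1);
print scalar(@keep), "\n";                    # prints 7

# Assigning to a fixed list of scalars: perl supplies an implicit LIMIT
# (number of targets + 1), so the trailing empty field survives.
my ($k, undef, undef, undef, undef, undef, $v) = split(" ", $str);
print defined $v ? "defined\n" : "undef\n";   # prints "defined"

# List slice with the default LIMIT: there is no index 6, so undef.
my ($k2, $v2) = (split(" ", $str))[0, 6];
print defined $v2 ? "defined\n" : "undef\n";  # prints "undef"

# List slice with LIMIT -1: index 6 is the empty trailing field.
my ($k3, $v3) = (split(" ", $str, -1))[0, 6];
print defined $v3 ? "defined\n" : "undef\n";  # prints "defined"
```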
Re^4: Avoiding repeated undefs by rsFalse (Chaplain) on Feb 27, 2019 at 22:41 UTC
Re^5: Avoiding repeated undefs by choroba (Cardinal) on Feb 28, 2019 at 06:42 UTC
Re^3: Avoiding repeated undefs by parv (Parson) on Feb 26, 2019 at 14:17 UTC
~~Could you explain where/how do you think split() is inconsistent? . . . Wait, this is where ...~~ ~~In my original version I was getting a zero length string as the last value as shown in the SSCCE above. But using the more elegant version, it becomes undef and I get the warning shown.~~ ~~... in which case, sorry to bother you.~~ I see that in perl 5.24.0. So in your own version, split() gets the number of fields (limit) to generate; in Eily's version, split() behaves as expected (just see the B::Deparse output) ... `# Inserted newlines for legibility of one-liner run under Windows. perl -MO=Deparse -e " use strict; use warnings; my $x = q[a b c d e f ]; my ( $one , undef , undef, undef , undef, undef , $other ) = split q[ ] , $x; my @all = split q[ ] , $x; " use warnings; use strict; my $x = 'a b c d e f '; my($one, undef, undef, undef, undef, undef, $other) = split(' ', $x, 8); my @all = split(' ', $x, 0); -e syntax OK` At least in perl 5.24.0, supplying 0 or -1 as the limit to split() in the OP's version has no effect on the outcome, i.e. $other gets the value of the empty string, not undef. $other becomes undef only if one sets the limit to 6. Then force the issue by splitting in list context: `my ( $x , ... , $y ) = () = split( ... )`. After all that, I am unable to answer your question: Why?
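To make the effect of an explicit limit concrete, here is a small sketch using the same sample string as the Deparse run. The array names are made up for illustration; the limits 6 and 8 are the ones mentioned above:

```perl
use strict;
use warnings;

my $x = 'a b c d e f ';   # same sample string as in the Deparse run

# LIMIT 6: at most six fields, so the sixth keeps the unsplit remainder
# and there is no seventh field to assign.
my @six = split(' ', $x, 6);
print scalar(@six), " <$six[-1]>\n";       # prints "6 <f >"

# LIMIT 8: room for a seventh field, which is the empty trailing one.
my @eight = split(' ', $x, 8);
print scalar(@eight), " <$eight[-1]>\n";   # prints "7 <>"
```

With only six fields produced under limit 6, a seventh target variable has nothing to receive, which would explain the undef reported above.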
Re: Avoiding repeated undefs by Discipulus (Canon) on Feb 26, 2019 at 09:54 UTC
Hello davies, > It seems to me to be inelegant to repeat undef No, it is not. The compiler does not bother with elegance ;) Anyway, I suspect you cannot avoid repeating them: you are on the left side of an assignment, and inside a `my` declaration: no array (well, you mean list?) can be there. PS Any array placed as the second (or whatever) element in the assignment will slurp everything: `perl -e "$str = 'a b c d e f g';@arr=(1,2,3,4,5); ($key, @arr, $val) = split(/\s+/, $str); print qq($key $val\n@arr\n)" a b c d e f g # even if the array is presized: perl -e "$str = 'a b c d e f g';$#arr=4; ($key, @arr, $val) = split(/\s+/, $str); print qq($key $val\n@arr\n)" a b c d e f g` L* There are no rules, there are no thumbs.. Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re^2: Avoiding repeated undefs by hippo (Bishop) on Feb 26, 2019 at 10:08 UTC
> even if the array is presized If you specify the slice in the assignment it's fine though: `use strict; use warnings; use Test::More tests => 1; $_ = "Anyway I suspect you cannot avoid them repeated: you are are in left side"; my @undef; my ($key, $val); ($key, @undef[0..4], $val) = split(/\s+/, $_); is ($val, 'them');` I still prefer Eily's approach, however.
Re^3: Avoiding repeated undefs by rsFalse (Chaplain) on Feb 28, 2019 at 00:15 UTC
Similar approach: `( my $key, (undef) x 5, my $val) = split(/\s+/, $_);` Note: 'undef' must be enclosed in parentheses to force list context for the 'x' operator.
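A small runnable sketch of the repetition form above, with a made-up sample line. As far as I know, assigning to a list repetition such as `(undef) x 5` needs Perl 5.22 or later, so treat the version requirement as an assumption to check against your own perl:

```perl
use strict;
use warnings;

my $line = 'a b c d e f g';   # hypothetical sample input

# (undef) x 5 expands to five undef placeholders on the left-hand side.
( my $key, (undef) x 5, my $val ) = split(/\s+/, $line);

print "$key $val\n";   # prints "a g"
```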
Re^4: Avoiding repeated undefs by choroba (Cardinal) on Feb 28, 2019 at 06:37 UTC
Re: [Solved] Avoiding repeated undefs by talexb (Chancellor) on Feb 27, 2019 at 00:40 UTC
It sounds like you already have a good solution, but for me, the obvious one is to pop the output from `split` into an array, then take the first and last values. `my @array = split(/\s+/, $_); my ( $key, $value ) = @array[ 0, -1 ];` That approach doesn't care if the number of values changes -- you always get the first and the last values. Also, the split could be simplified to just `my @array = split(/\s+/);` because `$_` is the default parameter for this function. Alex / talexb / Toronto Thanks PJ. We owe you so much. Groklaw -- RIP -- 2003 to 2013.
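A quick sketch of the first-and-last approach on two made-up inputs, one with seven fields and one with six fields and a trailing space. The `@seen` array exists only to collect results for this illustration:

```perl
use strict;
use warnings;

my @seen;

# Hypothetical inputs: seven fields, then six fields with a trailing space.
for my $line ('a b c d e f g', 'a b c d e f ') {
    my @array = split(/\s+/, $line);
    my ($key, $value) = @array[0, -1];
    push @seen, "$key $value";
}

print "$_\n" for @seen;   # prints "a g" then "a f"
```

Note that for the six-field line the trailing empty field is dropped, so the last value is the sixth field rather than an empty seventh one. Whether that is acceptable depends on what the short rows are supposed to mean.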
Re: Avoiding repeated undefs by bliako (Monsignor) on Feb 26, 2019 at 15:07 UTC
Are equally inelegant entries accepted? `my $str = 'a b c d e f g'; my @two = $str =~/^([^\s]*)(?:(?:\s+[^\s]+){5})\s+([^\s]+)/; # or my ($key, $val) = $str =~ ... print join(",", @two)."\n";`
Re: Avoiding repeated undefs by bliako (Monsignor) on Feb 26, 2019 at 15:15 UTC
That passes my elegance test (~~trust me~~ I drove a mercedes for years): `my $str = 'a b c d e f g'; my ($key, $val) = split(/(?:\s+[^\s]+)+\s+/, $str); print "key=$key, val=$val\n";` bw, bliako
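For comparison, both regex-based ideas can be sketched side by side on the same sample string. Here `\S` is used as shorthand for `[^\s]`, and the names `$first` and `$last` are made up for this illustration:

```perl
use strict;
use warnings;

my $str = 'a b c d e f g';

# Capture field 1 and field 7 explicitly, skipping five fields in between.
my ($key, $val) = $str =~ /^(\S+)(?:\s+\S+){5}\s+(\S+)/;
print "key=$key, val=$val\n";          # prints "key=a, val=g"

# Split on everything between the first and the last field.
my ($first, $last) = split(/(?:\s+\S+)+\s+/, $str);
print "first=$first, last=$last\n";    # prints "first=a, last=g"
```

The capture form pins the value to the seventh field, while the split form takes whatever the last field is, so the two only agree when the line has exactly seven fields.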

