Non-regex Methods of Extracting the Last Field from a Record

ozboomer has asked for the wisdom of the Perl Monks concerning the following question:

I have a text data record that comprises a variable number of 'fields'... and I would like to extract the last field from the record.

Now, I realize I could probably use a (fancy) regex... and I understand they are (often) efficient and so on... but, like a lot of people, I get intimidated by regex... so I was looking for a simpler, array-manipulation method of extracting the last item from a variable length array.

Some example code:

#  $rec = " 2425 S  25  \"   6  47!  86  18  21!  87  23  23!   -       -!   -       -!   96";

use Data::Dumper;

$infile = "2425-pmpk.txt";

open(INFILE, $infile) || die("can't open $infile\n$!\n");
while ($rec = <INFILE>) {
   chomp($rec);
   next unless ($rec =~ /S  25/);
   
   $rec =~ s/^\s+//;  # remove lead/trail spaces
   $rec =~ s/\s+$//;
   
# DOESN'T WORK
   (@junk, $ads) = (split(/\s+/, $rec));

# WORKS OK
#   (@junk) = (split(/\s+/, $rec));
#   $ads = @junk[-1];

# WORKS OK
#   (@junk) = (split(/\s+/, $rec));
#   $idx = $#junk;
#   $ads = $junk[$idx];

# WORKS OK ... BUT AT THE WRONG END OF THE ARRAY
#   ($ads, @junk) = (split(/\s+/, $rec));

print Dumper(@junk);

   print("$rec\n");
   printf("%d\n", $ads);
}
close(INFILE);

There are a number of solutions there that I've worked-out... so I've more-or-less come up with a suitable solution... but I guess the primary question is why doesn't the following construct work:

(@junk, $ads) = (split(/\s+/, $rec));

I've looked through the perlfaq, the Camel and Ram books... and have searched through the HallowedHalls(tm) here but couldn't find anything specific..

Would appreciate any thoughts...


Replies are listed 'Best First'.
Re: Non-regex Methods of Extracting the Last Field from a Record by toolic (Bishop) on May 08, 2009 at 00:13 UTC
"the primary question is why doesn't the following construct work: `(@junk, $ads) = (split(/\s+/, $rec));`"

Think of the `@junk` array as being "greedy": it swallows up all of the list elements returned by `split`, leaving none for your `$ads` scalar.

Update: yet another way, without the temporary `@junk` array: `use strict; use warnings; my $rec = 'a b c d 96'; my $ads = (split /\s+/, $rec)[-1]; print "ads=$ads\n";` This prints `ads=96`.
Re: Non-regex Methods of Extracting the Last Field from a Record by AnomalousMonk (Archbishop) on May 08, 2009 at 00:20 UTC
How about: `>perl -wMstrict -le "my $str = 'a b c d e 96'; my $last_field = (split / /, $str)[-1]; print qq{'$last_field'};"` which prints `'96'`.

"... the primary question is why doesn't the following construct work: `(@junk, $ads) = (split(/\s+/, $rec));`"

Because the `@junk` array 'consumes' all elements of the list produced by the `split` built-in, leaving nothing but undefinedness for the `$ads` scalar.
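A quick way to see that undefinedness for yourself. This is only an illustrative sketch with a made-up record; the `reverse` variant at the end is an extra that relies on the fact that scalars placed before the array do get filled.

```perl
use strict;
use warnings;

my $rec = 'a b c d 96';    # made-up record for illustration

# The array on the left-hand side absorbs every element of the list,
# so nothing is left over for the scalar that follows it.
my (@junk, $ads) = split /\s+/, $rec;
print scalar(@junk), "\n";                     # 5
print defined $ads ? $ads : 'undef', "\n";     # undef

# Scalars listed before the array are filled first, so reversing the
# list puts the last field into the leading scalar.
my ($last, @rest) = reverse split /\s+/, $rec;
print "$last\n";                               # 96
```

The list slice `(split /\s+/, $rec)[-1]` is still the shorter way to get the same result; the `reverse` form is mainly useful for seeing how list assignment hands out elements.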
Re: Non-regex Methods of Extracting the Last Field from a Record by ELISHEVA (Prior) on May 08, 2009 at 04:07 UTC
Even if you were more comfortable with regular expressions, `split` would have been a good choice for this problem. However, getting comfortable with regular expressions will open up a world of programming possibilities to you. It is worth investing the effort, even if it feels intimidating at first. One way to build confidence is to see problems you already understand well reworked into regular expressions, so here it is: `my ($lastWord) = $rec =~ /(\S+)$/; print "$lastWord\n";` (prints 96), or `$rec =~ /(\S+)$/; my $lastWord = $1; print "$lastWord\n";` (also prints 96).

The final `$` means whatever we match has to end at the end of the string. `\S` stands for not a space, i.e. anything except characters that would match `\s`. `\S+` means match as many non-spaces as you can, that is, all the way back from the end of the string to (but not including) the nearest space. `(\S+)` means capture this portion of the match and put it in a variable.

Perl automatically puts each captured match portion in numbered variables: the first captured match in `$1`, the second in `$2`. In this case we had only one captured match, so `$1` contains the match. Had you had two captures, like this: `$rec =~ /(\S+)\s+(\S+)$/`, then the regex would have set `$1` to "-!" and `$2` to "96".

`$1` and `$2` are very transient variables. They will change the next time a regular expression match or substitution succeeds, so it is usually a good idea to copy them into one of your own declared variables as soon as possible. One easy way to do this is to assign the regex matching expression (`$rec =~ /(\S+)$/`) to an array or list, e.g. `my ($lastWord) = $rec =~ /(\S+)$/`.

And we even could have combined your trailing space stripping code with the regular expression as well. This also would have worked: `my ($lastWord) = $rec =~ /(\S+)\s*$/`. `\s*` is like `\s+` except that it matches zero or more spaces, rather than one or more. That is, `\s+` is equivalent to `\s\s*`.

For more information and some further examples, see perlretut. Best, beth
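To see the capture behaviour described above in one place, here is a small sketch. The record is a shortened, made-up variant of the one in the question, with trailing spaces left in on purpose, and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Shortened sample record; note the trailing spaces.
my $rec = '2425 S  25  "   6  47!  -!   96   ';

# A match in list context returns its captures, so they can be copied
# straight into named variables without touching $1 or $2.
my ($last) = $rec =~ /(\S+)\s*$/;
print "$last\n";                   # 96

# Two capture groups: the last two fields.
my ($second_last, $final) = $rec =~ /(\S+)\s+(\S+)\s*$/;
print "$second_last $final\n";     # -! 96

# A failed match returns an empty list, so the variable stays undef.
my ($none) = '   ' =~ /(\S+)\s*$/;
print defined $none ? $none : 'no field', "\n";    # no field
```

Checking for `undef` after the assignment is a simple way to detect a record that has no fields at all.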
Re: Non-regex Methods of Extracting the Last Field from a Record by blokhead (Monsignor) on May 08, 2009 at 01:35 UTC
You can even do it avoiding the temporary list generated by `split`. Instead, you can use `rindex` to go directly to the appropriate part of the string: `my $rec = " 2425 S 25 \" 6 47! 86 18 21! 87 23 23! - -! - -! 96"; my $last = substr $rec, 1 + rindex($rec, " "); print $last, $/;`

I'm only posting this for the sake of completeness. This only works because your record separator is whitespace. `rindex` won't generalize as well as `split` (more complicated separators, picking out records other than the last, ignoring trailing empty records). blokhead
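One caveat worth spelling out: the `rindex` approach looks for a literal space, so it assumes the record has already been trimmed (as the original loop does) and that fields are not separated by tabs. A minimal sketch, where `last_field_rindex` is a hypothetical helper name and the trimming step is my own addition:

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: returns the text after the
# last space character in the record.
sub last_field_rindex {
    my ($rec) = @_;
    $rec =~ s/\s+$//;    # trim trailing whitespace first (assumed step)
    # rindex returns -1 when there is no space, so substr starts at 0
    # and the whole record is returned.
    return substr $rec, 1 + rindex($rec, ' ');
}

print last_field_rindex('a b c 96'),    "\n";    # 96
print last_field_rindex('a b c 96   '), "\n";    # 96
print last_field_rindex('96'),          "\n";    # 96

# Without trimming, a trailing space makes the result an empty string.
my $untrimmed = 'a b c 96 ';
print length(substr $untrimmed, 1 + rindex($untrimmed, ' ')), "\n";    # 0
```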
Re: Non-regex Methods of Extracting the Last Field from a Record by codeacrobat (Chaplain) on May 08, 2009 at 06:08 UTC
Your `@junk` already consumes all of the split elements. There is no boundary on `@junk` that would stop it from doing so: `(@junk, $ads) = (split(/\s+/, $rec));` The same happens here: `(@foo, $bar) = @baz;` It only works this way round: `($first, $second, @remaining) = @all;`

P.S. If you are using `split` then you are already using regular expressions. `print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});`
Re: Non-regex Methods of Extracting the Last Field from a Record by allolex (Curate) on May 08, 2009 at 17:12 UTC
"so I was looking for a simpler, array-manipulation method of extracting the last item from a variable length array."

This made me think of `pop()` as yet another approach to getting the right field from the array. `my $rec = 'a b c d e f 96'; my @a = split /\s+/, $rec; printf "ads = %s\n", pop @a;`

-- Damon Allen Davison http://www.allolex.net
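One difference between `pop` and the `[-1]` index used elsewhere in the thread is that `pop` shortens the array. A small sketch with illustrative variable names:

```perl
use strict;
use warnings;

my $rec    = 'a b c d e f 96';
my @fields = split /\s+/, $rec;

# Indexing with -1 reads the last element and leaves the array alone.
my $peek = $fields[-1];
print "$peek ", scalar(@fields), "\n";    # 96 7

# pop removes the last element from the array and returns it.
my $ads = pop @fields;
print "$ads ", scalar(@fields), "\n";     # 96 6
```

If the remaining fields are still needed afterwards, the index form is the safer of the two.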

