PerlMonks  

Non-regex Methods of Extracting the Last Field from a Record

by ozboomer (Friar)
on May 07, 2009 at 23:55 UTC ( [id://762725]=perlquestion )

ozboomer has asked for the wisdom of the Perl Monks concerning the following question:

I have a text data record that comprises a variable number of 'fields'... and I would like to extract the last field from the record.

Now, I realize I could probably use a (fancy) regex... and I understand they are (often) efficient and so on... but, like a lot of people, I get intimidated by regex... so I was looking for a simpler, array-manipulation method of extracting the last item from a variable length array.

Some example code:

    # $rec = " 2425 S 25 \" 6 47! 86 18 21! 87 23 23! - + -! - -! 96";

    use Data::Dumper;

    $infile = "2425-pmpk.txt";
    open(INFILE, $infile) || die("cant on $infile\n$!\n");

    while ($rec = <INFILE>) {
       chomp($rec);
       next unless ($rec =~ /S 25/);

       $rec =~ s/^\s+//;    # remove lead/trail spaces
       $rec =~ s/\s+$//;

    # DOESN'T WORK
       (@junk, $ads) = (split(/\s+/, $rec));

    # WORKS OK
    #   (@junk) = (split(/\s+/, $rec));
    #   $ads = @junk[-1];

    # WORKS OK
    #   (@junk) = (split(/\s+/, $rec));
    #   $idx = $#junk;
    #   $ads = $junk[$idx];

    # WORKS OK ... BUT AT THE WRONG END OF THE ARRAY
    #   ($ads, @junk) = (split(/\s+/, $rec));

       print Dumper(@junk);
       printf("$rec\n");
       printf("%d\n", $ads);
    }

    close(INFILE);

There are a number of solutions there that I've worked-out... so I've more-or-less come up with a suitable solution... but I guess the primary question is why doesn't the following construct work:

(@junk, $ads) = (split(/\s+/, $rec));

I've looked through the perlfaq, the Camel and Ram books... and have searched through the HallowedHalls(tm) here but couldn't find anything specific..

Would appreciate any thoughts...

Replies are listed 'Best First'.
Re: Non-regex Methods of Extracting the Last Field from a Record
by toolic (Bishop) on May 08, 2009 at 00:13 UTC
    the primary question is why doesn't the following construct work: (@junk, $ads) = (split(/\s+/, $rec));
    Think of the @junk array as being "greedy": it swallows up all of the list elements returned by split, leaving none for your $ads scalar.

    Update: yet another way, without the temporary @junk array:

        use strict;
        use warnings;

        my $rec = 'a b c d 96';
        my $ads = (split /\s+/, $rec)[-1];
        print "ads=$ads\n";

        __END__
        ads=96

Re: Non-regex Methods of Extracting the Last Field from a Record
by AnomalousMonk (Archbishop) on May 08, 2009 at 00:20 UTC
    How about:
        >perl -wMstrict -le
          "my $str = 'a b c d e 96';
           my $last_field = (split / /, $str)[-1];
           print qq{'$last_field'};
          "
        '96'

    ... the primary question is why doesn't the following construct work:
     (@junk, $ads) = (split(/\s+/, $rec));
    Because the @junk array 'consumes' all elements of the list produced by the split built-in, leaving nothing but undefinedness for the $ads scalar.
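
A minimal sketch of the behaviour both replies describe, using a made-up record and illustrative variable names. In a list assignment, an array on the left-hand side takes every remaining value, so any scalar placed after it is left undefined. Scalars placed before the array each receive one value, which is why reversing the list is one way to get at the last field through list assignment.

```perl
use strict;
use warnings;

my $rec = 'a b c d 96';    # made-up sample record

# The array comes first, so it takes every field; $ads is left undefined.
my (@junk, $ads) = split /\s+/, $rec;
print defined $ads ? "ads=$ads\n" : "ads is undef\n";
print 'junk holds ', scalar(@junk), " fields\n";

# Scalars before the array each get one value; the array takes the rest.
my ($first, @rest) = split /\s+/, $rec;
print "first=$first, rest has ", scalar(@rest), " fields\n";

# To get the last field via list assignment, reverse the list first.
my ($last) = reverse split /\s+/, $rec;
print "last=$last\n";
```

The list-slice form shown in the replies, (split ...)[-1], is usually the tidier choice; the reverse variant is only here to show how the assignment rule works.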
Re: Non-regex Methods of Extracting the Last Field from a Record
by ELISHEVA (Prior) on May 08, 2009 at 04:07 UTC

    Even if you were more comfortable with regular expressions, split would have been a good choice for this problem.

    However, getting comfortable with regular expressions will open up a world of programming possibilities to you. It is worth investing the effort, even if it feels intimidating at first. One way to build confidence is to see problems you already understand well reworked into regular expressions, so here it is:

        my ($lastWord) = $rec =~ /(\S+)$/;
        print "$lastWord\n"; #prints 96

        #or

        $rec =~ /(\S+)$/;
        my $lastWord = $1;
        print "$lastWord\n"; #also prints 96

    • The final '$' means whatever we match has to end with the end of the string.
    • \S stands for not a space, i.e. anything except characters that would match \s.
    • \S+ means match as many non-spaces as you can, that is, all the way back to (but not including) the last space before the end of the string.
    • (\S+) means capture this portion of the match and put it in a variable. Perl automatically puts each captured portion in a numbered variable: the first capture in $1, the second in $2. In this case we had only one capture group, so $1 contains the match. Had you had two capture groups, like this: $rec =~ /(\S+)\s+(\S+)$/, then the regex would have set $1 to "-!" and $2 to "96".
    • $1 and $2 are very transient variables. They will change the next time you do a regular expression match or substitution that has a captured portion, so it is usually a good idea to copy them into one of your own declared variables as soon as possible. One easy way to do this is to assign the regex matching expression ($rec =~ /(\S+)$/) to an array or list, e.g. my ($lastWord) = $rec =~ /(\S+)$/.

    And we even could have combined your trailing space stripping code with the regular expression as well. This also would have worked: my ($lastWord) = $rec =~ /(\S+)\s*$/. \s* is like \s+ except that it matches zero or more spaces, rather than one or more. That is, \s+ is equivalent to \s\s*.
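
As a hedged sketch of the points above, the following wraps the /(\S+)\s*$/ match in a small helper. The sub name last_field_re and the sample records are hypothetical. It relies on the fact that a match in list context returns its captures on success and an empty list on failure, so the result is simply undef when the record has no fields.

```perl
use strict;
use warnings;

# Hypothetical helper: return the last whitespace-delimited field of a
# record, or undef if the record contains no non-space characters.
sub last_field_re {
    my ($rec) = @_;
    # A list-context match returns the captures, or an empty list on
    # failure, so $last stays undef when nothing matches.
    my ($last) = $rec =~ /(\S+)\s*$/;
    return $last;
}

# Made-up records: no trailing space, trailing spaces, and blanks only.
for my $rec ('a b c 96', 'a b c 96   ', '   ') {
    my $last = last_field_re($rec);
    print defined $last ? "'$last'\n" : "no field\n";
}
```

Assigning straight into a declared variable this way also sidesteps the transient nature of $1, since nothing is read from it after the match.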

    For more information and some further examples, see perlretut.

    Best, beth

Re: Non-regex Methods of Extracting the Last Field from a Record
by blokhead (Monsignor) on May 08, 2009 at 01:35 UTC
    You can even do it avoiding the temporary list generated by split. Instead, you can use rindex to go directly to the appropriate part of the string:
        my $rec = " 2425 S 25 \" 6 47! 86 18 21! 87 23 23! - + -! - -! 96";
        my $last = substr $rec, 1 + rindex($rec, " ");
        print $last, $/;

    I'm only posting this for the sake of completeness. This only works because your record separator is whitespace. rindex won't generalize as well as split (more complicated separators, picking out records other than the last, ignoring trailing empty records).
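
One of the limitations mentioned above can be sketched as follows. The helper name last_field_rindex is hypothetical, and the code assumes fields are separated by plain space characters rather than tabs. If the record ends in a space, rindex finds that trailing space and substr returns an empty string, so the sketch strips trailing whitespace first.

```perl
use strict;
use warnings;

# Hypothetical helper: last space-separated field using rindex/substr.
# Assumes fields are separated by plain spaces (not tabs).
sub last_field_rindex {
    my ($rec) = @_;
    $rec =~ s/\s+$//;               # a trailing space would otherwise yield ''
    my $pos = rindex($rec, ' ');    # -1 when the record contains no space
    return substr($rec, $pos + 1);  # offset 0 then returns the whole string
}

print last_field_rindex('a b c 96'),   "\n";
print last_field_rindex('a b c 96  '), "\n";
print last_field_rindex('96'),         "\n";
```

Because rindex returns -1 when the separator is absent, adding 1 gives offset 0 and a single-field record comes back whole without a special case.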

    blokhead

Re: Non-regex Methods of Extracting the Last Field from a Record
by codeacrobat (Chaplain) on May 08, 2009 at 06:08 UTC
    Your @junk already consumes all of the split elements. There is no boundary on @junk that would stop it from doing so.

        (@junk, $ads) = (split(/\s+/, $rec));

        # the same here
        (@foo, $bar) = @baz;

        # only this way it works
        ($first,$second,@remaining) = @all

    P.S. If you are using split then you are already using regular expressions.
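
Since the first argument to split is itself a pattern, the choice of pattern matters for records with leading whitespace like the one in the question. The sketch below uses a made-up record to compare /\s+/ with the special single-space string form of split, which skips leading whitespace before splitting.

```perl
use strict;
use warnings;

my $rec = '  2425 S 25 96';    # made-up record with leading spaces

# A /\s+/ pattern produces an empty leading field when the string
# starts with whitespace.
my @by_regex = split /\s+/, $rec;

# The special single-space string form skips leading whitespace first.
my @by_space = split ' ', $rec;

print scalar(@by_regex), " fields via /\\s+/\n";
print scalar(@by_space), " fields via ' '\n";
print "last field is $by_space[-1] either way\n";
```

The last field is the same in both cases, so this only matters when the leading fields are used as well; with split ' ' the s/^\s+// step in the original code would not be needed.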

    print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
Re: Non-regex Methods of Extracting the Last Field from a Record
by allolex (Curate) on May 08, 2009 at 17:12 UTC
    so I was looking for a simpler, array-manipulation method of extracting the last item from a variable length array.

    This made me think of pop() as yet another approach to getting the right field from the array.

        my $rec = 'a b c d e f 96';
        my @a = split /\s+/, $rec;
        printf "ads = %s\n", pop @a;
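
One difference worth keeping in mind, sketched here with illustrative variable names: pop removes the last element from the array, whereas indexing with -1 only reads it.

```perl
use strict;
use warnings;

my @fields = split /\s+/, 'a b c d e f 96';

my $peek        = $fields[-1];    # reads the last element, array unchanged
my $size_before = @fields;

my $popped     = pop @fields;     # removes and returns the last element
my $size_after = @fields;

print "peek=$peek popped=$popped\n";
print "size before pop: $size_before, after: $size_after\n";
```

If the remaining fields are needed afterwards, the shortened array may or may not be what is wanted, so the choice between the two depends on how the rest of the record is used.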

    --
    Damon Allen Davison
    http://www.allolex.net
