PerlMonks  

Re: Need a better way to count input lines

by hv (Prior)
on May 07, 2004 at 16:15 UTC [id://351509]


in reply to Need a better way to count input lines

For Bo Jo Le Much, the problem occurs when you try to clean up $fname:

($lname, $fname) = split(",");
$fname =~ s/^\s+(.+?)\W*$/$1/;

The \s+ fragment requires one or more whitespace characters at the start; when none is present the pattern does not match, so the substitution does nothing and any trailing non-word characters are left in place.
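To see the difference, here is a minimal sketch that runs the original substitution on two made-up values, one with a leading space and one without. The name "Jo", the trailing carriage return, and the helper clean_original are assumptions for illustration only; they are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the original substitution unchanged.
sub clean_original {
    my ($fname) = @_;
    $fname =~ s/^\s+(.+?)\W*$/$1/;
    return $fname;
}

# Hypothetical sample values; "\r" stands in for stray trailing junk.
for my $input (" Jo\r", "Jo\r") {
    my $result = clean_original($input);
    # Make the carriage return visible when printing.
    (my $shown = $result) =~ s/\r/\\r/g;
    print "result: '$shown'\n";
}
```

With the leading space the pattern matches and the trailing character is stripped; without it the value comes back untouched.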

The cleanest fix would probably be to absorb optional whitespace after the comma within the split, so that you don't need to look for it afterwards:

($lname, $fname) = split /,\s*/, $_, 2;
$fname =~ s/\W+$//;

Note that I've extended the split command slightly: specifying a LIMIT of 2 ensures that trailing text will not be thrown away if for some reason there is more than one comma in the name. It also allows the split to execute slightly faster, since it doesn't need to scan for more commas after the first is found.

Also, removing the need to check for whitespace in the substitution means that the only thing it is still doing is stripping non-word characters from the end, allowing a somewhat simpler pattern.
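As a rough illustration of the suggested fix, the sketch below wraps the two statements in a helper and feeds it a few invented lines. The helper name parse_name, the sample names, and the guard against a missing comma are assumptions added for the example, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper around the split-with-LIMIT approach.
sub parse_name {
    my ($line) = @_;
    # LIMIT of 2: split at the first comma only, absorbing any spaces after it.
    my ($lname, $fname) = split /,\s*/, $line, 2;
    # Strip trailing non-word characters (assumed guard: skip if no comma was found).
    $fname =~ s/\W+$// if defined $fname;
    return ($lname, $fname);
}

# Hypothetical input lines, including one with a second comma.
for my $line ("Smith, Jo\r", "Smith,Jo", "Smith, Jo, Jr.") {
    my ($lname, $fname) = parse_name($line);
    print "last='$lname' first='$fname'\n";
}
```

In the last line everything after the first comma stays in $fname, which is the effect of the LIMIT described above.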

Hugo

Re: Re: Need a better way to count input lines
by Theo (Priest) on May 07, 2004 at 17:38 UTC
    Ahh! The lights come on! (Then they go dim again) I changed the
    $fname =~ s/^\s+(.+?)\W*$/$1/;
    to
    $fname =~ s/^\s*(.+?)\W*$/$1/;
    and the problem went away. (Of course it did. I should have seen that.) I can see why with the '+' it didn't work right, but why would the failure manifest as a CR at the end of $fname?
    But I am perplexed about the split itself. I replaced the (",") with (",\s*") and got the error:
    Unrecognized escape \s passed through at phonelist.pl line 28.
    The paren/quote format worked fine with just a comma but broke with the addition of the space char class. Did I just get lucky the first way?

    As to the LIMIT option, my Llama book doesn't explain it. Does the 2 relate to the expectation of having two chars within the match?

    -Theo-
    (so many nodes and so little time ... )

      ... but why would the failure manifest as a CR at the end of $fname?

      I suspect the text, created under Windows, has CRLF as the line terminator; running the script under Solaris the chomp is removing the LF but not the CR, so the CR was getting removed instead by the $fname substitution - but only when the substitution actually happened.
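A small sketch of that suspicion, assuming the default input record separator ("\n") and a made-up line ending in CRLF. The second substitution is just one possible way to drop both characters; it is not something the original script is known to do.

```perl
use strict;
use warnings;

# Hypothetical line as it might arrive from a file created under Windows.
my $line = "Smith, Jo\r\n";

# chomp removes only the trailing "\n" when $/ is the default "\n".
chomp(my $chomped = $line);

# One possible alternative: remove the LF together with an optional CR.
(my $stripped = $line) =~ s/\r?\n\z//;

printf "after chomp: length %d, CR still present: %s\n",
    length($chomped), ($chomped =~ /\r\z/ ? "yes" : "no");
printf "after s///:  length %d\n", length($stripped);
```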

      As for split, please take a look through perldoc -f split. First, note that the first parameter is shown as /PATTERN/; if you supply this parameter as a pattern (eg /,\s*/) it gets parsed as a pattern and does the right thing. If you supply a quoted string instead (eg ",\s*") perl will treat that first as a quoted string - in which case "\s" gives the warning, and gets converted to "s" - and the resulting string is then converted to a regexp further down the line.
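The difference can be sketched as follows. The input line is invented, and the second split spells out ",s*" literally (what the double-quoted ",\s*" ends up as) so that the example runs without triggering the warning.

```perl
use strict;
use warnings;

# Hypothetical input with several spaces after the comma.
my $line = "Smith,   Jo";

# Pattern form: \s* reaches the regex engine intact.
my @by_pattern = split /,\s*/, $line, 2;

# String form after the backslash has been dropped: a comma followed by
# zero or more literal "s" characters, so the spaces are not absorbed.
my @by_string = split ",s*", $line, 2;

print "pattern: '$by_pattern[1]'\n";
print "string:  '$by_string[1]'\n";
```

With just "," the quoted form contains no backslash escapes, which would explain why it behaved the same as the pattern.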

      You'll also find there far more than you ever wanted to know about the LIMIT argument.

      Hugo
