Re: Need a better way to count input lines

by hv (Parson)
on May 07, 2004 at 16:15 UTC (#351509)

in reply to Need a better way to count input lines

For Bo Jo Le Much, the problem occurs when you try to clean up $fname:

($lname, $fname) = split(",");
$fname =~ s/^\s+(.+?)\W*$/$1/;

The /\s+/ fragment requires one or more whitespace characters at the start; when none is present the pattern does not match, so the substitution does nothing and $fname is left unchanged.
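To see the difference, here is a minimal sketch that runs the original substitution on two made-up values, one with the leading space and one without. The sample strings and the @results bookkeeping are my own assumptions for illustration, not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample values: the first has the leading space left behind
# by split(","), the second does not.
my @samples = (" Joe.", "Joe.");

my @results;
for my $fname (@samples) {
    # s/// returns a true value only when the pattern matched.
    my $changed = ($fname =~ s/^\s+(.+?)\W*$/$1/) ? 1 : 0;
    push @results, [$fname, $changed];
}

for my $r (@results) {
    printf "'%s' changed=%d\n", $r->[0], $r->[1];
}
```

Only the first value gets cleaned up; the second keeps its trailing non-word character because the pattern never matched.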

The cleanest fix would probably be to absorb optional whitespace after the comma within the split, so that you don't need to look for it afterwards:

($lname, $fname) = split /,\s*/, $_, 2;
$fname =~ s/\W+$//;

Note that I've extended the split command slightly: specifying a LIMIT of 2 ensures that trailing text will not be thrown away if for some reason there is more than one comma in the name. It also allows the split to execute slightly faster, since it doesn't need to scan for more commas after the first is found.

Also, removing the need to check for whitespace in the substitution means that the only thing it is still doing is stripping non-word characters from the end, allowing a somewhat simpler pattern.
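As a rough illustration of what the LIMIT buys you, the sketch below splits a made-up line containing a second comma, once without a LIMIT and once with a LIMIT of 2. The input string and the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input line with a second comma inside the first-name part.
my $line = "Much, Bo Jo Le, Jr.";

# Without LIMIT, every comma is a split point; assigning to two scalars
# silently drops whatever follows the second comma.
my ($l1, $f1) = split /,\s*/, $line;

# With LIMIT 2, everything after the first comma stays in the second field.
my ($l2, $f2) = split /,\s*/, $line, 2;
$f2 =~ s/\W+$//;    # strip trailing non-word characters

printf "no limit: '%s' / '%s'\n", $l1, $f1;
printf "limit 2 : '%s' / '%s'\n", $l2, $f2;
```

With the LIMIT in place the "Jr" survives, and the simpler substitution only has to trim the trailing period.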


Re: Re: Need a better way to count input lines
by Theo (Priest) on May 07, 2004 at 17:38 UTC
    Ahh! The lights come on! (Then they go dim again.) I changed
    $fname =~ s/^\s+(.+?)\W*$/$1/;
    to
    $fname =~ s/^\s*(.+?)\W*$/$1/;
    and the problem went away. (Of course it did. I should have seen that.) I can see why with the '+' it didn't work right, but why would the failure manifest as a CR at the end of $fname?
    But I am perplexed about the split itself. I replaced the (",") with (",\s*") and got the error:
    Unrecognized escape \s passed through at line 28.
    The paren/quote format worked fine with just a comma but broke with the addition of the space char class. Did I just get lucky the first way?

    As to the LIMIT option, my Llama book doesn't explain it. Does the 2 relate to an expectation of having two characters within the match?

    (so many nodes and so little time ... )

      ... but why would the failure manifest as a CR at the end of $fname?

      I suspect the text, created under Windows, has CRLF as the line terminator. When the script runs under Solaris, the chomp removes the LF but not the CR, so the CR was instead being removed by the \W* at the end of the $fname substitution - but only when the substitution actually matched.
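If that suspicion is right, the effect can be reproduced with a small sketch like the one below. The sample line is made up, and it assumes $/ is still the default "\n", so chomp strips only the LF.

```perl
use strict;
use warnings;

# Hypothetical line as it would arrive on a Unix system from a file
# written with CRLF line endings.
my $line = "Much,Bo\r\n";
chomp $line;    # with the default $/, this removes "\n" and leaves "\r"

my ($lname, $fname) = split(",", $line);

# Old pattern: needs leading whitespace, so it never matches "Bo\r".
(my $old = $fname) =~ s/^\s+(.+?)\W*$/$1/;

# New pattern: leading whitespace is optional, so \W* gets to eat the CR.
(my $new = $fname) =~ s/^\s*(.+?)\W*$/$1/;

printf "old pattern: length %d\n", length $old;
printf "new pattern: length %d\n", length $new;
```

The extra character reported for the old pattern is the CR that was left behind.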

      As for split, please take a look through perldoc -f split. First, note that the first parameter is shown as /PATTERN/; if you supply this parameter as a pattern (e.g. /,\s*/) it gets parsed as a pattern and does the right thing. If you supply a quoted string instead (e.g. ",\s*") perl will treat that first as a quoted string - in which case "\s" gives the warning, and gets converted to "s", leaving the string ",s*" - and the resulting string is then converted to a regexp further down the line. That regexp matches a comma followed by zero or more letter "s" characters, which is not what you wanted.
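A small sketch of the difference, using a made-up input line chosen so that the stray "s" is visible. Warnings are switched off around the quoted string only to keep the output quiet; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input: no space after the comma, first name starts with "s".
my $line = "Much,sam";

my $as_string;
{
    no warnings;            # suppress the "Unrecognized escape" warning here
    $as_string = ",\s*";    # in double quotes this is just the string ",s*"
}

# The string is used as a pattern: comma followed by zero or more "s".
my @from_string = split $as_string, $line;

# A real pattern: comma followed by optional whitespace.
my @from_pattern = split /,\s*/, $line;

printf "string : '%s' / '%s'\n", @from_string;
printf "pattern: '%s' / '%s'\n", @from_pattern;
```

The quoted-string version swallows the leading "s" of the first name, while the pattern version leaves it alone.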

      You'll also find there far more than you ever wanted to know about the LIMIT argument.

