Trimming whitespaces methods

harishnuti has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Trimming whitespaces methods by prasadbabu (Prior) on Jun 30, 2008 at 12:41 UTC
Hi harishnuti,

Here is one way: you can use the String::Util module to trim leading and trailing spaces neatly. You can do the same for array elements. TIMTOWTDI.

```perl
use String::Util ':all';

$val = ' abc ';

# "crunch" whitespace and remove leading/trailing whitespace
$val = crunch($val);

# remove leading/trailing whitespace
$val = trim($val);

print ">$val<";
```

Prasad
Re^2: Trimming whitespaces methods by rovf (Priest) on Jul 01, 2008 at 10:10 UTC
Another CPAN module for trimming would be `Text::Trim`. -- Ronald Fischer <ynnor@mm.st>
Re: Trimming whitespaces methods by shmem (Chancellor) on Jun 30, 2008 at 14:15 UTC
Instead of capturing and replacing all with the captured string, I'd just remove whitespace at the beginning and end of the string:

```perl
s/^\s+|\s+$//g;
```

One way to trim keys and values of a hash:

```perl
%hash = map { my $v = $hash{$_}; s/^\s+|\s+$//g for $v,$_; $_,$v } keys %hash;

# another way which doesn't copy the entire hash, just the keys
for my $key (keys %hash) {
    my $value = delete $hash{$key};
    for ($key, $value) {
        s/^\s+|\s+$//g;
    }
    $hash{$key} = $value;
}
```

--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ / /\_¯/(q / ---------------------------- \__(m.====·.(_("always off the crowd"))."· ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re^2: Trimming whitespaces methods by wfsp (Abbot) on Jun 30, 2008 at 14:41 UTC
I think tinkering with keys should come with a health warning: "If you didn't have duplicate keys before, are you really sure you won't after you've tinkered with them?"

It's an edge case for sure, but you can't rule out having " keyone" and "keyone". If you suspect there is leading and trailing space on your keys (and hence the question) it becomes less of an edge case. You'll have lost data and not noticed.

It must be better to do your trimming (keys and values) while building the hash, not after (you can use `exists` to catch any resulting dupes). If that's not possible/feasible I'd go for creating a hash of arrays and go through and count the little blighters.

Did I mention I've been bitten so often by stray whitespace (it's everywhere!) that I've become a tad obsessive? :-)
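A minimal sketch of the "trim while building, check with `exists`" idea from the post above. The sample data in `@pairs` and the `trimmed` helper are invented for illustration; they do not come from the original question.

```perl
use strict;
use warnings;

# Hypothetical raw input: key/value pairs with stray whitespace.
my @pairs = (
    [ ' keyone', 'first '  ],
    [ 'keytwo ', ' second' ],
    [ 'keyone',  'third'   ],    # collides with ' keyone' once trimmed
);

# Hypothetical helper: return a trimmed copy of a string.
sub trimmed {
    my ($str) = @_;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return $str;
}

my ( %hash, @dupes );
for my $pair (@pairs) {
    my ( $key, $value ) = map { trimmed($_) } @$pair;
    if ( exists $hash{$key} ) {
        push @dupes, $key;    # record the collision instead of overwriting
        next;
    }
    $hash{$key} = $value;
}

print "duplicate key after trimming: $_\n" for @dupes;
```

Here the first value seen for a key wins and later collisions are only reported; whether to keep the first, keep the last, or die is a design choice that depends on the data.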
Re: Trimming whitespaces methods by johngg (Canon) on Jun 30, 2008 at 14:33 UTC
My preference is to do the space trimming in two stages as it seems to be faster than either the capture or alternation methods.

```perl
use strict;
use warnings;

use Benchmark q{cmpthese};

my @arr = ( q{ fdsgehw fw wwfe w } ) x 5000;

cmpthese(
    -5,
    {
        alternation => sub {
            my @new = @arr;
            s{ ^\s* | \s*$ }{}gx for @new;
        },
        capture => sub {
            my @new = @arr;
            s{ ^\s* (\S.*?) \s*$ }{$1}x for @new;
        },
        twoStage => sub {
            my @new = @arr;
            s{ ^\s* }{}x for @new;
            s{ \s*$ }{}x for @new;
        },
    },
);
```

The results.

```
              Rate     capture alternation    twoStage
capture     8.96/s          --        -26%        -50%
alternation 12.2/s         36%          --        -33%
twoStage    18.1/s        102%         48%          --
```

I hope this is of interest.

Cheers,
JohnGG

Update: Fixed code indentation problems caused by TABs
Re^2: Trimming whitespaces methods by lodin (Hermit) on Jun 30, 2008 at 17:02 UTC
In order for the code to be truly equivalent, the `/s` modifier should be used on the substitution, or a newline may break it.

```perl
$_ = " foo\nbar ";
s{ ^\s* (\S.*?) \s*$ }{$1}x;
print "<$_>";
__END__
< foo
bar >
```

I assume you added the `\S` in the pattern as an improvement, but it should perhaps be noted that it has the effect of leaving a line of only whitespace untouched, whereas the other ways don't.

lodin
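To make the effect of `/s` concrete, here is a small side-by-side sketch using the same test string. The variable names are made up for the comparison.

```perl
use strict;
use warnings;

my $without_s = " foo\nbar ";
my $with_s    = " foo\nbar ";

# Without /s, "." cannot cross the newline, so the whole match fails
# and the string is left untouched.
my $changed_without = $without_s =~ s{ ^\s* (\S.*?) \s*$ }{$1}x;

# With /s, "." also matches "\n", so both ends are trimmed.
my $changed_with = $with_s =~ s{ ^\s* (\S.*?) \s*$ }{$1}xs;

print "<$without_s>\n";
print "<$with_s>\n";
```

The first substitution reports no change, while the second leaves `"foo\nbar"` with the embedded newline intact.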
Re^3: Trimming whitespaces methods by johngg (Canon) on Jun 30, 2008 at 22:33 UTC
"I assume you added the `\S` in the pattern as an improvement"

No, I think I must have put it in because I wasn't thinking straight :-( Well spotted!

Cheers,
JohnGG
Re: Trimming whitespaces methods by jwkrahn (Abbot) on Jun 30, 2008 at 14:23 UTC
The best way is: `s/^\s+//, s/\s+$// for @array;`
Re: Trimming whitespaces methods by wfsp (Abbot) on Jun 30, 2008 at 14:54 UTC
Or take advantage of split's default behaviour:

```perl
my @arr = (q{ one }, q{ two three });
for (@arr){
    my $trimmed = join q{ }, split;
    printf qq{%s\n}, $trimmed;
}
```

```
one
two three
```
Re^2: Trimming whitespaces methods by johngg (Canon) on Jun 30, 2008 at 15:37 UTC
Your method seems to be the quickest so far :-)

```
                Rate capture alternation twoStage twoStageComma splitJoin
capture       8.93/s      --        -27%     -51%          -52%      -66%
alternation   12.2/s     36%          --     -33%          -34%      -53%
twoStage      18.2/s    104%         49%       --           -2%      -31%
twoStageComma 18.6/s    108%         52%       2%            --      -29%
splitJoin     26.2/s    193%        115%      44%           41%        --
```

It is also compacting multiple spaces within the string, which is a side-effect you might not want.

Cheers,
JohnGG
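The side-effect mentioned above is easy to see with a string that has several spaces between words. This is only an illustrative sketch; the sample string and variable names are made up.

```perl
use strict;
use warnings;

my $original = "  two    three  ";

# split ' ' splits on runs of whitespace and drops leading empty
# fields, so the join rebuilds the string with single spaces.
my $split_join = join ' ', split ' ', $original;

# The two-stage substitution only touches the ends of the string.
( my $two_stage = $original ) =~ s/^\s+//;
$two_stage =~ s/\s+$//;

print "<$split_join>\n";    # <two three>
print "<$two_stage>\n";     # <two    three>
```

If interior spacing matters (fixed-width fields, for example), the substitution approach is the safer choice despite the benchmark numbers.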
Re^3: Trimming whitespaces methods by jwkrahn (Abbot) on Jun 30, 2008 at 23:02 UTC
If you try the same benchmark with the `+` quantifier instead of the `*` quantifier you will see which is truly fastest:

```
                 Rate alternation* capture alternation+ twoStage* twoStageComma* splitJoin twoStage+ twoStageComma+
alternation*   29.8/s           --     -1%         -18%      -34%           -35%      -44%      -65%           -66%
capture        30.2/s           1%      --         -17%      -33%           -34%      -43%      -64%           -66%
alternation+   36.4/s          22%     20%           --      -20%           -21%      -31%      -57%           -59%
twoStage*      45.3/s          52%     50%          24%        --            -1%      -15%      -46%           -48%
twoStageComma* 45.9/s          54%     52%          26%        1%             --      -13%      -46%           -48%
splitJoin      53.1/s          78%     76%          46%       17%            16%        --      -37%           -40%
twoStage+      84.6/s         184%    180%         132%       87%            84%       59%        --            -4%
twoStageComma+ 87.8/s         194%    191%         141%       94%            91%       65%        4%             --
```
Re: Trimming whitespaces methods by linuxer (Curate) on Jun 30, 2008 at 14:16 UTC
If I'd stick to the s/// method, I'd prefer:

```perl
# remove leading/trailing whitespaces from array elem. w/o capturing
s/ ^\s+ | \s+$ //gx for @array;
```

update: shmem++ for being faster ;o)
Re: Trimming whitespaces methods by waldner (Beadle) on Jun 30, 2008 at 13:42 UTC
For a hash, I'd do something like

```perl
for (@arr=%hash) {
    # remove spaces in $_
    ...
}
%hash=@arr;
```
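One possible way to fill in that outline, with the flattening and the loop written as separate steps. The sample hash and the `@flat` name are assumptions made for the example.

```perl
use strict;
use warnings;

my %hash = ( ' alpha ' => ' one ', 'beta ' => ' two' );

# Flatten the hash into a key/value list, trim every element in place
# (the loop variable aliases the elements of @flat), then rebuild.
my @flat = %hash;
s/^\s+|\s+$//g for @flat;
%hash = @flat;

print "<$_> => <$hash{$_}>\n" for sort keys %hash;
```

Note that if two keys become identical after trimming, rebuilding the hash silently keeps only one of them.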
Re: Trimming whitespaces methods by Anonymous Monk on Jun 30, 2008 at 14:18 UTC
If you do not want to use the existing (and strongly recommended) utilities noted in other posts, I would use the expression

`s{ \A \s+ | \s+ \z }{}xmsg for @array;`

Note that `$_` is implicitly bound to the regex and so there is no need to use the expression `$_ =~ s///`. Furthermore, the use of `\s*` in the expression you originally posted means that a substitution will be done in every string, since every string has zero or more whitespace characters at its beginning and end.

To trim the `values` of a hash, use an expression like

`s{ \A \s+ | \s+ \z }{}xmsg for values %hash;`

It is not clear to me what you mean by trimming the 'keys' of a hash: altering a hash key (which is a string) creates a different key.
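A short sketch of why the `values` form works in place while the same loop over `keys` does not. The hashes `%hash` and `%other` are invented sample data.

```perl
use strict;
use warnings;

my %hash = ( name => '  Larry  ', lang => "\tPerl\n" );

# values() returns aliases to the stored values, so s/// edits them in place.
s{ \A \s+ | \s+ \z }{}xmsg for values %hash;

# keys() returns copies; editing them leaves the hash untouched.
my %other = ( ' padded ' => 1 );
s{ \A \s+ | \s+ \z }{}xmsg for keys %other;

print "<$hash{name}> <$hash{lang}>\n";
print "key still padded\n" if exists $other{' padded '};
```

To actually rename a key, the entry has to be deleted and stored again under the trimmed key, as in shmem's loop earlier in the thread.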
Re: Trimming whitespaces methods by Narveson (Chaplain) on Jun 30, 2008 at 17:04 UTC
Use `map` with captures.

```perl
@array = map {
    m{
        \s*       # skip leading whitespace
        (         # then capture
            .*    # the longest possible substring
            \S    # that ends with a visible character
        )
    }x
} @array;
```

This loses any array elements that contained nothing but whitespace. If you want to retain these elements as empty strings, insert `|$` in your capture.
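The difference between the two variants can be sketched as follows, with the patterns written compactly rather than with `/x`. The sample array and the names `@dropped` and `@kept` are assumptions for the example.

```perl
use strict;
use warnings;

my @array = ( '  one  ', "\t \t", ' two three ' );

# As posted: a whitespace-only element fails to match, the match
# returns an empty list, and the element drops out of the result.
my @dropped = map { $_ =~ /\s*(.*\S)/ } @array;

# With an empty alternative in the capture, such elements still match
# and come through as empty strings.
my @kept = map { $_ =~ /\s*(.*\S|$)/ } @array;

printf "%d elements without |\$, %d with it\n", scalar @dropped, scalar @kept;
```

Both variants rely on a match in list context returning its captures, which is what lets `map` collect the trimmed strings directly.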

