Re: Trimming whitespaces methods
by prasadbabu (Prior) on Jun 30, 2008 at 12:41 UTC
|
Hi harishnuti,
Here is one way: the String::Util module trims leading and trailing whitespace neatly, and you can apply the same functions to array elements. TIMTOWTDI
use String::Util ':all';
$val = ' abc ';
# "crunch" whitespace and remove leading/trailing whitespace
$val = crunch($val);
# remove leading/trailing whitespace
$val = trim($val);
print ">$val<";
Re: Trimming whitespaces methods
by shmem (Chancellor) on Jun 30, 2008 at 14:15 UTC
|
Instead of capturing and replacing all with the captured string, I'd just remove whitespace
at the beginning and end of the string:
s/^\s+|\s+$//g;
One way to trim keys and values of a hash:
%hash = map { my $v = $hash{$_}; s/^\s+|\s+$//g for $v, $_; $_, $v } keys %hash;
# another way, which doesn't copy the entire hash, just the keys
for my $key (keys %hash) {
    my $value = delete $hash{$key};
    for ($key, $value) {
        s/^\s+|\s+$//g;
    }
    $hash{$key} = $value;
}
--shmem
_($_=" "x(1<<5)."?\n".q·/)Oo. G°\ /
/\_¯/(q /
---------------------------- \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
|
I think tinkering with keys should come with a health warning: "If you didn't have duplicate keys before are you really sure you won't after you've tinkered with them?".
It's an edge case for sure but you can't rule out having " keyone" and "keyone". If you suspect there is leading and trailing space on your keys (and hence the question) it becomes less of an edge case. You'll have lost data and not noticed.
It must be better to do your trimming (keys and values) while building the hash, not after (you can use exists to catch any resulting dupes). If that's not possible or feasible, I'd go for creating a hash of arrays and go through and count the little blighters.
Did I mention I've been bitten so often by stray whitespace (it's everywhere!) that I've become a tad obsessive? :-)
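A minimal sketch of the trim-while-building idea, assuming the raw data arrives as key/value pairs. The names @raw_pairs and %collisions are hypothetical and only there for illustration; exists catches a key that has already been stored once the whitespace is gone.

```perl
use strict;
use warnings;

# Hypothetical input: raw key/value pairs, some with stray whitespace.
my @raw_pairs = (
    [ ' keyone', 'first ' ],
    [ 'keyone',  'second' ],
    [ 'keytwo ', ' third' ],
);

my ( %hash, %collisions );
for my $pair (@raw_pairs) {
    my ( $key, $value ) = @$pair;          # work on copies, not the input
    s/^\s+|\s+$//g for $key, $value;       # trim both before storing

    if ( exists $hash{$key} ) {
        # Record the duplicate instead of silently overwriting.
        push @{ $collisions{$key} }, $value;
        next;
    }
    $hash{$key} = $value;
}

for my $key ( sort keys %collisions ) {
    warn "duplicate key '$key' after trimming: @{ $collisions{$key} }\n";
}
```

Whether to keep the first value, the last, or all of them is a policy decision; this sketch keeps the first and reports the rest.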
Re: Trimming whitespaces methods
by johngg (Canon) on Jun 30, 2008 at 14:33 UTC
|
My preference is to do the space trimming in two stages as it seems to be faster than either the capture or alternation methods.
use strict;
use warnings;
use Benchmark q{cmpthese};
my @arr = ( q{ fdsgehw fw wwfe w } ) x 5000;
cmpthese(
    -5,
    {
        alternation => sub
        {
            my @new = @arr;
            s{ ^\s* | \s*$ }{}gx for @new;
        },
        capture => sub
        {
            my @new = @arr;
            s{ ^\s* (\S.*?) \s*$ }{$1}x for @new;
        },
        twoStage => sub
        {
            my @new = @arr;
            s{ ^\s* }{}x for @new;
            s{ \s*$ }{}x for @new;
        },
    },
);
The results.
              Rate capture alternation twoStage
capture     8.96/s      --        -26%     -50%
alternation 12.2/s     36%          --     -33%
twoStage    18.1/s    102%         48%       --
I hope this is of interest. Cheers, JohnGG
Update: Fixed code indentation problems caused by TABs
|
$_ = " foo\nbar ";
s{ ^\s* (\S.*?) \s*$ }{$1}x;
print "<$_>";
__END__
< foo
bar >
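As the output shows, the capture version leaves a string with an embedded newline untouched, because . does not match a newline without the /s modifier. I assume you added the \S in the pattern as an improvement, but it should perhaps be noted that it also leaves a string consisting only of whitespace untouched, whereas the other ways don't.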
lodin
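A small sketch of the whitespace-only case, comparing the capture pattern from the benchmark with a two-stage trim. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $blank = "   ";

# Capture pattern: \S has to match something, so an all-whitespace
# string fails to match and is left exactly as it was.
( my $captured = $blank ) =~ s{ ^\s* (\S.*?) \s*$ }{$1}x;

# Two-stage trim: each substitution matches on its own.
( my $two_stage = $blank ) =~ s{ ^\s+ }{}x;
$two_stage =~ s{ \s+$ }{}x;

printf "capture:   <%s>\n", $captured;     # prints "capture:   <   >"
printf "two-stage: <%s>\n", $two_stage;    # prints "two-stage: <>"
```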
Re: Trimming whitespaces methods
by jwkrahn (Abbot) on Jun 30, 2008 at 14:23 UTC
|
s/^\s+//, s/\s+$// for @array;
Re: Trimming whitespaces methods
by wfsp (Abbot) on Jun 30, 2008 at 14:54 UTC
|
Or take advantage of split's default behaviour
my @arr = (q{ one }, q{ two three });
for (@arr) {
    my $trimmed = join q{ }, split;
    printf qq{*%s*\n}, $trimmed;
}
*one*
*two three*
|
Your method seems to be the quickest so far :-)
                Rate capture alternation twoStage twoStageComma splitJoin
capture       8.93/s      --        -27%     -51%          -52%      -66%
alternation   12.2/s     36%          --     -33%          -34%      -53%
twoStage      18.2/s    104%         49%       --           -2%      -31%
twoStageComma 18.6/s    108%         52%       2%            --      -29%
splitJoin     26.2/s    193%        115%      44%           41%        --
It also compacts multiple spaces within the string, which is a side-effect you might not want. Cheers, JohnGG
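To make that side-effect concrete, here is a short sketch comparing split/join with a plain edge trim on the same string. The sample string and variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = "  two   three  ";

# split on q{ } is the special "awk mode": it skips leading whitespace
# and splits on runs of whitespace, so interior runs collapse on join.
my $split_join = join q{ }, split q{ }, $str;

# An edge-only trim leaves the interior spacing alone.
( my $trimmed = $str ) =~ s/^\s+|\s+$//g;

printf "split/join: *%s*\n", $split_join;   # prints "split/join: *two three*"
printf "trim only:  *%s*\n", $trimmed;      # prints "trim only:  *two   three*"
```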
|
                 Rate alternation* capture alternation+ twoStage* twoStageComma* splitJoin twoStage+ twoStageComma+
alternation*   29.8/s           --     -1%         -18%      -34%           -35%      -44%      -65%           -66%
capture        30.2/s           1%      --         -17%      -33%           -34%      -43%      -64%           -66%
alternation+   36.4/s          22%     20%           --      -20%           -21%      -31%      -57%           -59%
twoStage*      45.3/s          52%     50%          24%        --            -1%      -15%      -46%           -48%
twoStageComma* 45.9/s          54%     52%          26%        1%             --      -13%      -46%           -48%
splitJoin      53.1/s          78%     76%          46%       17%            16%        --      -37%           -40%
twoStage+      84.6/s         184%    180%         132%       87%            84%       59%        --            -4%
twoStageComma+ 87.8/s         194%    191%         141%       94%            91%       65%        4%             --
Re: Trimming whitespaces methods
by linuxer (Curate) on Jun 30, 2008 at 14:16 UTC
|
If I were to stick with the s/// method, I'd prefer:
# remove leading/trailing whitespaces from array elem. w/o capturing
s/ ^\s+ | \s+$ //gx for @array;
update:
shmem++ for being faster ;o)
Re: Trimming whitespaces methods
by waldner (Beadle) on Jun 30, 2008 at 13:42 UTC
|
For a hash, I'd do something like
for (@arr = %hash) {
    # remove spaces in $_ ...
}
%hash=@arr;
Re: Trimming whitespaces methods
by Anonymous Monk on Jun 30, 2008 at 14:18 UTC
|
If you do not want to use the existing (and strongly recommended) utilities noted in other posts, I would use the expression
s{ \A \s+ | \s+ \z }{}xmsg for @array;
Note that $_ is implicitly bound to the regex and so there is no need to use the expression $_ =~ s///.
Furthermore, the use of \s* in the expression you originally posted means that a substitution will be done in every string, since every string has zero or more whitespace at its beginning and end.
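A quick way to see the difference is to look at what s/// returns, which is true only if at least one substitution was made. This sketch uses a string that needs no trimming; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $with_star = 'abc';
my $with_plus = 'abc';

# \s* can match the empty string, so a substitution happens even
# though there is nothing to remove.
my $star_matched = $with_star =~ s{ \A \s* | \s* \z }{}xmsg;

# \s+ needs at least one whitespace character, so nothing matches.
my $plus_matched = $with_plus =~ s{ \A \s+ | \s+ \z }{}xmsg;

printf "\\s* substituted: %s\n", $star_matched ? 'yes' : 'no';
printf "\\s+ substituted: %s\n", $plus_matched ? 'yes' : 'no';
```

Both strings end up unchanged; the only difference is the wasted work in the \s* version.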
To trim the values of a hash, use an expression like
s{ \A \s+ | \s+ \z }{}xmsg for values %hash;
It is not clear to me what you mean by trimming the 'keys' of a hash: altering a hash key (which is a string) creates a different key.
Re: Trimming whitespaces methods
by Narveson (Chaplain) on Jun 30, 2008 at 17:04 UTC
|
@array = map {
    m{
        \s*        # skip leading whitespace
        (          # then capture
            .*     # the longest possible substring
            \S     # that ends with a visible character
        )
    }x
} @array;
This loses any array elements that contained nothing but whitespace. If you want to retain these elements as empty strings, insert |$ in your capture.
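A sketch of the |$ variant, assuming the alternative goes inside the capturing group as the last branch. The sample data and the @trimmed name are illustrative.

```perl
use strict;
use warnings;

my @array = ( '  abc  ', '   ', 'x y ' );

my @trimmed = map {
    m{
        \s*          # skip leading whitespace
        (            # then capture
            .* \S    # the longest substring ending in a visible character
          |          # or, failing that,
            $        # the empty string at the end of the input
        )
    }x
} @array;

# The whitespace-only element survives as an empty string.
print join( '|', @trimmed ), "\n";    # prints "abc||x y"
```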