Re: How do I remove whitespace at the beginning or end of my string?

Some of the supposedly faster solutions can actually be really slow, depending on how much whitespace the string contains.

This shows up when a string contains a lot of whitespace, for example:
my $str = q{ }. q{a b c d e f g h i j} x 200 . q{ };

The MRE book (Mastering Regular Expressions) suggests this code:
$str =~ s/^\s+((?:.+\S)?)\s+$/$1/s;

I admit I was surprised by how fast it is compared with "s/^\s+//" and its sibling "s/\s+$//". On the example above they can't even compete in a benchmark; they are too slow. That is because of the second regex, which has to match at the end of the string: it fails many times when the string contains a lot of whitespace (see re 'debug').
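To get a rough feel for why the trailing-whitespace regex struggles on this input, one can count the whitespace runs in the example string. This is only a sketch of the reasoning: it assumes the engine tries `\s+$` at each run and gives up at every run except the last, which is what the `re 'debug'` remark points at, and newer perls may optimize part of that away. The variable names are mine.

```perl
use strict;
use warnings;

# The example string from the post: one space, 200 copies of the
# letter sequence, one space.
my $str = q{ } . q{a b c d e f g h i j} x 200 . q{ };

# Count maximal runs of whitespace. Each run is a place where \s+
# can start matching, so it is a candidate position for /\s+$/.
my $runs = () = $str =~ /\s+/g;

# Only the final run is actually followed by end-of-string.
my $trailing = $str =~ /(\s+)\z/ ? length($1) : 0;

printf "length=%d  whitespace runs=%d  trailing whitespace chars=%d\n",
    length($str), $runs, $trailing;
```

With the string exactly as written above there are 1802 runs and only the last one can succeed, so nearly every attempt is wasted work.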

Another approach (I know it is silly, but it is faster in some cases):

$str =~ s/^\s+//;
$str = reverse($str);
$str =~ s/^\s+//;
$str = reverse($str);
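For anyone who wants to try the reverse trick, here is a minimal sketch that wraps the four statements in a sub. The name `trim_reverse` is hypothetical, not something from the post.

```perl
use strict;
use warnings;

# Hypothetical helper name; the body is the four statements from the post.
sub trim_reverse {
    my ($s) = @_;
    $s =~ s/^\s+//;     # strip leading whitespace
    $s = reverse $s;    # scalar assignment: reverses the characters
    $s =~ s/^\s+//;     # the former trailing whitespace is now leading
    $s = reverse $s;    # restore the original order
    return $s;
}

print '[', trim_reverse("  a b  "), "]\n";    # prints [a b]
```

The point of the detour is that both substitutions are anchored at the start of the string, so neither one has to probe every whitespace run looking for the end.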

Benchmark using the above example:

               Rate 's_reverse' 'unpack_A' 'MRE_regx'
's_reverse' 42017/s          --       -12%       -48%
'unpack_A'  47847/s         14%         --       -41%
'MRE_regx'  80645/s         92%        69%         --
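A table like this can be produced with the core Benchmark module. The harness below is a sketch under assumptions, not the code that generated the numbers above: the body of `unpack_A` is my guess (the post never shows it), and `MRE_regx` uses the `*` form from the replies rather than the `+` form as posted.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = q{ } . q{a b c d e f g h i j} x 200 . q{ };

my %variant = (
    s_reverse => sub {
        my $s = $str;
        $s =~ s/^\s+//;
        $s = reverse $s;
        $s =~ s/^\s+//;
        $s = reverse $s;
        return $s;
    },
    # Guess: the 'A*' template strips trailing whitespace when unpacking.
    unpack_A => sub {
        my $s = $str;
        $s =~ s/^\s+//;
        $s = unpack 'A*', $s;
        return $s;
    },
    # The '*' form from the replies, not the '+' form in the post.
    MRE_regx => sub {
        my $s = $str;
        $s =~ s/^\s*((?:.*\S)?)\s*$/$1/s;
        return $s;
    },
);

# A fixed iteration count keeps the run short; raise it, or pass a
# negative number (CPU seconds per variant), for steadier numbers.
cmpthese(20_000, \%variant);
```

Rates will differ by perl version and machine, so only the relative ordering is worth comparing.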

Re: Answer: How do I remove whitespace at the beginning or end of my string? by repellent (Priest) on Jan 30, 2012 at 00:08 UTC
`MRE_regx` does not trim whitespace as expected:

$ perl -de 1
Loading DB routines from perl5db.pl version 1.3
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1):   1
  DB<1> $str = ' x '; $str =~ s/^\s+((?:.+\S)?)\s+$/$1/s;
  DB<2> x $str
0  ' x'
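The failure can be reproduced outside the debugger with a small script. This is a sketch: `trim_plus` is a hypothetical wrapper around the regex as posted, and the inputs are written with explicit runs of spaces so the whitespace is unambiguous.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex exactly as posted (the '+' form).
sub trim_plus {
    my ($s) = @_;
    $s =~ s/^\s+((?:.+\S)?)\s+$/$1/s;
    return $s;
}

for my $in ("  x  ", " x ", "x  ", "  ab  ") {
    printf "[%s] -> [%s]\n", $in, trim_plus($in);
}
```

Two things go wrong with `+`. Each `\s+` demands at least one whitespace character, so a string with whitespace on only one side does not match at all and comes back unchanged. And `.+\S` needs at least two characters, so a one-character core such as `x` can only match by borrowing a leading space, which is where the stray space in `' x'` comes from. A core of two or more characters with whitespace on both sides, like `"  ab  "`, trims correctly.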
Re^2: Answer: How do I remove whitespace at the beginning or end of my string? by choroba (Cardinal) on Jan 30, 2012 at 09:04 UTC
Change it to `s/^\s+(\S?.*\S)\s+$/$1/s`
Re^3: Answer: How do I remove whitespace at the beginning or end of my string? by repellent (Priest) on Jan 31, 2012 at 05:01 UTC
use Test::More;

sub trim { my $s = $_[0]; $s =~ s/^\s+(\S?.*\S)\s+$/$1/s; $s }

is( trim(' '), '' );
is( trim('a '), 'a' );
is( trim(' a'), 'a' );
is( trim(' a '), 'a' );
is( trim('ab '), 'ab' );
is( trim(' ab'), 'ab' );
is( trim(' ab '), 'ab' );
is( trim('a bb c '), 'a bb c' );
is( trim(' a bb c'), 'a bb c' );
is( trim(' a bb c '), 'a bb c' );

done_testing();

__END__
not ok 1
#   Failed test at ./t.pl line 12.
#          got: ' '
#     expected: ''
not ok 2
#   Failed test at ./t.pl line 13.
#          got: 'a '
#     expected: 'a'
not ok 3
#   Failed test at ./t.pl line 14.
#          got: ' a'
#     expected: 'a'
ok 4
not ok 5
#   Failed test at ./t.pl line 16.
#          got: 'ab '
#     expected: 'ab'
not ok 6
#   Failed test at ./t.pl line 17.
#          got: ' ab'
#     expected: 'ab'
ok 7
not ok 8
#   Failed test at ./t.pl line 19.
#          got: 'a bb c '
#     expected: 'a bb c'
not ok 9
#   Failed test at ./t.pl line 20.
#          got: ' a bb c'
#     expected: 'a bb c'
ok 10
1..10
# Looks like you failed 7 tests of 10.

The one I could find with best benchmark and passes tests is `s/^\s*((?:.*\S)?)\s*$/$1/s;`, which is essentially like `MRE_regx` with `+` replaced with `*` (perhaps trizen typo-ed?)
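For reference, here is a minimal sketch of that last form (`MRE_regx` with each `+` replaced by `*`) as a reusable sub, run over a few edge cases. The sub name and the inputs are mine.

```perl
use strict;
use warnings;

sub trim {
    my ($s) = @_;
    # \s* on both sides may match nothing, and the optional group
    # lets an all-whitespace or empty string collapse to ''.
    $s =~ s/^\s*((?:.*\S)?)\s*$/$1/s;
    return $s;
}

for my $in ("", " ", "a", "a ", " a", "  x  ", " a bb c\t") {
    printf "[%s] -> [%s]\n", $in, trim($in);
}
```

Because every piece of the pattern can match the empty string, the substitution always succeeds, including on strings that have whitespace on one side only or none at all.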

