Trimming whitespaces methods

by harishnuti (Beadle)
on Jun 30, 2008 at 12:30 UTC ( [id://694730] : perlquestion )

harishnuti has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I figured out two methods of trimming leading and trailing whitespace from array elements in one go.

# Assume @array contains strings, elements etc.
$_ =~ s/^\s*(.*?)\s*$/$1/ for @array;        # method one
map { $_ =~ s/^\s*(.*?)\s*$/$1/ } @array;    # method two

# Though I understand map is not meant for what I have done above, please
# tell me which is the correct usage in terms of Perl standards.
# Any other one-line methods of trimming would be good.
# Also, how do I trim all key/value pairs of a hash?

Replies are listed 'Best First'.
Re: Trimming whitespaces methods
by prasadbabu (Prior) on Jun 30, 2008 at 12:41 UTC

    Hi harishnuti,

    Here is one way: you can use the String::Util module to trim leading and trailing whitespace neatly. You can do the same for array elements. TIMTOWTDI.

    use String::Util ':all';

    $val = ' abc ';

    # "crunch" whitespace and remove leading/trailing whitespace
    $val = crunch($val);

    # remove leading/trailing whitespace
    $val = trim($val);

    print ">$val<";


      Another CPAN module for trimming would be Text::Trim.
      Ronald Fischer
Re: Trimming whitespaces methods
by shmem (Chancellor) on Jun 30, 2008 at 14:15 UTC

    Instead of capturing and replacing all with the captured string, I'd just remove whitespace at the beginning and end of the string:
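    A minimal sketch of that approach, assuming the same alternation pattern used in the hash example further down; the contents of @array are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample data
my @array = ( '  foo ', "bar\t", ' baz qux  ' );

# Delete leading or trailing whitespace in place; no capture is needed.
# "for" aliases $_ to each element, so @array itself is modified.
s/^\s+|\s+$//g for @array;

print ">$_<\n" for @array;
```

    The /g flag matters here: without it only the first alternative that matches (the leading whitespace) would be removed.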


    One way to trim the keys and values of a hash:

    %hash = map { my $v = $hash{$_}; s/^\s+|\s+$//g for $v,$_; $_,$v } keys %hash;

    # another way which doesn't copy the entire hash, just the keys
    for my $key (keys %hash) {
        my $value = delete $hash{$key};
        for ($key, $value) {
            s/^\s+|\s+$//g;
        }
        $hash{$key} = $value;
    }


    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      I think tinkering with keys should come with a health warning: "If you didn't have duplicate keys before are you really sure you won't after you've tinkered with them?".

      It's an edge case for sure but you can't rule out having " keyone" and "keyone". If you suspect there is leading and trailing space on your keys (and hence the question) it becomes less of an edge case. You'll have lost data and not noticed.

      It must be better to do your trimming (keys and values) while building the hash, not after (you can use exists to catch any resulting dupes). If that's not possible/feasible, I'd go for creating a hash of arrays and go through and count the little blighters.

      Did I mention I've been bitten so often by stray whitespace (it's everywhere!) that I've become a tad obsessive? :-)
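
      The advice about trimming while building the hash could be sketched roughly as follows. The input structure (@pairs) and the @dupes list are assumptions made for the example, not something from the original question:

```perl
use strict;
use warnings;

# Hypothetical raw input: key/value pairs with stray whitespace
my @pairs = (
    [ ' keyone', 'first ' ],
    [ 'keytwo ', ' second' ],
    [ 'keyone',  'third' ],    # collides with ' keyone' once trimmed
);

my ( %hash, @dupes );
for my $pair (@pairs) {
    my ( $key, $value ) = @$pair;    # copies, so the input stays untouched
    s/^\s+//, s/\s+$// for ( $key, $value );

    if ( exists $hash{$key} ) {
        push @dupes, $key;           # record the clash instead of overwriting
        next;
    }
    $hash{$key} = $value;
}

print "duplicate key after trimming: '$_'\n" for @dupes;
```

      Whether a duplicate should be skipped, merged, or treated as fatal depends on the data; the sketch only makes the collision visible.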

Re: Trimming whitespaces methods
by johngg (Canon) on Jun 30, 2008 at 14:33 UTC
    My preference is to do the space trimming in two stages as it seems to be faster than either the capture or alternation methods.

    use strict;
    use warnings;

    use Benchmark q{cmpthese};

    my @arr = ( q{ fdsgehw fw wwfe w } ) x 5000;

    cmpthese(
        -5,
        {
            alternation => sub {
                my @new = @arr;
                s{ ^\s* | \s*$ }{}gx for @new;
            },
            capture => sub {
                my @new = @arr;
                s{ ^\s* (\S.*?) \s*$ }{$1}x for @new;
            },
            twoStage => sub {
                my @new = @arr;
                s{ ^\s* }{}x for @new;
                s{ \s*$ }{}x for @new;
            },
        },
    );

    The results.

                  Rate capture alternation twoStage
    capture     8.96/s      --        -26%     -50%
    alternation 12.2/s     36%          --     -33%
    twoStage    18.1/s    102%         48%       --

    I hope this is of interest.



    Update: Fixed code indentation problems caused by TABs

      In order for the code to be truly equivalent the s modifier should be used on the substitution or a newline may break it.

      $_ = " foo\nbar ";
      s{ ^\s* (\S.*?) \s*$ }{$1}x;
      print "<$_>";
      __END__
      < foo
      bar >
      I assume you added the \S in the pattern as an improvement, but it should perhaps be noted that it leaves a string consisting only of whitespace untouched, whereas the other ways don't.


        I assume you added the \S in the pattern as an improvement

        No, I think I must have put it in because I wasn't thinking straight :-(

        Well spotted!



Re: Trimming whitespaces methods
by jwkrahn (Abbot) on Jun 30, 2008 at 14:23 UTC

    The best way is:

    s/^\s+//, s/\s+$// for @array;
Re: Trimming whitespaces methods
by wfsp (Abbot) on Jun 30, 2008 at 14:54 UTC
    Or take advantage of split's default behaviour
    my @arr = (q{ one }, q{ two three });
    for (@arr){
        my $trimmed = join q{ }, split;
        printf qq{*%s*\n}, $trimmed;
    }

    *one*
    *two three*
      Your method seems to be the quickest so far :-)

                      Rate capture alternation twoStage twoStageComma splitJoin
      capture       8.93/s      --        -27%     -51%          -52%      -66%
      alternation   12.2/s     36%          --     -33%          -34%      -53%
      twoStage      18.2/s    104%         49%       --           -2%      -31%
      twoStageComma 18.6/s    108%         52%       2%            --      -29%
      splitJoin     26.2/s    193%        115%      44%           41%        --

      It is also compacting multiple spaces within the string, which is a side-effect you might not want.
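
      A small sketch of that side effect, comparing split/join with a two-stage substitution. The sample string is an assumption chosen to have a run of three spaces in the middle:

```perl
use strict;
use warnings;

my $original = "  two   three  ";    # three spaces between the words

# split ' ' is the awk-like mode that a bare "split" applies to $_:
# it splits on runs of whitespace and ignores leading whitespace.
my $joined = join q{ }, split ' ', $original;

# The two-stage substitution only touches the two ends of the string.
( my $trimmed = $original ) =~ s/^\s+//;
$trimmed =~ s/\s+$//;

print "*$joined*\n";     # inner run collapsed to a single space
print "*$trimmed*\n";    # inner run preserved
```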



        If you try the same benchmark with the + quantifier instead of the * quantifier you will see which is truly fastest:

                         Rate alternation* capture alternation+ twoStage* twoStageComma* splitJoin twoStage+ twoStageComma+
        alternation*   29.8/s           --     -1%         -18%      -34%           -35%      -44%      -65%           -66%
        capture        30.2/s           1%      --         -17%      -33%           -34%      -43%      -64%           -66%
        alternation+   36.4/s          22%     20%           --      -20%           -21%      -31%      -57%           -59%
        twoStage*      45.3/s          52%     50%          24%        --            -1%      -15%      -46%           -48%
        twoStageComma* 45.9/s          54%     52%          26%        1%             --      -13%      -46%           -48%
        splitJoin      53.1/s          78%     76%          46%       17%            16%        --      -37%           -40%
        twoStage+      84.6/s         184%    180%         132%       87%            84%       59%        --            -4%
        twoStageComma+ 87.8/s         194%    191%         141%       94%            91%       65%        4%             --
Re: Trimming whitespaces methods
by linuxer (Curate) on Jun 30, 2008 at 14:16 UTC

    If I were to stick to the s/// method, I'd prefer:

    # remove leading/trailing whitespace from array elements without capturing
    s/ ^\s+ | \s+$ //gx for @array;

    shmem++ for being faster ;o)

Re: Trimming whitespaces methods
by waldner (Beadle) on Jun 30, 2008 at 13:42 UTC
    For a hash, I'd do something like
    for (@arr=%hash) {
        # remove spaces in $_
        ...
    }
    %hash=@arr;
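
    One possible way to fill in that loop body, shown as a sketch with made-up data; the two substitutions are borrowed from other replies in this thread rather than from this post:

```perl
use strict;
use warnings;

my %hash = ( ' a ' => ' 1 ', 'b ' => ' 2' );    # hypothetical data

my @arr;
# A list assignment in list context yields the assigned elements as lvalues,
# so $_ aliases the elements of @arr (flattened keys and values alike).
for ( @arr = %hash ) {
    s/^\s+//;
    s/\s+$//;
}

# Rebuild the hash. Keys that become equal after trimming silently
# overwrite each other at this point.
%hash = @arr;

print "'$_' => '$hash{$_}'\n" for sort keys %hash;
```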
Re: Trimming whitespaces methods
by Anonymous Monk on Jun 30, 2008 at 14:18 UTC
    If you do not want to use the existing (and strongly recommended) utilities noted in other posts, I would use the expression

       s{ \A \s+ | \s+ \z }{}xmsg for @array;

    Note that $_ is implicitly bound to the regex and so there is no need to use the expression $_ =~ s///.
    Furthermore, the use of \s* in the expression you originally posted means that a substitution will be done in every string, since every string has zero or more whitespace at its beginning and end.
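
    A rough way to see the difference is to count how many strings each substitution actually fires on. The data and variable names below are assumptions for illustration:

```perl
use strict;
use warnings;

my @array = ( 'clean', ' padded ' );    # hypothetical data

# s/// returns true when it substituted something, and grep in scalar
# context returns the number of elements for which the block was true.
my @copy1 = @array;
my $star_hits = grep { s/^\s*(.*?)\s*$/$1/ } @copy1;          # fires on every string

my @copy2 = @array;
my $plus_hits = grep { s{ \A \s+ | \s+ \z }{}xmsg } @copy2;    # fires only where whitespace exists

print "star: $star_hits, plus: $plus_hits\n";
```

    Both copies end up with the same trimmed contents; only the amount of work done on already-clean strings differs.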

    To trim the values of a hash, use an expression like

       s{ \A \s+ | \s+ \z }{}xmsg for values %hash;

    It is not clear to me what you mean by trimming the 'keys' of a hash: altering a hash key (which is a string) creates a different key.
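
    A short sketch of that distinction, using a made-up one-element hash: values are aliased and can be edited in place, while keys come back as copies, so editing them does not rename anything in the hash.

```perl
use strict;
use warnings;

my %hash = ( ' key ' => ' value ' );    # hypothetical data

# values() returns aliases to the stored values, so they change in place.
s{ \A \s+ | \s+ \z }{}xmsg for values %hash;

# keys() returns copies: trimming them leaves the hash itself untouched.
s{ \A \s+ | \s+ \z }{}xmsg for keys %hash;

print exists $hash{' key '} ? "old key still present\n" : "old key gone\n";
print ">$hash{' key '}<\n";
```

    To really rename a key, the entry has to be deleted and stored again under the new key, as in the delete-based loop shown earlier in the thread.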

Re: Trimming whitespaces methods
by Narveson (Chaplain) on Jun 30, 2008 at 17:04 UTC

    Use map with captures.

    @array = map {
        m{
            \s*     # skip leading whitespace
            (       # then capture
                .*  # the longest possible substring
                \S  # that ends with a visible character
            )
        }x
    } @array;

    This loses any array elements that contained nothing but whitespace. If you want to retain these elements as empty strings, insert |$ in your capture.
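
    A sketch of the |$ variant, assuming the alternative goes inside the capturing group as described; the sample data is made up:

```perl
use strict;
use warnings;

my @array = ( '  foo bar  ', "   ", 'baz' );    # hypothetical data

@array = map {
    m{
        \s*         # skip leading whitespace
        (           # then capture either
            .* \S   #   the longest substring ending in a visible character
          |         # or
            $       #   the empty string at the end, if nothing visible remains
        )
    }x
} @array;

print ">$_<\n" for @array;
```

    In list context the match returns the captured text, so the whitespace-only element survives as an empty string instead of disappearing from the list.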