String manipulation

abhy has asked for the wisdom of the Perl Monks concerning the following question:

I have a code snippet which works as desired but it does not look elegant to me. Can someone please suggest a simpler and shorter way of doing it? I am not bothered about performance as this is a one time task on a relatively small data set.

Requirement: Original String => converted String

j k l foobar => jkl foobar

j k lm foobar => jk lm foobar

j k l => jkl

jk l foobar => jk l foobar

foobar => foobar

foobar j k l => foobar j k l Basically what I am trying to do is strings that have the format of alphabet followed by space (multiple occurrences of this pattern) followed by an optional string should be converted to the format where in the group of alphabets at the begining should be stringified. If the alphabet pattern group occurs at the end this should not happen. Here is my code snippet:

use strict;
use warnings;

my @str = ("j    k l         foobar", "foobar", "jkl foobar", "1 2 3",
+ "jk l foobar", "foobar j k l", "foobar j kl", "           ", " ", "j
+ jk foobar", "j k jk foobar", "j k l");

my @sanitisedNames = ();

for(@str) {
    $_ =~ s/\s+/ /g;

    if ($_ =~ /^\s$/) {
        next;
    }

    my $boundary = &sanitise($_);
    
    my $sanitisedName;
    if ($boundary == 0) {
        $sanitisedName = $_;
    } elsif ($boundary == length($_)) {
        $_ =~ s/\s+//g;
        $sanitisedName = $_;
    } else {
        my $firstPart = substr($_, 0, $boundary);
        $firstPart =~ s/\s+//g;
        my $secondPart = substr($_, $boundary);
        $sanitisedName = $firstPart.' '.$secondPart;
    }

    push(@sanitisedNames, $sanitisedName);
    
}

print $_, "\n" for (@sanitisedNames);

sub sanitise {
    my $str = shift;

    my @chars = split('', $str);

    my $count = 0;
    my $len = length($str);

    while ($count < $len) {
        if ($chars[$count++] ne ' ' && $count < $len && $chars[$count+
++] eq ' ') {
        } else {
            if ($count == $len) {
                return $len;
            }
            if ($count > 3) {
                $count = $count - 3;
                return $count;
            } else {
                return 0;
            }
        }
    }
}
[download]

Also can this be done in a single line with a regex? I could not come up with one. So I am coming to the abode of the monks for wisdom :).

Comment on String manipulation Download Code

Replies are listed 'Best First'.
Re: String manipulation by GrandFather (Saint) on Jun 24, 2009 at 10:59 UTC
Something like: `use strict; use warnings; my @strs = ( "j k l foobar", "foobar", "jkl foobar", "1 2 3", "jk l foobar", "foobar j k l", "foobar j kl", " ", " ", "j jk foobar", "j k jk foobar", "j k l" ); my @sanitisedNames = (); for my $str (@strs) { my $oStr = $str; $str =~ s/\G([a-z])\s+(?!\w{2})/$1/gi; $str =~ s/\s+/ /g; push @sanitisedNames, [$oStr, $str]; } printf "%-25s >%s<\n", @$_ for (@sanitisedNames);` [download] Prints: `j k l foobar >jkl foobar< foobar >foobar< jkl foobar >jkl foobar< 1 2 3 >1 2 3< jk l foobar >jk l foobar< foobar j k l >foobar j k l< foobar j kl >foobar j kl< > < > < j jk foobar >j jk foobar< j k jk foobar >jk jk foobar< j k l >jkl<` [download] True laziness is hard work	[reply] [d/l] [select]
Re: String manipulation by citromatik (Curate) on Jun 24, 2009 at 10:08 UTC
Something like this may do the job: `use strict; use warnings; use feature qw/ :5.10 /; my @str = ("j k l foobar", "foobar", "jkl foobar", "1 2 3", + "jk l foobar", "foobar j k l", "foobar j kl", " ", " ", "j + jk foobar", "j k jk foobar", "j k l"); my $rx = qr/^((?:\w(?:\s+\|\z)))(.)/; for my $str (@str){ my ($i,$s) = $str =~ /$rx/; $i =~ s/\s+//g; say $i ? join " ",$i,$s : $s; }` [download] Outputs: `jkl foobar foobar jkl foobar 123 jk l foobar foobar j k l foobar j kl j jk foobar jk jk foobar jkl` [download] citromatik	[reply] [d/l] [select]
Re^2: String manipulation by abhy (Novice) on Jun 24, 2009 at 10:42 UTC
If I am not asking for too much, can you please expand the regex with comments?	[reply]
Re^3: String manipulation by citromatik (Curate) on Jun 24, 2009 at 11:22 UTC
Sure: `my $rx = qr/ ^ # At the beginning of the string ( # Capture in $1 (?: # Paren for grouping (don't capture in $2) \w # match a "wordy" character (?: # paren for grouping, \s+\|\z # match at least one space or the end of the + string ) )* # zero or more times ) # end of capture $1 (.) # and match what comes after storing it in $ +2 /x # allow these comments ;` [download] The rationale is to match at the beginning all the one-character strings, storing them in $1. The rest of the input will be catched in $2 citromatik*	[reply] [d/l]
Re^3: String manipulation by Limbic~Region (Chancellor) on Jun 24, 2009 at 13:03 UTC
abhy, If you don't know about YAPE::Regex::Explain, it is worth a look. I don't think it has been updated to take into consideration the new regex doodads in Perl 5.10 but it should be helpful regardless. Cheers - L~R	[reply]


Think about Loose Coupling
	PerlMonks