I caught the tail end of a discussion on irc.freenode.net's #perl channel about how to split a string into equal-sized chunks. Some people were trying to use split() to accomplish this; one person fell prey to this:
my $string = "abcdefghi";
my @fields = split /(?=.{3})/, $string;
They expected this to mean "split $string at every location that is followed by three characters (and then skip ahead three characters!)", but what it really means is "split $string at every location that is followed by three characters". They ended up getting ("a", "b", "c", "d", "e", "f", "ghi").
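To see why, it can help to list the offsets at which the lookahead succeeds. The following is a small sketch (the variable name @offsets is only for illustration) that records the start of every zero-width match:

```perl
use strict;
use warnings;

my $string = "abcdefghi";

# Collect the offset of every position that is followed by three characters.
# $-[0] holds the start offset of the most recent successful match.
my @offsets;
push @offsets, $-[0] while $string =~ /(?=.{3})/g;

print "@offsets\n";    # 0 1 2 3 4 5 6
```

split() never produces an empty leading field for a zero-width match at offset 0, so the cuts happen at offsets 1 through 6, which gives the seven fields shown above.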
So how can you use split() to do this? Someone said "Couldn't you abuse \G?", and that reminded me of the internal assignment to $_ of the string being matched against, and the resulting use of pos()! I present:
my @fields = split /(?(?{pos() % 3})(?!))/, $string;
Update: Yes, I know about unpack(), etc. This was merely presented as the most direct way to accomplish the task using split().
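For readers unfamiliar with the conditional construct, here is one reading of the split pattern above, spread out under /x with comments. Treat it as a sketch of how the pieces fit together rather than an authoritative explanation; the chunk size 3 is hard-coded exactly as in the original:

```perl
use strict;
use warnings;

my $string = "abcdefghi";

my @fields = split /
    (?(?{ pos() % 3 })   # condition: true when the offset is not a multiple of 3
        (?!)             # then: an empty negative lookahead, which always fails
    )                    # else: nothing, so the empty pattern matches right here
/x, $string;

print "@fields\n";
```

Inside the code block, pos() reports the current offset in the string being matched, so the pattern can only succeed (with a zero-width match) at offsets that are multiples of 3.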
Re: Using split() to divide a string by length
by Zaxo (Archbishop) on Apr 13, 2006 at 19:39 UTC
Another trick that works is to capture the separator itself, which also places the captured text in @fields and makes pos advance past it. Since every group except possibly the last is consumed as a separator, the ordinary split fields are mostly empty, so we filter out the false elements with grep:
my $string = join '', 'a'..'z';
my @fields = grep {$_} split /(.{3})/, $string;
print "@fields\n";
__END__
abc def ghi jkl mno pqr stu vwx yz
Your grep should be grep { defined }.
Yep, I tried with defined first because that's the way I thought it worked, too. With that, the result of mine is,
 abc  def  ghi  jkl  mno  pqr  stu  vwx yz
Note the extra spaces, indicating that there are defined empty strings instead of undefs in those positions.
Update: Good idea, ikegami++.
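One way to drop the empty strings without also discarding a chunk that happens to be "0" is to filter on length instead of truth. This is only a sketch of that idea, not necessarily what was suggested in the thread; the sample string and the names @by_truth and @by_length are made up for the demonstration:

```perl
use strict;
use warnings;

# A string whose final short chunk is "0", which is false in boolean context.
my $string = "abcdef0";

my @by_truth  = grep { $_ }     split /(.{3})/, $string;
my @by_length = grep { length } split /(.{3})/, $string;

print "@by_truth\n";     # abc def
print "@by_length\n";    # abc def 0
```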
Re: Using split() to divide a string by length
by BrowserUk (Patriarch) on Apr 13, 2006 at 20:08 UTC
print "$_\n" for unpack '(A3)*', "abcdefghi";
abc
def
ghi
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Note: Parentheses in pack/unpack templates require Perl 5.8.0 or later.
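On a Perl without template groups, roughly the same effect can be had by building the template with the repetition operator. This is a minimal sketch; $len, $count, and $template are illustrative names, and it uses 'a' rather than 'A' so that trailing spaces in a chunk are kept:

```perl
use strict;
use warnings;

my $string = "abcdefgh";
my $len    = 3;

# Number of chunks, rounding up so a short remainder gets its own field.
my $count = int( ( length($string) + $len - 1 ) / $len );

# Build "a3a3a3" by repetition instead of relying on the '(a3)*' group syntax.
my $template = "a$len" x $count;
my @fields   = unpack $template, $string;

print "@fields\n";    # abc def gh
```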
Yep, I know. I remember it being added.
I also remember it from the last time you told me.
And the time before that.
So, what is your point?
- I can't use parens in pack/unpack templates because it's only been available for 4 years*?
- I shouldn't mention my preference for a solution because it's only been available for the last 8 releases?
- Every time I suggest a solution that uses a feature that isn't available in every build of perl, I should add a footnote that ikegami has (unnecessarily) reminded me that this feature has only been available for the last 8 releases and 4 years*?
I know, I know. You're just "expanding knowledge".
Perhaps you should also consider adding footnotes to all your posts that use or recommend other features that have not been around forever? Like say, the 3-arg open; or even hashes?
(*) For the pedantic, 3 years, 8 months, 16 days 4 hours (approx. at the time of posting).
Re: Using split() to divide a string by length
by duff (Parson) on Apr 13, 2006 at 22:09 UTC
I hereby propose that we patch split such that its first argument, if it's a reference to an integer, will split the string into chunks of characters, each with as many chars as that integer (except the last, of course). Come on! Who's with me? :-)
(for the humor impaired, I'm not being serious)
Re: Using split() to divide a string by length
by chibiryuu (Beadle) on Apr 13, 2006 at 20:10 UTC
This doesn't use split, but is the first thing I think of:
my $string = join '', 'a'..'z';
my @fields = $string =~ /.{1,3}/g;
my @fields2 = grep {$a=!$a} @fields;
Hmm, $string =~ /.{1,3}/g should even be faster than split /(?(?{pos() % 3})(?!))/, $string. I guess not as fast as unpack, though.
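Guesses like these are easy to check with the core Benchmark module. The sketch below compares the three approaches mentioned so far; the test string and the labels are arbitrary, and the relative timings will depend on the Perl version and the input:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# 2600 characters of sample data; no newlines, so "." matches every character.
my $string = join '', ('a'..'z') x 100;

# Run each candidate for about one CPU second and print a comparison table.
cmpthese(-1, {
    regex  => sub { my @f = $string =~ /.{1,3}/g },
    split  => sub { my @f = split /(?(?{pos() % 3})(?!))/, $string },
    unpack => sub { my @f = unpack '(A3)*', $string },
});
```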
Re: Using split() to divide a string by length
by radiantmatrix (Parson) on Apr 14, 2006 at 16:12 UTC
I'm not sure split is the right choice for extracting fixed-length substrings. Isn't that really what substr is for (I mean, if you don't want to use unpack)?
sub split_len {
    ## split_len( $chars, $string[, $limit] )
    ## - splits $string into chunks of $chars chars
    ## - limits number of segments returned to $limit, if provided
    my ($chars, $string, $limit) = @_;
    my @result;
    for (my $i = 0; $i < length($string); $i += $chars) {
        last if defined $limit && @result >= $limit;
        # substr returns the short remainder for the final chunk
        push @result, substr($string, $i, $chars);
    }
    return @result;
}
sub split_len {
    my ($str, $start, $len) = @_;
    my @ret;
    # Stop before $start reaches the end, so no empty trailing chunk is added
    for (my $strlen = length $str; $start < $strlen; $start += $len) {
        push @ret, substr $str, $start, $len;
    }
    return @ret;
}
my $c = join '', 'a'..'z';
print "@{[ split_len $c, 0, 3 ]}\n";
print "@{[ split_len $c, 0, 4 ]}\n";
print "@{[ split_len $c, 3, 4 ]}\n";
__END__
abc def ghi jkl mno pqr stu vwx yz
abcd efgh ijkl mnop qrst uvwx yz
defg hijk lmno pqrs tuvw xyz