Something that comes up fairly often is the need to split a string into equal-sized chunks. For instance, splitting the string "abcdefgh12345678" into 4-character chunks would produce ("abcd", "efgh", "1234", "5678"). Looking around the monastery, I found at least a couple of posts on the subject.
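To make the task concrete, here is a minimal sketch of the chunking described above. The helper name chunk_string is made up for illustration and does not come from any of the posts; it simply walks the string with substr.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this example): split $str into
# pieces of $size characters. The last piece may be shorter when the
# length is not a multiple of $size.
sub chunk_string {
    my ($str, $size) = @_;
    my @chunks;
    for (my $i = 0; $i < length $str; $i += $size) {
        push @chunks, substr($str, $i, $size);
    }
    return @chunks;
}

my @chunks = chunk_string("abcdefgh12345678", 4);
print join(",", @chunks), "\n";    # prints abcd,efgh,1234,5678
```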
I timed several techniques against each other using cmpthese from the Benchmark module:
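use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = "abcdefgh12345678" x 20;

cmpthese(50000, {
    # split with a capturing group returns the chunks interleaved with
    # empty fields; grep {$_} drops those (it would also drop a chunk "0")
    'grep_split' => sub
    {
        my @arr = grep {$_} split /(.{8})/, $str;
    },
    # zero-width match that succeeds only where pos() is a multiple of 8
    'split_pos' => sub
    {
        my @arr = split /(?(?{pos() % 8})(?!))/, $str;
    },
    # assumes the length is a multiple of 8; a partial last chunk is dropped
    'substr_map' => sub
    {
        my $len = length $str;
        my @arr = map {substr($str, $_ * 8, 8)} (0 .. $len / 8 - 1);
    },
    'substr_loop' => sub
    {
        my @arr;
        my $len = length $str;
        for (my $i = 0; $i < $len; $i += 8)
        {
            push(@arr, substr($str, $i, 8));
        }
    },
    # A8 strips trailing spaces from each chunk; a8 would keep them
    'unpack' => sub
    {
        my @arr = unpack('(A8)*', $str);
    },
});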
And the results are quite surprising:
Rate
split_pos 3203/s
grep_split 6425/s
substr_map 8889/s
unpack 11348/s
substr_loop 15097/s
Contrary to my expectation that built-in functions should be faster than explicit loops, the looping solution is the fastest. It beats unpack by a margin of 15 to 50 percent, depending on the length of the string and the size of the chunks.
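A timing comparison like this only means something if the candidates return the same list. The following is a rough sanity check, not part of the original benchmark, that compares the two fastest variants on the same test string. The variable names are mine. It also shows one case where they would differ: the A template in unpack strips trailing spaces from each chunk, while a keeps them.

```perl
use strict;
use warnings;

my $str = "abcdefgh12345678" x 20;    # same test string, 320 characters

# substr loop, as in the benchmark
my @by_substr;
for (my $i = 0; $i < length $str; $i += 8) {
    push @by_substr, substr($str, $i, 8);
}

# unpack, as in the benchmark
my @by_unpack = unpack('(A8)*', $str);

print scalar(@by_substr), " chunks\n";    # prints 40 chunks
print "identical\n" if join("|", @by_substr) eq join("|", @by_unpack);

# The two templates differ only when chunks end in whitespace:
my @stripped = unpack('(A4)*', "ab  cd  ");    # ("ab", "cd")
my @kept     = unpack('(a4)*', "ab  cd  ");    # ("ab  ", "cd  ")
```

If the data may contain trailing spaces that matter, the a template is the safer choice for a like-for-like comparison with substr.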
Is there any way to make it faster?