Splitting a string to chunks

spurperl has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Splitting a string to chunks by Limbic~Region (Chancellor) on Nov 29, 2006 at 13:50 UTC
spurperl, I was surprised to see that unpack wasn't the fastest, so I changed it just a bit: `my @arr = unpack('A8A8A8A8A8A8A8A8A8A8A8A8A8A8A8A8A8A8A8A8', $str);` Not only is that compatible with older perls, it is now the fastest. I might play a bit more to see if I can get an even faster version, but to be fair, that really should have been: `my @arr = unpack((join '', ('A8' x ($strlen / 8))), $str);` That version no longer wins but is still faster than '(A8)'.
Update: I wanted to see what would happen if the benchmark focused more on the functions themselves by removing some of the intermediate calculations. Note also that I changed x 20 to x 200.
Cheers - L~R
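For readers comparing the template styles mentioned above, here is a minimal sketch. The sample strings are made up for illustration, since the thread does not show the original benchmark data. One caveat worth keeping in mind: the `A` format strips trailing whitespace from each field, while `a` returns the field untouched, so the two are not interchangeable on space-padded data.

```perl
use strict;
use warnings;

# Hypothetical sample data: 20 chunks of 8 characters each.
my $str = join '', map { sprintf '%08d', $_ } 1 .. 20;

# Repeat-group template; () groups need a reasonably modern perl
# (they arrived with 5.8, as far as I recall).
my @grouped = unpack '(A8)*', $str;

# Explicitly built template, which also works on older perls.
my $template = 'A8' x int(length($str) / 8);
my @built    = unpack $template, $str;

# 'A' trims trailing whitespace from each field; 'a' keeps it.
my @trimmed = unpack '(A8)*', 'abc     defgh   ';
my @raw     = unpack '(a8)*', 'abc     defgh   ';

print scalar(@grouped), " chunks, first is '$grouped[0]'\n";
print "A8: '$trimmed[0]'  a8: '$raw[0]'\n";
```

If the chunks may legitimately end in spaces, `a8` is probably the safer choice.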
Re: Splitting a string to chunks by duff (Parson) on Nov 29, 2006 at 13:52 UTC
On my system, I get a different result:

                   Rate split_pos grep_split substr_map substr_loop unpack
    split_pos    4596/s        --       -53%       -71%        -79%   -82%
    grep_split   9843/s      114%         --       -37%        -54%   -61%
    substr_map  15674/s      241%        59%         --        -27%   -38%
    substr_loop 21459/s      367%       118%        37%          --   -15%
    unpack      25381/s      452%       158%        62%         18%     --

Your performance characteristics depend on all sorts of things relating to your CPU, its cache, bus speed, memory, etc. But as far as ways to make it faster, you might want to use an idiomatic for loop instead of the C-style loop.
duff
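To illustrate the loop suggestion, here is a rough sketch of both styles. The original `substr_loop` code is not reproduced in this thread, so `chunks_c_style` is only an assumed shape of it, and both sub names and the input string are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical input; the original post's test string is not shown.
my $str = 'abcdefgh' x 20;

# C-style loop (an assumed shape of the original substr_loop).
sub chunks_c_style {
    my ($s) = @_;
    my @arr;
    for (my $i = 0; $i < length($s); $i += 8) {
        push @arr, substr($s, $i, 8);
    }
    return @arr;
}

# Idiomatic foreach over chunk indices.
sub chunks_foreach {
    my ($s) = @_;
    return unless length $s;                  # nothing to split
    my $last = int((length($s) - 1) / 8);     # index of the final chunk
    my @arr;
    push @arr, substr($s, $_ * 8, 8) for 0 .. $last;
    return @arr;
}

print scalar(chunks_foreach($str)), " chunks\n";
```

Both subs can be dropped into a `Benchmark::cmpthese` hash to check whether the difference shows up on a given machine.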
Re: Splitting a string to chunks by Fengor (Pilgrim) on Nov 29, 2006 at 13:48 UTC
what about `'regex' => sub { my @arr = $str =~ /(........)/g }` didn't time it though.
-- "WHAT CAN THE HARVEST HOPE FOR IF NOT THE CARE OF THE REAPER MAN" -- Terry Pratchett, "Reaper Man"
Re^2: Splitting a string to chunks by themage (Friar) on Nov 29, 2006 at 14:13 UTC
Hi, I added another version that also splits strings whose last chunk is shorter than eight characters. I also added a /o to improve performance (that can be used if you have several lines to split). Added this to the benchmark: `'regex' => sub { my @arr = $string =~ /(........)/g; }, 'regexo' => sub { my @arr = $string =~ /(.{1,8})/og; },`

The results:

                     Rate split_pos  split grep_split substr_map substr_loop unpack regex regexo
    split_pos      7295/s        --   -57%       -60%       -68%        -77%   -78% -100%  -100%
    split         16900/s      132%     --        -7%       -26%        -47%   -50% -100%  -100%
    grep_split    18241/s      150%     8%         --       -20%        -43%   -46% -100%  -100%
    substr_map    22883/s      214%    35%        25%         --        -29%   -32%  -99%  -100%
    substr_loop   32139/s      341%    90%        76%        40%          --    -4%  -99%   -99%
    unpack        33495/s      359%    98%        84%        46%          4%     --  -99%   -99%
    regex       4342185/s    59421% 25593%     23705%     18876%      13411% 12864%    --    -6%
    regexo      4596612/s    62909% 27098%     25099%     19988%      14202% 13623%    6%     --

TheMage http://www.talking-web.org
Re^3: Splitting a string to chunks by Fengor (Pilgrim) on Nov 29, 2006 at 14:30 UTC
umhmm you got my typo. i accidentally used $string instead of $str in my post first. that explains the high rates for the regex solution. here is the timing with the typo corrected:

                   Rate split_pos grep_split substr_map regexo regex substr_loop unpack
    split_pos    5587/s        --       -65%       -69%   -76%  -77%        -79%   -81%
    grep_split  15974/s      186%         --       -12%   -32%  -34%        -40%   -45%
    substr_map  18051/s      223%        13%         --   -23%  -26%        -32%   -38%
    regexo      23474/s      320%        47%        30%     --   -3%        -12%   -20%
    regex       24272/s      334%        52%        34%     3%    --         -9%   -17%
    substr_loop 26596/s      376%        66%        47%    13%   10%          --    -9%
    unpack      29240/s      423%        83%        62%    25%   20%         10%     --
Re^3: Splitting a string to chunks by Limbic~Region (Chancellor) on Nov 29, 2006 at 14:35 UTC
themage, your benchmark disagrees with mine (with x 20 and x 200). Additionally, I think you should re-read perlre with regard to what /o does. I am sure diotalevi will improve upon my explanation, but in a nutshell, /o is an old optimization predating qr//. If you needed to interpolate a variable inside a regex such as /$regex/ but knew that $regex would never change, the flag would tell perl to compile the regex only once. In fact, if you broke your promise and changed $regex, it would still not be recompiled, leading to buggy code. Then qr// came along and improved things greatly (see /o is dead, long live qr//!). Since your regex does not interpolate a variable, the /o has no effect. See also this regarding how current perls optimize regex compiling. Unfortunately I couldn't find this in any perldelta from 5.6.1 to 5.9.4, which makes me suspicious, so I posted Questions concerning /o regex modifier.
Cheers - L~R
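A small sketch of the qr// approach described above, for the case where the pattern really does interpolate a variable. The variable names and sample lines are hypothetical; the point is only that the pattern is compiled once into a regex object and then reused, which is the job /o used to be asked to do.

```perl
use strict;
use warnings;

# Hypothetical chunk width, interpolated into the pattern.
my $width = 8;

# qr// compiles the pattern once and yields a reusable regex object.
my $chunk_re = qr/(.{1,$width})/s;

my @lines = ('abcdefghijklmnopqrs', '0123456789');
my @all;
for my $line (@lines) {
    # Reuse the precompiled pattern for every line.
    push @all, [ $line =~ /$chunk_re/g ];
}

print join('|', @$_), "\n" for @all;
```

Unlike /o, changing `$width` and rebuilding `$chunk_re` gives a fresh pattern, so there is no stale-regex surprise.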
Re^3: Splitting a string to chunks by BrowserUk (Patriarch) on Nov 29, 2006 at 14:33 UTC
Without having run your benchmark, the huge disparity between your solutions and the others makes me very suspicious that your code is not producing the same results as the others. Have you checked?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal? "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.
Re^4: Splitting a string to chunks by Fengor (Pilgrim) on Nov 29, 2006 at 14:50 UTC
Re: Splitting a string to chunks by Not_a_Number (Prior) on Nov 29, 2006 at 15:15 UTC
"Any way to make it faster?" On my machine, this is slightly faster still: `'substr_loop2' => sub { my @arr; my $s = $str; push @arr, substr $s, 0, 8, '' while $s; },`
More seriously, though, not all the subs in your OP are equivalent: `'substr_map'` will truncate any string at a multiple of eight characters, while the others will include the extra characters in the final element of the array (Fengor's `my @arr = $str =~ /(........)/g` has the same problem).
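The equivalence problem is easy to see on a string whose length is not a multiple of eight. The following sketch uses a made-up 19-character sample. It also shows a side issue with `while $s` as the loop condition: a remaining tail of `"0"` is false in Perl, so testing `length` is the more defensive choice.

```perl
use strict;
use warnings;

my $str = 'abcdefghijklmnopqrs';        # 19 characters, made-up sample

my @fixed  = $str =~ /(........)/g;     # whole chunks only, tail dropped
my @ragged = $str =~ /(.{1,8})/gs;      # keeps the 3-character tail

# Four-argument substr removes each chunk from the front of a copy.
my @eaten;
my $copy = $str;
push @eaten, substr($copy, 0, 8, '') while length $copy;

# With a plain truth test, a leftover "0" ends the loop too early.
my @lost;
my $digits = '123456780';
push @lost, substr($digits, 0, 8, '') while $digits;

print scalar(@fixed), " vs ", scalar(@ragged), " vs ", scalar(@eaten), " chunks\n";
print "chunks recovered from '123456780': ", scalar(@lost), "\n";
```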
Re^2: Splitting a string to chunks by Fengor (Pilgrim) on Nov 29, 2006 at 16:30 UTC
thx for pointing out. what about

    'regexpad' => sub {
        # padding the string
        my $padding = 8 - length($str%8) if length($str%8); # has to be 8 - modulo not modulo, thx johngg
        $str .= "x" x $padding;
        # dividing the string in parts of 8 chars
        my @arr = $str =~ /(........)/g;
        # remove padding
        $arr[-1] = substr($arr[-1],-$padding);
    }

although its a bit slower than the other 2 regex solutions

                   Rate split_pos grep_split substr_map regexpad regexo regex substr_loop unpack
    split_pos    5841/s        --       -64%       -70%     -75%   -76%  -76%        -79%   -80%
    grep_split  16129/s      176%         --       -17%     -30%   -33%  -34%        -42%   -45%
    substr_map  19531/s      234%        21%         --     -15%   -19%  -20%        -30%   -33%
    regexpad    22936/s      293%        42%        17%       --    -5%   -6%        -17%   -22%
    regexo      24038/s      312%        49%        23%       5%     --   -1%        -13%   -18%
    regex       24272/s      316%        50%        24%       6%     1%    --        -13%   -17%
    substr_loop 27778/s      376%        72%        42%      21%    16%   14%          --    -5%
    unpack      29240/s      401%        81%        50%      27%    22%   20%          5%     --

Edit: corrected padding
Re^3: Splitting a string to chunks by johngg (Canon) on Nov 29, 2006 at 23:25 UTC
I think your "padding" algorithm might be a bit wonky. Given a string of length, say, 19 characters you would arrive at a `$padding` value of 3, thus padding your `$str` with three "x"s to end up with a length of 22, not 24 as I think you wanted. This should work (not tested): `my $padding = 8 - ($str % 8);` The remove padding part would be something like (again, not tested): `substr $arr[-1], -$padding, $padding, q{} if $padding;`
Cheers, JohnGG
Update: I must have been half-asleep; where's the `length` call? The line should be `my $padding = 8 - (length($str) % 8);` You can't do modulo on a string :)

    $ perl -le '$str = q{abc}; $pad = $str % 8; print $pad;'
    0
    $ perl -le '$str = q{abcdefghijkl}; $pad = $str % 8; print $pad;'
    0
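Putting the padding arithmetic together, here is a hedged sketch of the pad-then-strip idea. The helper name `pad_count` is hypothetical and not from the thread. It adds one more `% $width` than the line above, on the assumption that a string whose length is already a multiple of eight should get zero pad characters rather than eight.

```perl
use strict;
use warnings;

# Hypothetical helper: number of pad characters needed to reach a
# multiple of $width (0 when the length already is one).
sub pad_count {
    my ($s, $width) = @_;
    return ($width - length($s) % $width) % $width;
}

my $str     = 'abcdefghijklmnopqrs';        # 19 characters, made-up sample
my $padding = pad_count($str, 8);           # 19 % 8 is 3, so 5 are needed
my $padded  = $str . 'x' x $padding;        # now 24 characters
my @arr     = $padded =~ /(........)/g;

# Strip the pad characters from the end of the last chunk.
substr($arr[-1], -$padding, $padding, '') if $padding;

print join('|', @arr), "\n";
```

Padding a copy rather than `$str` itself also keeps repeated benchmark iterations from growing the input string.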

