This runs faster and, IMHO, improves on split_join() a little:
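sub index_split_join {
    # Do a fast check to see if the line needs to be looked at.
    # index() needs its own parentheses: it is a list operator, so
    # "index $_[0], 'STOCK' >= 0" would search for "1" instead.
    return $_[0] unless index($_[0], 'STOCK') >= 0;
    my @tokens = split /\|/, $_[0], -1;   # split into columns (-1 keeps trailing empty ones)
    $tokens[15] =~ s/STOCK/BOXXE/;        # do replacement in col 16
    return join('|', @tokens);            # glue back together for final result
}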
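Why the fast check needs parentheses around index()'s arguments: index is a list operator, so in the unparenthesized form `index $_[0], 'STOCK' >= 0` the comparison becomes the second argument. A small sketch comparing both forms of the condition (the sub names keep_going_buggy and keep_going_fixed are hypothetical):

```perl
use strict;
use warnings;

# The CONDITION in "return $_[0] unless CONDITION", written two ways.
# index() is a list operator, so without parentheses its argument list
# swallows the comparison:
#   index $line, 'STOCK' >= 0   parses as   index($line, ('STOCK' >= 0))
# 'STOCK' numifies to 0, so the second argument is the true value 1 and
# index() searches for the substring "1" instead of "STOCK".
sub keep_going_buggy {
    my ($line) = @_;
    no warnings 'numeric';    # silence: Argument "STOCK" isn't numeric
    return index $line, 'STOCK' >= 0;
}

# Parenthesized call; the comparison is applied to index()'s result.
sub keep_going_fixed {
    my ($line) = @_;
    return index($line, 'STOCK') >= 0;
}

for my $line ('a|b|c', '1|b|c', '1|STOCK|c') {
    printf "%-10s buggy=%-5s fixed=%s\n", $line,
        (keep_going_buggy($line) ? 'true' : 'false'),
        (keep_going_fixed($line) ? 'true' : 'false');
}
```

With the unparenthesized form, a line without STOCK still goes on to split/join (index returns -1, which is true), while a line that starts with "1" is skipped even when it does contain STOCK (index returns 0, which is false).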
For your test with a single data line, I get:
                 Rate    splitjoin idxsplitjoin simple_regex
splitjoin     50000/s           --         -15%         -60%
idxsplitjoin  58824/s          18%           --         -53%
simple_regex 125000/s         150%         112%           --
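A quick way to read the percentage columns, as a sketch that assumes cmpthese reports each cell as (row rate / column rate - 1) x 100, i.e. how much faster (positive) or slower (negative) the row is than the column. It recomputes the single-line table from the three reported rates (the helper name pct is hypothetical):

```perl
use strict;
use warnings;

# Rates (iterations/s) from the single-line benchmark table above.
my %rate = (
    splitjoin    => 50000,
    idxsplitjoin => 58824,
    simple_regex => 125000,
);

# Assumed cmpthese convention: percent by which the row is faster
# (positive) or slower (negative) than the column.
sub pct {
    my ($row, $col) = @_;
    return sprintf '%.0f', 100 * $rate{$row} / $rate{$col} - 100;
}

my @order = sort { $rate{$a} <=> $rate{$b} } keys %rate;    # slowest first
for my $row (@order) {
    my @cells = map { $_ eq $row ? '--' : pct($row, $_) . '%' } @order;
    print join("\t", $row, @cells), "\n";
}
```

The same formula matches the 1000-line results as well: 1091.70 / 928.51 is about 1.18, the 18% that the idxsplitjoin row shows against simple_regex.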
But a one-line test isn't representative. Presumably (?!?) there are many lines to process, and only a small percentage of them contain the word 'STOCK', which is exactly where the index() short circuit excels. Here is a modified benchmark (the DATA section is ~1000 lines, all with the same number of columns, but only a handful contain STOCK):
use Benchmark qw(cmpthese);

my @lines = <DATA>;    # ~1000 test lines from the __DATA__ section
cmpthese(10000, {
    idxsplitjoin => sub { index_split_join($_) for @lines },
    splitjoin    => sub { split_join($_)       for @lines },
    simple_regex => sub { simple_regex($_)     for @lines },
});
# RESULTS:
Benchmark: timing 10000 iterations of idxsplitjoin, simple_regex, splitjoin...
idxsplitjoin:   9 wallclock secs (  9.16 usr +  0.00 sys =   9.16 CPU) @ 1091.70/s (n=10000)
simple_regex:  11 wallclock secs ( 10.77 usr +  0.00 sys =  10.77 CPU) @  928.51/s (n=10000)
   splitjoin: 158 wallclock secs (158.15 usr +  0.00 sys = 158.15 CPU) @   63.23/s (n=10000)
               Rate    splitjoin simple_regex idxsplitjoin
splitjoin    63.2/s           --         -93%         -94%
simple_regex  929/s        1368%           --         -15%
idxsplitjoin 1092/s        1627%          18%           --
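The DATA section itself isn't part of the post. For anyone who wants to reproduce the setup, here is a minimal sketch that builds hypothetical stand-in data (1000 lines of 20 columns, with STOCK in column 16 on every 200th line; the column count and spacing are assumptions) and counts how many lines actually get past the index() check. The counter $slow_path is likewise hypothetical and only there for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the post's DATA section: 1000 pipe-delimited
# lines of 20 columns each; only every 200th line has STOCK in column 16.
my @lines;
for my $n (1 .. 1000) {
    my @cols = map { 'c' . $_ } 1 .. 20;
    $cols[15] = 'STOCK' if $n % 200 == 0;
    push @lines, join('|', @cols) . "\n";
}

my $slow_path = 0;    # counts lines that reach split/join

sub index_split_join {
    return $_[0] unless index($_[0], 'STOCK') >= 0;    # fast reject
    $slow_path++;
    my @tokens = split /\|/, $_[0], -1;    # -1 keeps trailing empty fields
    $tokens[15] =~ s/STOCK/BOXXE/;         # replacement in col 16
    return join('|', @tokens);
}

my @out = map { index_split_join($_) } @lines;
printf "%d of %d lines needed split/join\n", $slow_path, scalar @lines;
```

With data shaped like this, 995 of the 1000 lines are returned after a single index() call, so the split/join cost is paid only 5 times per pass.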