Re: Search and replace the word in Column 16

The most flexible solution, and the one least likely to confuse coworkers, is to split the string, test column 16, replace it if it matches, and build a new string with join, e.g.:

sub split_join {
    my $line = shift;
    my @tokens = split /[|]/, $line;
    if ($tokens[15] eq 'STOCK') {
        $tokens[15] = 'BOXXE';
        return join('|',@tokens);
    }
    else {
        return $line;
    }
}
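
One caveat with the split approach: without a limit argument, split drops trailing empty fields, so a chomped line that ends in empty columns would lose them on the way back through join. Also, if column 16 happens to be the last column of an unchomped line, the token is "STOCK\n" and the eq test fails. A minimal sketch of a variant that handles both, assuming "\n" line endings (the name split_join_keep_trailing is hypothetical, not from the original post):

```perl
use strict;
use warnings;

# Hypothetical variant of split_join (not from the original post).
# Assumes "\n" line endings and the default input record separator.
sub split_join_keep_trailing {
    my ($line) = @_;

    # Work on a chomped copy and remember whatever line ending was removed.
    chomp(my $body = $line);
    my $eol = substr($line, length $body);

    # A limit of -1 tells split to keep trailing empty fields.
    my @tokens = split /[|]/, $body, -1;

    # Guard against short lines, where column 16 does not exist.
    if (defined $tokens[15] && $tokens[15] eq 'STOCK') {
        $tokens[15] = 'BOXXE';
    }

    return join('|', @tokens) . $eol;
}
```

Because split with a limit of -1 followed by join is lossless, this version can rejoin unconditionally instead of returning the original line in an else branch.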

But the regex approach will run faster (by 77% according to my tests).

use strict;
use warnings;
use Benchmark qw(cmpthese);


my $line = <DATA>;


printf "Original: %s", $line;
printf "   split: %s", split_join($line);
printf "  simple: %s", simple_regex($line);

cmpthese(5000, {
    splitjoin    => sub {split_join($line)},
    simple_regex => sub {simple_regex($line)},
});


sub split_join {
    my $line = shift;
    my @tokens = split /[|]/, $line;
    if ($tokens[15] eq 'STOCK') {
        $tokens[15] = 'BOXXE';
        return join('|',@tokens);
    }
    else {
        return $line;
    }
}

sub simple_regex {
    my $line = shift;
    #$line =~ s/^((?:[^|]*\|){15})STOCK/${1}BOXXE/;
    $line =~ s{^
                ( 
                  (?:
                    [^|]*
                    \|
                   ) {15}
                )
                STOCK
              }
              {${1}BOXXE}x;
    return $line;
}
__DATA__
AT0000937503|20060530|||142.708534||GROUP AG|30618720||||OPEN|ISIN|4943402|VSE|STOCK|39600000|0.77320|STOCK|test
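
Note that the two subs are not strictly equivalent: split_join requires column 16 to be exactly STOCK, while simple_regex also rewrites a column that merely starts with STOCK (e.g. STOCKS). If the exact-match behaviour matters, a lookahead can pin the end of the field. A rough sketch, with anchored_regex being a hypothetical name of my own:

```perl
use strict;
use warnings;

# Hypothetical stricter variant of simple_regex (not from the original post).
sub anchored_regex {
    my ($line) = @_;
    $line =~ s{^
                (                   # capture the first 15 columns
                  (?: [^|]* [|] ){15}
                )
                STOCK
                (?= [|\n] | \z )    # field must end here: pipe, newline, or end of string
              }
              {${1}BOXXE}x;
    return $line;
}
```

The lookahead consumes nothing, so the delimiter after column 16 is left in place.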

Results:

Original: AT0000937503|20060530|||142.708534||GROUP AG|30618720||||OPEN|ISIN|4943402|VSE|STOCK|39600000|0.77320|STOCK|test
   split: AT0000937503|20060530|||142.708534||GROUP AG|30618720||||OPEN|ISIN|4943402|VSE|BOXXE|39600000|0.77320|STOCK|test
  simple: AT0000937503|20060530|||142.708534||GROUP AG|30618720||||OPEN|ISIN|4943402|VSE|BOXXE|39600000|0.77320|STOCK|test
               Rate    splitjoin simple_regex
splitjoin    4274/s           --         -44%
simple_regex 7576/s          77%           --

Re^2: Search and replace the word in Column 16 by davidrw (Prior) on Jul 25, 2006 at 14:35 UTC
This will run faster, and IMHO improves upon `split_join()` a little:

sub index_split_join {
    return $_[0] unless index($_[0], 'STOCK') >= 0;  # do a fast check to see if line needs to be looked at
    my @tokens = split /\|/, $_[0];                  # split into columns
    $tokens[15] =~ s/STOCK/BOXXE/;                   # do replacement in col 16
    return join('|', @tokens);                       # glue back together for final result
}

For your test of 1 data line, I get:

                 Rate    splitjoin idxsplitjoin simple_regex
splitjoin     50000/s           --         -15%         -60%
idxsplitjoin  58824/s          18%           --         -53%
simple_regex 125000/s         150%         112%           --

But that test isn't valid. Presumably (?!?) there are many lines that need to be processed, and only a small percentage have the word 'STOCK' in them (which is where the `index` short circuit will excel). Here is a modified benchmark (the DATA is ~1000 lines, all with the same number of columns, but only a handful have STOCK in them):

my @lines = <DATA>;
cmpthese(10000, {
    idxsplitjoin => sub {index_split_join($_) for @lines},
    splitjoin    => sub {split_join($_) for @lines},
    simple_regex => sub {simple_regex($_) for @lines},
});

# RESULTS:
Benchmark: timing 10000 iterations of idxsplitjoin, simple_regex, splitjoin...
idxsplitjoin:   9 wallclock secs (  9.16 usr +  0.00 sys =   9.16 CPU) @ 1091.70/s (n=10000)
simple_regex:  11 wallclock secs ( 10.77 usr +  0.00 sys =  10.77 CPU) @ 928.51/s (n=10000)
   splitjoin: 158 wallclock secs (158.15 usr +  0.00 sys = 158.15 CPU) @ 63.23/s (n=10000)
               Rate    splitjoin simple_regex idxsplitjoin
splitjoin    63.2/s           --         -93%         -94%
simple_regex  929/s        1368%           --         -15%
idxsplitjoin 1092/s        1627%          18%           --
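
The index short circuit and the regex are not mutually exclusive. As a rough sketch (not benchmarked here), the cheap index check can guard the column-16 regex, and the result can be applied line by line to a filehandle. Both index_then_regex and filter_handle are hypothetical names introduced for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: cheap index() pre-check, then the column-16 regex.
sub index_then_regex {
    my ($line) = @_;

    # No STOCK anywhere in the line: nothing to do, skip the regex entirely.
    return $line if index($line, 'STOCK') < 0;

    # Same substitution as the one-line form of simple_regex.
    $line =~ s/^((?:[^|]*\|){15})STOCK/${1}BOXXE/;
    return $line;
}

# Hypothetical driver: copy lines from $in to $out, rewriting as we go.
sub filter_handle {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} index_then_regex($line);
    }
    return;
}
```

For a one-off filter, something like filter_handle(\*STDIN, \*STDOUT) should do. Note the parentheses around the arguments to index: written as index $line, 'STOCK' >= 0, the comparison binds tighter than the list operator and the check silently tests the wrong thing.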
Re^3: Search and replace the word in Column 16 by imp (Priest) on Jul 25, 2006 at 15:03 UTC
Ah, the perils of posting before your first cup of coffee in the morning. My original intent is exactly what you provided. Good catch.

