Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical

Search and replace the word in Column 16

by chanakya (Friar)
on Jul 25, 2006 at 12:09 UTC ( #563491=perlquestion: print w/replies, xml ) Need Help??

chanakya has asked for the wisdom of the Perl Monks concerning the following question:


During the coding process I have encountered a situation where I have to search and replace a particular column
For example I have the following data:
AT0000937503|20060530|||142.708534||GROUP AG|30618720||||OPEN +|ISIN|4943402|VSE|STOCK|39600000|0.77320
All the output lines generated will be of the same format as above. Before writing to the file
I want to search and replace the column 16 (i,e STOCK) with "BOXXE".

Till now I've been doing as below
$idx =~ s/STOCK/BOXXE/g;
Sometimes it may be possible that the same line have the word "STOCK" twice. My current code fails
if the same word appears twice. Please suggest how to replace the word *only* in column 16.


Replies are listed 'Best First'.
Re: Search and replace the word in Column 16
by davorg (Chancellor) on Jul 25, 2006 at 12:15 UTC
    • Use split to convert your record into an array
    • Examine (and maybe update) the sixteenth column
    • Use join to turn your array back into a string

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

Re: Search and replace the word in Column 16
by davidrw (Prior) on Jul 25, 2006 at 12:32 UTC
    i agree with jdtoronto's and davorg's split suggestion ...

    Another posibility (especially if there can be escaped delimiters) is to use one of the CSV modules to parse it.

    If you really wanted to fix the regex (i suspect that would be less efficient and less readable than split), something like:
    $idx =~ s/^((?:[^|]*\|){15}[^|]*)STOCK/$1BOXXE/;
    It's also possible (obviously relies on assumptions) for you to get away with just dropping the /g modifier .. then it would only break if "STOCK" also appeared in columns 1-15 somewhere, but would not break if it was in column 17+.
Re: Search and replace the word in Column 16
by imp (Priest) on Jul 25, 2006 at 13:06 UTC
    The most flexible solution, and the one least likely to confuse coworkers, would be to split the string, test column 16, replace column 16, create a new string w/ join. e.g.:
    sub split_join { my $line = shift; my @tokens = split /[|]/, $line; if ($tokens[15] eq 'STOCK') { $tokens[15] = 'BOXXE'; return join('|',@tokens); } else { return $line; } }
    But the regex approach will run faster (by 77% according to my tests).
      This will run faster, and IMHO improves upon split_join() a little ..
      sub index_split_join { return $_[0] unless index $_[0], 'STOCK' >= 0; # do a fast check +to see if line needs to be looked at my @tokens = split /\|/, $_[0]; # split into columns $tokens[15] =~ s/STOCK/BOXXE/; # do replacement in col 16 return join('|',@tokens); # glue back together for final r +esult }
      For your test of 1 data line, i get:
      Rate splitjoin idxsplitjoin simple_regex splitjoin 50000/s -- -15% -60% idxsplitjoin 58824/s 18% -- -53% simple_regex 125000/s 150% 112% --
      But that test isn't valid. Presumably (?!?) there are many lines that need to be processed, and only a small percentage have the word 'STOCK' in them (which is where the index short circuit will excel). Here is a modified benchmark (the DATA is ~1000 lines, all with same # of cols, but only a handful have STOCK in them):
      my @lines = <DATA>; cmpthese(10000, { idxsplitjoin => sub {index_split_join($_) for @lines}, splitjoin => sub {split_join($_) for @lines}, simple_regex => sub {simple_regex($_) for @lines}, }); # RESULTS: Benchmark: timing 10000 iterations of idxsplitjoin, simple_regex, spli +tjoin... idxsplitjoin: 9 wallclock secs ( 9.16 usr + 0.00 sys = 9.16 CPU) @ +1091.70/s (n=10000) simple_regex: 11 wallclock secs (10.77 usr + 0.00 sys = 10.77 CPU) @ +928.51/s (n=10000) splitjoin: 158 wallclock secs (158.15 usr + 0.00 sys = 158.15 CPU) @ + 63.23/s (n=10000) Rate splitjoin simple_regex idxsplitjoin splitjoin 63.2/s -- -93% -94% simple_regex 929/s 1368% -- -15% idxsplitjoin 1092/s 1627% 18% --
        Ah the perils of posting before your first cup of coffee in the morning - my original intent is exactly what you provided.

        good catch.

Re: Search and replace the word in Column 16
by jdtoronto (Prior) on Jul 25, 2006 at 12:13 UTC
    Maybe you could split the record,
    my @fields = split /|/, $record;
    The replace the field if necessary and then recreate the record by joining the fields again?

    I am sure someone will have a far more efficient and exotic answer, but sometimes first principles works.

    Of course, gellyfish is right, the | should be escaped:

    my @fields = split /\|/, $record;

      I don't think you meant split /|/ as witnessed by the the output of the following:

      print +join '*', split /|/, 'AT0000937503|20060530|||142.708534||GROUP + AG|30618720||||OPEN|ISIN|4943402|VSE|STOCK|39600000|0.77320';
      You need to escape the regex metacharacter '|'.


Re: Search and replace the word in Column 16
by Ieronim (Friar) on Jul 25, 2006 at 12:28 UTC
    Regexp-based way is imho worse than spliting and joining, but it allows to do all work in one line:
    $str =~ s/^((?:[^|]*\|){15})STOCK/${1}BOXXE/;
    Or, if you prefer counting back from the end of the string and dislike capturing:
    $str =~ s/STOCK(?=(?:\|[^|]*){2}$)/BOXXE/;

         s;;Just-me-not-h-Ni-m-P-Ni-lm-I-ar-O-Ni;;tr?IerONim-?HAcker ?d;print
Re: Search and replace the word in Column 16
by mickeyn (Priest) on Jul 25, 2006 at 12:34 UTC
    another suggestion - try Larry's a2p:
    # a2p $16=="STOCK" {$16="BOXXE"}
    it'll give you the piece of code you need.


Re: Search and replace the word in Column 16
by wfsp (Abbot) on Jul 25, 2006 at 12:19 UTC
    or perhaps

    $rec =~ s/([\w|]{15})STOCK/$1BOXXE/;

    See Ieronim's regex below

      This won't work. Be stricter, this will work for the 101st column too.

           s;;Just-me-not-h-Ni-m-P-Ni-lm-I-ar-O-Ni;;tr?IerONim-?HAcker ?d;print
Re: Search and replace the word in Column 16
by kulls (Hermit) on Jul 26, 2006 at 03:50 UTC
    If your script is going to generate this data
    AT0000937503|20060530|||142.708534||GROUP AG|30618720||||OPEN|ISIN|4 +943402|VSE|STOCK|39600000|0.77320
    , then I suggest you to do this search and replace option before create this string data ??.
    "Prevention is better than cure " correct ??

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://563491]
Approved by jdtoronto
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others exploiting the Monastery: (3)
As of 2022-12-08 01:55 GMT
Find Nodes?
    Voting Booth?

    No recent polls found