PerlMonks  

Multiple / Mapping Search and Replace

by VinsWorldcom (Prior)
on Mar 17, 2009 at 12:42 UTC (node #751170)
Category: Text Processing
Author/Contact Info

Michael Vincent

http://www.VinsWorld.com

Description: Ever wanted to search and replace many terms at once, without running a search-and-replace (SAR) routine over and over for each one? This script searches and replaces text in columns based on a map file. The output is a tab-delimited text file.
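The msar code itself is only available from the link, so here is a minimal sketch of the general idea rather than the actual implementation: the value in one column is looked up in a hash built from the map file, and the fields are written back out tab-delimited. The %map contents, the map_column name and its 1-based column argument are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical mapping, standing in for a two-column map file
# that has already been read into a hash.
my %map = ( 225 => 'April', 1168 => 'barrel' );

# Look up the value in one (1-based) column of a line in %map and
# return the fields tab-delimited. Values with no mapping pass through.
sub map_column {
    my ($line, $col, $map) = @_;
    my @fields = split ' ', $line;    # whitespace split, no chomp needed
    my $i = $col - 1;
    if ($i < @fields && exists $map->{ $fields[$i] }) {
        $fields[$i] = $map->{ $fields[$i] };
    }
    return join "\t", @fields;
}

for my $line ("1 225 x\n", "2 1168 y\n", "3 999 z\n") {
    print map_column($line, 2, \%map), "\n";
}
```

A hash lookup replaces a whole column value in one step, so there is no need to loop over every map entry for every line.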

Too large to upload. See link.

Code: http://www.vinsworld.com/software/msar.zip

POD: http://www.vinsworld.com/software/msar.html

Re: Multiple / Mapping Search and Replace
by jwkrahn (Monsignor) on Mar 17, 2009 at 22:19 UTC

    You should change:

    chomp $_;
    # Split the line by whitespace
    my @array = split (/\s+/, $_);
    # If the first column of a line is blank whitespace, shift it out so the
    # first column corresponds to actual user viewable data, which is the way
    # a user will count the columns, ignoring leading whitespace.
    if ($array[0] eq "") { shift (@array) }

    To:

    # Split the line by whitespace
    my @array = split;

    That is because split with no arguments splits $_ on whitespace and ignores leading whitespace, so there is no empty first field to remove, and it also drops trailing whitespace (including the newline), so there is nothing left for chomp to remove.
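A small demonstration of the difference, using a made-up input line with leading and trailing whitespace:

```perl
use strict;
use warnings;

my $line = "   foo  bar baz  \n";

# split /\s+/ returns an empty first field when the line starts with
# whitespace; that is the field the original code had to shift off.
my @regex_split = split /\s+/, $line;

# split with no arguments means split ' ', $_: leading whitespace is
# skipped, and (as with any split without a LIMIT) trailing empty fields
# are dropped, so the newline needs no chomp.
$_ = $line;
my @plain_split = split;

print scalar(@regex_split), " fields: ", join('|', @regex_split), "\n";
print scalar(@plain_split), " fields: ", join('|', @plain_split), "\n";
```

The first line reports 4 fields (the first one empty), the second reports 3.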


    while (<@array>) {

    That is short for:

    while (glob join $", @array) {

    and it is usually written as:

    foreach (@array) {
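The two are not just stylistically different. Because <@array> is a glob, elements containing whitespace or glob metacharacters can be split apart or expanded. A short demonstration with made-up data:

```perl
use strict;
use warnings;

my @array = ('new york', 'a{b,c}');

# foreach visits each element exactly as stored.
my @via_foreach;
foreach (@array) { push @via_foreach, $_ }

# <@array> is really glob("@array"): the elements are joined with $"
# (a space), split back apart on whitespace, and brace patterns are
# expanded, so the loop may not see the original elements at all.
my @via_glob;
while (my $item = <@array>) { push @via_glob, $item }

print "foreach: ", join(', ', @via_foreach), "\n";
print "glob:    ", join(', ', @via_glob), "\n";
```

The foreach loop sees two elements; the glob loop sees new, york, ab and ac instead. A word with a * or ? in it could also match file names in the current directory, or disappear if nothing matches.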

      Thanks for the tips. Incorporated and tested fine.

      Question: which is better coding practice, and which is faster?

      if (defined($opt_ignore)) {
          if (defined($opt_reverse)) { $map{lc($array[1])} = $array[0] }
          else                       { $map{lc($array[0])} = $array[1] }
      } else {
          if (defined($opt_reverse)) { $map{$array[1]} = $array[0] }
          else                       { $map{$array[0]} = $array[1] }
      }

      or

      $map{ (defined($opt_ignore)
                ? lc( (defined($opt_reverse) ? $array[1] : $array[0]) )
                : (defined($opt_reverse) ? $array[1] : $array[0])) }
          = (defined($opt_reverse) ? $array[0] : $array[1])
        I would avoid both of those approaches. The first involves too much repetition of code, and the second is too obfuscated. How about like this:
        my ( $src, $dst ) = ( $opt_reverse ) ? ( 0, 1 ) : ( 1, 0 );
        if ( $opt_ignore ) {
            $map{lc($array[$dst])} = $array[$src];
        } else {
            $map{$array[$dst]} = $array[$src];
        }
        (I hope I got the polarity right on what "reverse" means, but in any case, I think you can see the point.)

        (update: got rid of the "defined()" call -- the option values should just be true or false, and undefined is false)

        Another update: better solution:

        my ( $src, $dst ) = ( $opt_reverse ) ? (0,1) : (1,0);
        my $key = ( $opt_ignore ) ? lc($array[$dst]) : $array[$dst];
        $map{$key} = $array[$src];

        Of the two versions in the question, the first one is a lot more readable and maintainable. As for speed, you would have to measure with something like the Benchmark module to find out which is faster.
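A sketch of such a measurement with the core Benchmark module. It first checks that the nested if/else version from the question and the index-swapping version from the reply build the same hash for all four flag combinations, then times them with cmpthese. The sample @array and flag values are made up, and the flags are tested for truth rather than defined(), as the reply suggests. On code this small, any difference is probably negligible next to reading and writing the files.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical stand-ins for one split map-file line and the flags.
my @array = ('225', 'April');
my ($opt_ignore, $opt_reverse) = (1, 1);

# Nested if/else version from the question.
sub nested {
    my %map;
    if ($opt_ignore) {
        if ($opt_reverse) { $map{lc $array[1]} = $array[0] }
        else              { $map{lc $array[0]} = $array[1] }
    } else {
        if ($opt_reverse) { $map{$array[1]} = $array[0] }
        else              { $map{$array[0]} = $array[1] }
    }
    return \%map;
}

# Index-swapping version from the reply.
sub indexed {
    my %map;
    my ($src, $dst) = $opt_reverse ? (0, 1) : (1, 0);
    my $key = $opt_ignore ? lc $array[$dst] : $array[$dst];
    $map{$key} = $array[$src];
    return \%map;
}

# Both versions must agree before timing them means anything.
for my $i (0, 1) {
    for my $r (0, 1) {
        ($opt_ignore, $opt_reverse) = ($i, $r);
        my ($x, $y) = (nested(), indexed());
        die "mismatch for ignore=$i reverse=$r\n"
            unless join(',', %$x) eq join(',', %$y);
    }
}

# Run each version for at least 1 CPU second and print a comparison table.
($opt_ignore, $opt_reverse) = (1, 1);
cmpthese(-1, { nested => \&nested, indexed => \&indexed });
```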

Re: Multiple / Mapping Search and Replace
by toolic (Bishop) on Mar 18, 2009 at 23:45 UTC
    A couple of comments on pod2usage, um, usage :)

    Since pod2usage by default exits, your explicit exit calls are redundant at best and confusing at worst. This

    pod2usage(-verbose => 1) && exit if defined $opt_help;
    is the same as this:
    pod2usage(-verbose => 1) if defined $opt_help;

    You could also take advantage of the -message option. This

    print "$0: input_file required\n";
    pod2usage(-verbose => 0) && exit
    is the same as this:
    pod2usage(-verbose => 0, -message => "$0: input_file required\n")

    A comment on Getopt::Long usage: instead of declaring several separate $opt_XXX scalar variables, it might be more natural to store the option values in a hash (see "Storing options values in a hash" in the Getopt::Long documentation).
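Putting both suggestions together, a minimal sketch: options land in one hash, pod2usage does its own exiting, and -message replaces the separate print. The option names (help, ignore, reverse, whole, columns) are assumptions modelled on the -i, -r, -w and -c flags used in this thread, and parse_args is a hypothetical helper; GetOptionsFromArray is used only so the sketch can be fed a fixed argument list.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Pod::Usage;

=head1 NAME

msar-options - hypothetical option-parsing sketch

=head1 SYNOPSIS

msar-options [-h] [-i] [-r] [-w] [-c COLUMNS] input_file map_file

=cut

# Parse options into one hash instead of separate $opt_XXX scalars.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt,
        'help|h', 'ignore|i', 'reverse|r', 'whole|w', 'columns|c=s',
    ) or pod2usage(-verbose => 0);

    # pod2usage exits by itself, so no trailing "&& exit" is needed.
    pod2usage(-verbose => 1) if $opt{help};
    pod2usage(-verbose => 0, -message => "$0: input_file required\n")
        unless @args;

    return (\%opt, \@args);
}

# In a real script this would be parse_args(@ARGV).
my ($opt, $files) = parse_args(qw(-i -r -c 1 input.txt map.txt));
print "ignore=$opt->{ignore} reverse=$opt->{reverse} columns=$opt->{columns}\n";
print "files: @$files\n";
```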

Re: Multiple / Mapping Search and Replace
by Anonymous Monk on Apr 03, 2009 at 09:00 UTC
    I found a problem in your code. Given this input text:
    april
    barrel
    and a dictionary (map file) like this:
    225 April
    1168 barrel
    3143 Il
    9432 PR
    ....
    I get this output from my input:
    a94323143
    11777rr340
    I don't know why "april" gets broken into a + PR + Il ...

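      This happens because the map file is read into a hash, and a Perl hash has no defined order. So you can't guarantee that the search and replace will happen in the order given in your map file. And because each term is matched anywhere inside a word, "il" can match inside "april" before "april" itself gets its turn.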

      You're actually hitting this part of the code:

      # user didn't specify columns, so just SAR each line and leave alone
      } else {
          # loop through mapping array for each line
          foreach my $replace (keys(%map)) {
              # ignore case?
              if (defined($opt_ignore)) {
                  $YESMapping += ($_ =~ s/$replace/$map{$replace}/gi)
              } else {
                  $YESMapping += ($_ =~ s/$replace/$map{$replace}/g)
              }
          }
          print $OUT $_
      }

      Stick in a helpful print to "debug" what's going on:

      # user didn't specify columns, so just SAR each line and leave alone
      } else {
          # loop through mapping array for each line
          foreach my $replace (keys(%map)) {
              # ignore case?
              if (defined($opt_ignore)) {
                  print "SAR on $_ with $replace\n";
                  $YESMapping += ($_ =~ s/$replace/$map{$replace}/gi)
              } else {
                  $YESMapping += ($_ =~ s/$replace/$map{$replace}/g)
              }
          }
          print $OUT $_
      }

      This is what we see using your input and mapping files:

      {C} > msar input.txt map.txt -r -i
      Reading mappings from file: map.txt
      ----------------------------
      SAR on april with il
      SAR on apr3143 with barrel
      SAR on apr3143 with april
      SAR on apr3143 with pr
      a94323143
      SAR on barrel with il
      SAR on barrel with barrel
      SAR on 1168 with april
      SAR on 1168 with pr
      1168
      ----------------------------
      Mapped 3 entries.

      You could maybe fix it with a tied hash that remembers insertion order, such as Tie::IxHash from CPAN (the core Tie::Hash module only supplies base classes for writing tied hashes; it does not order anything). You would need to change how the %map hash is loaded at the beginning of the program. Unfortunately, I don't have time to code this up right now, but hey, my Perl code is "open source" :-) so have at it!
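One way to get a fixed order without any tied hash is to keep the map entries in an array of [search, replace] pairs and apply them in that order. The sketch below (apply_in_order and both orderings are hypothetical, built from the map entries in this thread, assuming -i style case-insensitive matching) reproduces both outcomes: file order turns "april" into 225, while the key order seen in the debug output above turns it into a94323143.

```perl
use strict;
use warnings;

# Apply [search, replace] pairs one after another, in the order given,
# case-insensitively (like -i). \Q...\E treats the search text literally.
sub apply_in_order {
    my ($text, @pairs) = @_;
    for my $pair (@pairs) {
        my ($search, $replace) = @$pair;
        $text =~ s/\Q$search\E/$replace/gi;
    }
    return $text;
}

# Pairs in map-file order (word => number, as with -r).
my @file_order = (['April', 225], ['barrel', 1168], ['Il', 3143], ['PR', 9432]);

# The same pairs in the key order the debug run above happened to use.
my @hash_order = (['Il', 3143], ['barrel', 1168], ['April', 225], ['PR', 9432]);

for my $word ('april', 'barrel') {
    printf "%-7s file order: %-6s hash order: %s\n", $word,
        apply_in_order($word, @file_order), apply_in_order($word, @hash_order);
}
```

Order alone fixes this example, but sequential replacement can still let one mapping's output be rewritten by a later one, and short terms still match inside longer words.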

      UPDATE: If your input file is just one column of words, call it with:

      {C} > msar input.txt map.txt -r -i -c 1

      {C} > msar.pl in.txt map.txt -i -r -c 1
      Reading mappings from file: map.txt
      ----------------------------
      225
      1168
      ----------------------------
      Mapped 2 entries.

      UPDATE: The MSAR.pl code now supports a -w option, which replaces WHOLE WORDS only. Also, the map.txt file is now read, parsed, and used for search and replace in the order it is written (line 1, line 2, ... line n).
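For comparison, a different technique, offered only as an alternative and not as what MSAR.pl does: compile all map entries into one regex alternation and replace in a single pass. Replaced text is never scanned again, longer keys are tried first, and \b anchors play the role of -w. build_sar and its ignore_case and whole_word options are hypothetical names.

```perl
use strict;
use warnings;

# Build a replacer that performs every mapping in one pass over the text.
# $pairs is an array ref of [search, replace]; earlier pairs win on
# duplicate keys.
sub build_sar {
    my ($pairs, %opt) = @_;
    my %map;
    for my $pair (@$pairs) {
        my $key = $opt{ignore_case} ? lc $pair->[0] : $pair->[0];
        $map{$key} = $pair->[1] unless exists $map{$key};
    }
    return sub { return ($_[0], 0) } unless %map;   # nothing to replace

    # Longest keys first, so a longer term wins over a shorter one that
    # starts at the same position; quotemeta makes every key literal.
    my $alt = join '|', map { quotemeta($_) }
        sort { length($b) <=> length($a) or $a cmp $b } keys %map;
    my $pat = $opt{whole_word} ? "\\b(?:$alt)\\b" : "(?:$alt)";
    my $re  = $opt{ignore_case} ? qr/($pat)/i : qr/($pat)/;

    return sub {
        my ($text) = @_;
        my $count = $text =~ s/$re/$map{ $opt{ignore_case} ? lc $1 : $1 }/ge;
        return ($text, $count || 0);
    };
}

my @pairs = (['April', 225], ['barrel', 1168], ['Il', 3143], ['PR', 9432]);
my $sar = build_sar(\@pairs, ignore_case => 1, whole_word => 1);
for my $line ('april', 'barrel', 'until april') {
    my ($out, $n) = $sar->($line);
    print "$line -> $out ($n replaced)\n";
}
```

With whole_word off, "until april" would become "unt3143 225", since "il" is still found inside "until"; with it on, only "april" is replaced.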
