Beefy Boxes and Bandwidth Generously Provided by pair Networks
Problems? Is your data what you think it is?
 
PerlMonks  

comment on

( #3333=superdoc: print w/replies, xml ) Need Help??

This looked like an interesting problem to get the mental juices flowing on a Saturday morning. I see you've already received much help so, rather than trying to modify your code, I decided to write it from scratch. A few notes on some the things I did follow; ask if you want to know about anything else.

  • The autodie pragma handles I/O exceptions so I don't have to worry about doing that. I note that you're not checking any I/O operations.
  • I used constants to specify the indices of the search and replace elements in the CSV file. If the CSV format ever changes, you'll probably only need to change one or two numbers; as opposed to recoding splice() statements in a couple of places.
  • I used IO::Prompter for the choice of replacement text where more than one option existed. That used a lot less code than what's curently in search_and_replace() and has many more features. Options are presented in the form of a menu and you only need to hit one key (i.e. no newline) to make a selection. I've also used colour: if you want to use that you may need to fiddle with it a bit (it looks good on my black background). That uses Term::ANSIColor; I've added extra colour in a few places, such as highlighting the text to be replaced.
  • Text::CSV will use Text::CSV_XS if it's installed; but the code will still work if it's not.
  • The code is modular and pretty much all the work is done in three, short subroutines: _process_csv(), _process_text() and _replace(). Also note that those routines only use variables that are passed to them: there's no action-at-a-distance, or similar problems, from global or file-scoped variables.

Here's whitelist_sr.pl:

#!/usr/bin/env perl use strict; use warnings; use autodie; use constant { SEARCH => 0, REPLACE => 2, }; use File::Copy; use List::Util 'uniq'; use IO::Prompter [ -single, -style => 'bold blue', -echostyle => 'bold magenta', ]; use Term::ANSIColor; use Text::CSV; # For testing: BEGIN { copy('whitelist_sr_original.txt', 'whitelist_sr.txt'); } my $csv_file = 'whitelist_sr.csv'; my $source_file = 'whitelist_sr_src.txt'; my $text_file = 'whitelist_sr.txt'; copy($text_file, $source_file); my ($sr, $re) = _process_csv($csv_file); _process_text($source_file, $text_file, $sr, $re); unlink $source_file; sub _process_text { my ($source_file, $text_file, $sr, $re) = @_; open my $src_fh, '<', $source_file; open my $txt_fh, '>', $text_file; while (my $line = <$src_fh>) { while ($line =~ /$re/g) { my $frag = $1; $line =~ s/$frag/_replace($line, $frag, $sr, $re)/e; } print $txt_fh $line; } return; } sub _replace { my ($line, $frag, $sr, $re) = @_; my $replace; if (@{$sr->{$frag}} == 1) { $replace = $sr->{$frag}[0]; } else { $line =~ /$re/; $line = colored(substr($line, 0, $-[0]), 'dark yellow') . colored($1, 'bold yellow') . colored(substr($line, $+[0]), 'dark yellow'); print colored("Current line:", 'bold white'), ' ', $line; my $index = prompt "Replace '$frag' with:", -number, -menu => { map +($sr->{$frag}[$_] => $_), 0 .. $#{$sr->{$frag} +} }, '> '; $replace = $sr->{$frag}[$index]; } return $replace; } sub _process_csv { my ($csv_file) = @_; my $sr = {}; my $csv = Text::CSV::->new(); open my $fh, '<', $csv_file; while (my $row = $csv->getline($fh)) { push @{$sr->{$row->[SEARCH]}}, $row->[REPLACE]; } $sr->{$_} = [ uniq @{$sr->{$_}} ] for keys %$sr; my $alt = join '|', map "\Q$_", sort { length $b <=> length $a } keys %$sr; my $re = qr{\b($alt)\b}; return ($sr, $re); }

I extended your CSV file to enable more testing. It's in the spoiler:

I also created my own text file for testing. Again, in the spoiler:

Here's a rough representation of part of a run, showing dynamic changes to a line as a number of matches are found and the selected substitutions are made:

Current line: whitelist whitelist whitelist whitelist
Replace 'whitelist' with:
   1. access list
   2. access-list
   3. allow

> 2
Current line: access-list whitelist whitelist whitelist
Replace 'whitelist' with:
   1. access list
   2. access-list
   3. allow

> 3
Current line: access-list allow whitelist whitelist
Replace 'whitelist' with:
   1. access list
   2. access-list
   3. allow

> 1
Current line: access-list allow access list whitelist
Replace 'whitelist' with:
   1. access list
   2. access-list
   3. allow

> 2

Here's the final result after that full run. Spoiler again:

— Ken


In reply to Re: Immediately writing the results of search-and-replace by kcott
in thread Immediately writing the results of search-and-replace by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Are you posting in the right place? Check out Where do I post X? to know for sure.
  • Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
    <code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
  • Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
  • Want more info? How to link or How to display code and escape characters are good places to start.
Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (4)
As of 2022-12-01 00:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?