Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Re: Immediately writing the results of search-and-replace

by kcott (Archbishop)
on Aug 06, 2022 at 04:30 UTC ( [id://11145980]=note: print w/replies, xml ) Need Help??


in reply to Immediately writing the results of search-and-replace

This looked like an interesting problem to get the mental juices flowing on a Saturday morning. I see you've already received much help so, rather than trying to modify your code, I decided to write it from scratch. A few notes on some the things I did follow; ask if you want to know about anything else.

  • The autodie pragma handles I/O exceptions so I don't have to worry about doing that. I note that you're not checking any I/O operations.
  • I used constants to specify the indices of the search and replace elements in the CSV file. If the CSV format ever changes, you'll probably only need to change one or two numbers; as opposed to recoding splice() statements in a couple of places.
  • I used IO::Prompter for the choice of replacement text where more than one option existed. That used a lot less code than what's curently in search_and_replace() and has many more features. Options are presented in the form of a menu and you only need to hit one key (i.e. no newline) to make a selection. I've also used colour: if you want to use that you may need to fiddle with it a bit (it looks good on my black background). That uses Term::ANSIColor; I've added extra colour in a few places, such as highlighting the text to be replaced.
  • Text::CSV will use Text::CSV_XS if it's installed; but the code will still work if it's not.
  • The code is modular and pretty much all the work is done in three, short subroutines: _process_csv(), _process_text() and _replace(). Also note that those routines only use variables that are passed to them: there's no action-at-a-distance, or similar problems, from global or file-scoped variables.

Here's whitelist_sr.pl:

#!/usr/bin/env perl use strict; use warnings; use autodie; use constant { SEARCH => 0, REPLACE => 2, }; use File::Copy; use List::Util 'uniq'; use IO::Prompter [ -single, -style => 'bold blue', -echostyle => 'bold magenta', ]; use Term::ANSIColor; use Text::CSV; # For testing: BEGIN { copy('whitelist_sr_original.txt', 'whitelist_sr.txt'); } my $csv_file = 'whitelist_sr.csv'; my $source_file = 'whitelist_sr_src.txt'; my $text_file = 'whitelist_sr.txt'; copy($text_file, $source_file); my ($sr, $re) = _process_csv($csv_file); _process_text($source_file, $text_file, $sr, $re); unlink $source_file; sub _process_text { my ($source_file, $text_file, $sr, $re) = @_; open my $src_fh, '<', $source_file; open my $txt_fh, '>', $text_file; while (my $line = <$src_fh>) { while ($line =~ /$re/g) { my $frag = $1; $line =~ s/$frag/_replace($line, $frag, $sr, $re)/e; } print $txt_fh $line; } return; } sub _replace { my ($line, $frag, $sr, $re) = @_; my $replace; if (@{$sr->{$frag}} == 1) { $replace = $sr->{$frag}[0]; } else { $line =~ /$re/; $line = colored(substr($line, 0, $-[0]), 'dark yellow') . colored($1, 'bold yellow') . colored(substr($line, $+[0]), 'dark yellow'); print colored("Current line:", 'bold white'), ' ', $line; my $index = prompt "Replace '$frag' with:", -number, -menu => { map +($sr->{$frag}[$_] => $_), 0 .. $#{$sr->{$frag} +} }, '> '; $replace = $sr->{$frag}[$index]; } return $replace; } sub _process_csv { my ($csv_file) = @_; my $sr = {}; my $csv = Text::CSV::->new(); open my $fh, '<', $csv_file; while (my $row = $csv->getline($fh)) { push @{$sr->{$row->[SEARCH]}}, $row->[REPLACE]; } $sr->{$_} = [ uniq @{$sr->{$_}} ] for keys %$sr; my $alt = join '|', map "\Q$_", sort { length $b <=> length $a } keys %$sr; my $re = qr{\b($alt)\b}; return ($sr, $re); }

I extended your CSV file to enable more testing. It's in the spoiler:

$ cat whitelist_sr.csv whitelist entries,Noun,access list entries IP whitelist,Noun,IP access list IP whitelist entries,Noun,IP access list entries whitelist,Adjective,allow whitelist,Noun,access list whitelist,Noun,access-list whitelist,Verb,allow your whitelist,Noun,your access list your whitelist,Noun,my permission list your whitelist,Noun,thy permission list your whitelist,Noun,his permission list your whitelist,Noun,her permission list your whitelist,Noun,its permission list your whitelist,Noun,our permission list your whitelist,Noun,your permission list your whitelist,Noun,their permission list Verb,Noun,action Verb,,action

I also created my own text file for testing. Again, in the spoiler:

Here's a rough representation of part of a run, showing dynamic changes to a line as a number of matches are found and the selected substitutions are made:

Current line: whitelist whitelist whitelist whitelist
Replace 'whitelist' with:
   1. access list
   2. access-list
   3. allow

> 2
Current line: access-list whitelist whitelist whitelist
Replace 'whitelist' with:
   1. access list
   2. access-list
   3. allow

> 3
Current line: access-list allow whitelist whitelist
Replace 'whitelist' with:
   1. access list
   2. access-list
   3. allow

> 1
Current line: access-list allow access list whitelist
Replace 'whitelist' with:
   1. access list
   2. access-list
   3. allow

> 2

Here's the final result after that full run. Spoiler again:

$ cat whitelist_sr.txt access list entries,Noun,access list entries IP access list,Noun,IP access list IP access list entries,Noun,IP access list entries access list,Adjective,allow access-list,Noun,access list allow,Noun,access-list access list,action,allow your permission list,Noun,your access list your access list,Noun,my permission list thy permission list,Noun,thy permission list their permission list,Noun,his permission list our permission list,Noun,her permission list my permission list,Noun,its permission list its permission list,Noun,our permission list his permission list,Noun,your permission list her permission list,Noun,their permission list IP access list IP access list entries IP access list IP access-listIP allow entriesIP access list access-list allow access list access-list whitelistwhitelistwhitelistwhitelist IP access list access list entries allow IPwhitelistwhitelistentrieswhitelist IP access list thy permission list access list IP whitelistyour whitelistwhitelist IPaccess-list yourallow access list

— Ken

Replies are listed 'Best First'.
Re^2: Immediately writing the results of search-and-replace
by elemkeh (Acolyte) on Aug 09, 2022 at 14:37 UTC
    Your code is far more elegant than what I had written originally; I will be reading it carefully to add it to my understanding and I'm sure it will improve my Perl efforts going forward. Thank you.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11145980]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others wandering the Monastery: (4)
As of 2024-03-29 07:11 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found