Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re^3: Immediately writing the results of search-and-replace

by haukex (Archbishop)
on Aug 05, 2022 at 20:20 UTC ( [id://11145977]=note: print w/replies, xml ) Need Help??


in reply to Re^2: Immediately writing the results of search-and-replace
in thread Immediately writing the results of search-and-replace

Thanks for providing additional context.

if my script finds that a file contains the search string 'IP whitelist' on a given line, even if I confirm that I want to replace this with 'IP access list', it will then prompt me to replace just 'whitelist' with one of its various candidates. What I would expect to happen in that case is that the replacement of 'IP whitelist' with 'IP access list' happens before the search for the 'whitelist' key is initiated, preventing it from finding a match there.

What is actually going on (and what I didn't realize on my first reading) is that you're reading a line from the input file into the $target_string variable, but once that line is in that variable, it never changes, regardless of whether you edit the file it came from, since the line was copied from the file into memory. That's why the regex engine continues to find matches in $target_string even after you've made the edit in the file.

The common pattern to use in Perl instead is to write an output file with the modified lines while reading the input file. Since in your code, you seem to want to do in-place edits of the file, see my node on that here. The slurp-then-spew approach is fairly easy if your files will always fit comfortably into memory; for line-by-line editing I might recommend my own module File::Replace (for examples of line-by-line editing, see the module's documentation instead of the aforementioned node).

In other words, instead of your edit_file calls, you'd do something like $target_string =~ s/$search/$replace_choice/gi, and then build your output file by writing out each $target_string line. However, there is still more trickyness here: If you have a line such as "Edit the whitelist by pressing the edit whitelist button." or something like that, then because of the /g modifier on that regex, you'd only be presented with the choice for what term to replace "whilelist" with once, and that replacement would be carried out for the entire line. I kind of doubt that's a desired behavior for you.

BTW, are you sure your search terms will always be on one line, or could strings like "whitelist entries" be split on two lines? That would also require an advanced approach.

These are examples for reasons why I asked for representative sample input along with the expected output for that input, and in this case representative also means that it should include all of such "interesting" cases. While I'd like to provide sample code or at least more specific advice on how to fix your code, without knowing the answers to questions like these, I run the risk of writing something that ends up not working on your actual data. Extensive test cases are very useful!

What I am currently thinking is that it should be possible to do something like s{ ($search) }{ get_replacement($1) }xeg where sub get_replacement does the prompting of the user and returns the replacement term (or the original string if the user doesn't want to replace). $search can even be a regex of all of the search terms, built dynamically.

A few more things I noticed in your code:

  • Check your opens for errors - "open" Best Practices
  • Even though you're using File::Find, for better modularity I would use regular argument passing to sub search_and_replace, and write e.g. wanted => sub { search_and_replace(\@table, $_) }
  • It seems to me like your @table could be replaced by a hash instead of all the duplicate key detection code you currently have. Keys in Perl hashes are unique, and such tables are easy to build with Perl's autovivification - if I treat a nonexistent hash entry as if it were an array reference, it will automagically become an array reference. Note that since you want to do case-insenstive matching I'm using fc, which is better than lc in case of Unicode. Building a hash of arrays using your sample data:
    use warnings; use strict; use feature qw/fc/; use Text::CSV_XS; use List::Util qw/uniq/; use Data::Dump; my $csvfile = $ARGV[0]; my %table; open my $fh, '<', $csvfile or die "$csvfile: $!"; my $csv = Text::CSV_XS->new({binary=>1, auto_diag=>2}); while ( my $row = $csv->getline($fh) ) { push @{$table{ fc($row->[0]) }}, $row->[2]; } $csv->eof or $csv->error_diag; close $fh; $_ = [uniq @$_] for values %table; dd \%table; __END__ { "ip whitelist" => ["IP access list"], "ip whitelist entries" => ["IP access list entries"], "whitelist" => ["allow", "access list", "access-list"], "whitelist entries" => ["access list entries"], "your whitelist" => ["your access list"], }
  • As before, instead of looping over @tables (or %tables), building a single regex should be more efficient.
  • You may wish to consider modules for prompting instead of rolling your own.
  • (It's Friday evening here so I didn't scan the rest of your code in detail, so other Monks may have further tips.)

Replies are listed 'Best First'.
Re^4: Immediately writing the results of search-and-replace [OT - FYI]
by kcott (Archbishop) on Aug 05, 2022 at 22:12 UTC

      Thanks very much, I'll look into this as soon as I can! The module used to have perfect tests on CPAN Testers, but unfortunately ever since I introduced File::Replace::Inplace, there have been spurious test failures. I hope this is one of those cases, and I think you'd be ok to force install the module anyway, the core functionality is quite well tested.

        There's no real urgency for me on this. I was just curious. I'll await the next version rather than forcing an install.

        I am a CPAN tester. The process didn't get far enough for a report to be generated: that may have been due to cpan hanging (as I indicated in the issue). It may be that others hit the same problem but didn't raise an issue. I note that there are no entries at all in the "cygwin" column of "CPAN Testers Matrix: File-Replace 0.14 (latest distribution)".

        Unless you have a Cygwin installation for testing, I'd be happy to help with that.

        I'd suggest moving further discussion from this OT sub-thread to a new topic.

        — Ken

      I hadn't used File::Replace previously, so I thought I'd give it a go; unfortunately it failed during testing. I raised an issue: "t/11_chmod.t fails on Cygwin v5.36.0".

      I've finally fixed this issue! v0.18 of the module should be going up on CPAN in the next hour or so.

        No issues with installation of v0.18.

        cpan[1]> install File::Replace ... HAUKEX/File-Replace-0.18.tar.gz /usr/bin/make -- OK ... HAUKEX/File-Replace-0.18.tar.gz /usr/bin/make test -- OK ... HAUKEX/File-Replace-0.18.tar.gz /usr/bin/make install -- OK

        You should get a green swatch, in the "cygwin" column, against Perl v5.36.0, when "CPAN Testers Matrix: File-Replace 0.18" is next updated.

        — Ken

Re^4: Immediately writing the results of search-and-replace
by elemkeh (Acolyte) on Aug 09, 2022 at 14:34 UTC
    Just wanted to make sure I took the time to thank you for your help. You've given me plenty to chew on and improve my Perl with. Cheers!

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://11145977]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (3)
As of 2024-03-19 02:45 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found