PerlMonks  

user input Regular Expression

by Saved (Beadle)
on Dec 13, 2020 at 13:23 UTC

Saved has asked for the wisdom of the Perl Monks concerning the following question:

I used Perl for years, but it has been a few years since. I would like some guidance, please. The task is simple. Make changes to a file based on Regex user input, and report how many changes were made. I have the basic changes working for a string passed in but wish to be able to pass in 2 RegEx containing Pattern and Replacement or one containing both. Maybe to change { to ( or to add or remove some chars.
    use strict;
    use warnings;

    # Get file to process
    (my $file, my $pattern, my $replacement) = @ARGV;
    my $Count = 0;

    # Read file
    open my $FH, "<", $file or die "Unable to open $file for read exited $? $!";
    chomp (my @lines = <$FH>);
    close $FH;

    # Parse and replace text in same file
    open $FH, ">", $file or die "Unable to open $file for write exited $? $!";
    for (@lines){
        say {$FH} $_ unless (/$pattern/);              #Write to file unchanged
        say {$FH} $_ if (s/$pattern/$replacement/g);   #Write to file changed
        if ($_ =~ /$pattern/) {     #Need to check & count multiple occurrences in same line
            $Count = $Count + 1;    #Need to increment correctly
            print $Count;           #Wanted to test but will not print
            return $Count;          #Need to get count back to main
        }
    }
    close $FH;
    print $Count;    #Should have total count of changes for file

Thanx much for your time, and any help given.

Bob

Replies are listed 'Best First'.
Re: user input Regular Expression
by BillKSmith (Monsignor) on Dec 13, 2020 at 20:09 UTC
Re: user input Regular Expression (updated)
by haukex (Archbishop) on Dec 13, 2020 at 17:19 UTC

    If you could provide a Short, Self-Contained, Correct Example that includes sample input and the expected output for that input, we would be able to help better. I also don't understand your comments like "Need to get count back to main", perhaps because of some missing context? In regards to your logic, IMHO you're running the regex too often; in particular, you're incrementing $Count if you find the pattern in the line, after having run the s///g, which means that unless $replacement happens to match $pattern, the regex won't match and you won't increment $Count!

    You only need to run the regex once, and you can use the regex's return value in the single if statement you need for this code. Another thing to note is that you don't necessarily need to first read the file into memory (@lines), as this will be inefficient for large files; the typical solution is writing to a temporary file while still reading from the input. This is what Perl's -i switch does (which can be done via $^I), and what my module File::Replace does while trying to be more robust (more error checking instead of warnings, and atomic file replacement if supported by the OS and filesystem). That's why I'm using the latter in the following code, but you could continue using your read-to-memory version if your input files will always be small enough to fit into memory. The important thing here is the loop body.

    use warnings;
    use strict;
    use feature 'say';
    use File::Replace 'replace3';

    my ($file, $pat, $replace) = @ARGV;
    my $Count = 0;

    my ($infh, $outfh, $rep) = replace3($file);
    while (<$infh>) {
        chomp;
        if ( s/$pat/$replace/g ) {
            $Count++
        }
        say {$outfh} $_;
    }
    $rep->finish;

    say "Count=$Count";

    Of course, if this is all your program is doing, you can do it with a oneliner too, e.g.:

    perl -wMstrict -i -nle 's/PAT/REPL/g and $a++;print}{print $a//0' FILE

    Update: If you need to account for multiple replacements in the line, you can replace the whole if in the above script with $Count += s/$pat/$replace/g;, as s///g returns the number of replacements made. In the oneliner that'd be '$a+=s/PAT/REPL/g;print}{print$a//0' (BTW, see "Eskimo greeting" in perlsecret in regards to the trick used in the oneliner).
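    For illustration, here is a minimal sketch of the $^I approach mentioned above, combined with the $Count += s///g counting from the update. The sub name inplace_replace and the .bak backup suffix are assumptions made for this example, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: edit $file in place via $^I, keeping a .bak copy,
# and return the total number of replacements made.
sub inplace_replace {
    my ($file, $pat, $replace) = @_;
    my $count = 0;
    # Setting $^I and @ARGV makes <> behave as it does under -i.bak
    local ($^I, @ARGV) = ('.bak', $file);
    while (<>) {
        $count += s/$pat/$replace/g;   # number of substitutions on this line
        print;                         # goes to the rewritten file, not STDOUT
    }
    return $count;
}

# Only runs when called as: script.pl FILE PATTERN REPLACEMENT
if (@ARGV == 3) {
    my ($file, $pat, $replace) = @ARGV;
    print "Count=", inplace_replace($file, $pat, $replace), "\n";
}
```

    Note that the replacement is interpolated as a plain string here, so capture references such as $1 typed by the user would not be expanded.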

Re: user input Regular Expression -- oneliner
by Discipulus (Canon) on Dec 13, 2020 at 17:08 UTC
    Hello Saved,

    Generally it is a bad idea to do an in-place edit without a backup. For this reason Perl's -i command-line switch accepts a backup file extension (as in -i.bak in the one-liner below).

    In fact, a simple one-liner (note the double quotes around the one-liner, as needed on Windows!) can do the basic work you need to get started:

    cat add-data.txt
    432 10TH ST APT (Range 2A - 2B) BROOKLYN NY 10598-6601
    432 10TH ST APT (Range 3A - 3B) BROOKLYN NY 10598-6601
    432 10TH ST APT (Range 4A - 4B) BROOKLYN NY 10598-6605
    432 10TH ST APT (Range 5A - 5D) BROOKLYN NY 10598-6605
    432 10TH ST APT 6A BROOKLYN NY 10598-6605

    perl -i.bak -p -e "s/BROOKLYN/BROCCOLINO/ and $c++; END{print qq($c changes\n)}" add-data.txt
    5 changes

    cat add-data.txt
    432 10TH ST APT (Range 2A - 2B) BROCCOLINO NY 10598-6601
    432 10TH ST APT (Range 3A - 3B) BROCCOLINO NY 10598-6601
    432 10TH ST APT (Range 4A - 4B) BROCCOLINO NY 10598-6605
    432 10TH ST APT (Range 5A - 5D) BROCCOLINO NY 10598-6605
    432 10TH ST APT 6A BROCCOLINO NY 10598-6605

    You don't need the two say statements in your original program: just do the replacement (adding to the counter if it happens) and write the line to the file (changed or not).

    For a longer and more complex program it would be better to use Getopt::Long to grab your arguments: program.pl -f file.txt --backup file.txt.bak --match BROOKLYN --replace BROCCOLINO
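    A rough sketch of how such a command line might be parsed follows. The option names mirror the example invocation; the sub name parse_args and the choice of which options are required are assumptions for illustration only.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse the options from a list of arguments and
# return them as a hash keyed by the long option name.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(\@args, \%opt,
        'file|f=s',     # file to edit
        'backup=s',     # optional name for the backup copy
        'match=s',      # pattern to look for
        'replace=s',    # replacement text
    ) or die "Usage: $0 -f FILE [--backup FILE] --match PATTERN --replace TEXT\n";
    for my $required (qw(file match replace)) {
        die "Missing required option --$required\n"
            unless defined $opt{$required};
    }
    return %opt;
}

my %opt = parse_args(qw(-f file.txt --backup file.txt.bak
                        --match BROOKLYN --replace BROCCOLINO));
print "$_ = $opt{$_}\n" for sort keys %opt;
```

    In a real script you would pass @ARGV to parse_args instead of the hard-coded list.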

    You can benefit from qr to get your pattern compiled (and you can check there for pattern compilation errors).
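    As a small sketch of that idea: qr dies at run time when the interpolated pattern is invalid, so wrapping it in eval lets you report the problem before touching any file. The sub name compile_pattern is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a user-supplied pattern. Returns the
# compiled regex, or undef plus the error message if it is invalid.
sub compile_pattern {
    my ($text) = @_;
    my $re = eval { qr/$text/ };
    return defined $re ? ($re) : (undef, $@);
}

for my $input ('BROOK(LYN)?', 'BROOK(LYN') {
    my ($re, $err) = compile_pattern($input);
    if ($re) { print "ok:  $input\n" }
    else     { print "bad: $input\n" }
}
```

    The compiled regex can then be used directly, as in s/$re/$replacement/g.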

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
Re: user input Regular Expression
by jwkrahn (Abbot) on Dec 13, 2020 at 19:56 UTC

    I see a couple of answers like this:

    $ echo " x x x x x x x
     x x x x x x x
     x x x x x x x" | perl -pe's/x/o/g && ++$c }{ print "Count = $c\n"'
    Count = 3

    $ echo " x x x x x x x
     x x x x x x x
     x x x x x x x" | perl -pe'if ( s/x/o/g ) { ++$c } }{ print "Count = $c\n"'
    Count = 3

    These don't accurately count replacements: they count the lines on which a replacement happened. Use the return value from s///g instead:

    $ echo " x x x x x x x
     x x x x x x x
     x x x x x x x" | perl -pe'$c += s/x/o/g }{ print "Count = $c\n"'
    Count = 21
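    The same comparison as a small script, in case it helps to see both counts side by side. The sub name count_both is hypothetical, and the input is assumed to be three lines of seven x's each.

```perl
use strict;
use warnings;

# Hypothetical helper: return (lines changed, total replacements)
# for one pattern/replacement pair applied to a list of lines.
sub count_both {
    my ($pat, $replace, @lines) = @_;
    my ($lines_changed, $replacements) = (0, 0);
    for my $line (@lines) {
        # s///g returns the number of substitutions, false if none
        my $n = ($line =~ s/$pat/$replace/g) || 0;
        $lines_changed++ if $n;
        $replacements += $n;
    }
    return ($lines_changed, $replacements);
}

my @input = (' x x x x x x x') x 3;
my ($by_line, $total) = count_both('x', 'o', @input);
print "Lines changed = $by_line, replacements = $total\n";
# prints "Lines changed = 3, replacements = 21"
```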
Re: user input Regular Expression
by jcb (Parson) on Dec 14, 2020 at 01:03 UTC

    If this is a command-line tool, have you considered simply using sed(1)? You seem to be reimplementing it.

    If there is some frontend interface here more complex (like a CGI script, other Web toolkit, or a GUI, like Tk) then you are mostly on the right track, but it would be far better to write to a new file and then use rename to replace the existing file after the process is complete. Something like: (untested)

    use strict;
    use warnings;

    sub toy_sed_on_file {
      my $pattern = shift;
      my $replacement = shift;
      my $file = shift;

      my $count = 0;

      open my $in, '<', $file or die "open $file: $!";
      open my $out, '>', $file.$$ or die "open output ${file}$$: $!";

      while (<$in>) {
        $count += s/$pattern/$replacement/g; # does nothing if m/$pattern/ would not match
        print $out $_;
      }

      close $in or die "close $file: $!";
      close $out or die "close output ${file}$$: $!";

      if ($count > 0) { rename $file.$$, $file or die "replace input file: $!" }
      else { unlink $file.$$ or die "remove output file: $!" }

      return $count;
    }

    Note that the s/// pattern-match-replacement operator returns the number of matches found, so there is no need in this program to separately count the matches.

    Also note that using $$ (the process ID) to construct temporary filenames is only suitable in a secure and trusted environment but is very simple for a demonstration; you should probably use File::Temp and the DIR option to tempfile to ensure that the output file is created in the same directory as the input file. (The rename builtin can only be guaranteed (barring "disk full" and other errors) to work within the same directory.)
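    A possible variant along those lines, as a sketch only: it swaps the $$-based name for File::Temp's tempfile with the DIR option. The sub name sed_on_file_tmp and the demo file are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile tempdir);

# Hypothetical variant of toy_sed_on_file using File::Temp for the output.
sub sed_on_file_tmp {
    my ($pattern, $replacement, $file) = @_;
    my $count = 0;

    open my $in, '<', $file or die "open $file: $!";
    # Create the temporary file next to the input so that rename stays
    # within one directory.
    my ($out, $tmpname) = tempfile(DIR => dirname($file));

    while (<$in>) {
        $count += s/$pattern/$replacement/g;
        print {$out} $_;
    }
    close $in  or die "close $file: $!";
    close $out or die "close $tmpname: $!";

    if ($count > 0) { rename $tmpname, $file or die "replace $file: $!" }
    else            { unlink $tmpname or die "remove $tmpname: $!" }
    return $count;
}

# Demo on a throwaway file in a temporary directory
my $dir  = tempdir(CLEANUP => 1);
my $demo = "$dir/demo.txt";
open my $fh, '>', $demo or die "open $demo: $!";
print {$fh} "a{b{c\nplain\n";
close $fh or die "close $demo: $!";
my $changes = sed_on_file_tmp('\{', '(', $demo);
print "$changes changes\n";    # prints "2 changes"
```

    One thing to keep in mind: tempfile creates the file with restrictive permissions, so the replaced file may not keep the permissions of the original unless you copy them over.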
