Re: How do I delete lines matching a certain condition and make sure each line has the right prefix and suffix?

by sundialsvc4 (Abbot)
on Sep 17, 2014 at 00:09 UTC ( [id://1100866] )


in reply to How do I delete lines matching a certain condition and make sure each line has the right prefix and suffix?

In general, I like to approach problems like this in a slightly different way: I read the original file one record at a time, in a loop, and decide which records I want to keep. I can also change the content of each record in any way I please. I then write those records, one at a time, to a new file, the next generation of the original. So, when the program is finished, I am left with two files: "before" and "after." I can then compare the two ... the diff command (in Unix/Linux) comes in very handy here. If I like what I see, I can (separately) throw away or archive the original file and keep the new one with just a few rename commands in the shell. And if I don't like what I see, nothing has been lost or harmed. It is, in other words, a non-destructive process that works for datasets of any size.

The size of the file does not matter because the loop reads it line by line, so only the current line has to be held in memory no matter how long the file is.

So, your program might look something like this ... (caution: extemporaneous coding)

    use strict;
    use warnings;

    # Open the input file, create the output file.
    open(my $in, '<', '/temp/PERL-Samples/myfile.txt')
        or die "Cannot open input file: $!";
    open(my $out, '>', '/temp/PERL-Samples/myfile_out.txt')
        or die "Cannot create output file: $!";

    while (<$in>) {
        # Looks like the logic here is actually just a filter:
        # 1. Finds any instance of || and changes it to | |
        # 2. Finds any instance of | || and changes it to | | |
        # 3. Finds any instance of || | and changes it to | | |
        # 4. Finds any instance of | | at the end of a line and
        #    changes it to ||
        s/\|\|/\| \|/g;
        s/\| \|\|/\| \| \|/g;
        s/\|\| \|/\| \| \|/g;
        s/\| \|$/\|\|/;
        print {$out} $_;
    }

    close $in;
    close $out or die "Cannot close output file: $!";
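The loop above only rewrites lines; it never drops one. As a rough sketch of the "decide which records to keep" part, the per-line decision can be moved into a small sub that returns nothing for a line that should be deleted. The names filter_line and filter_file, the DELETE marker, and the choice of a pipe as the required prefix and suffix are all assumptions made for illustration, not anything taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical rule: drop any line containing "DELETE", then make sure
# every surviving line starts and ends with a pipe character.
sub filter_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /DELETE/;             # record we do not want to keep
    $line = "|$line" unless $line =~ /^\|/;  # add a missing prefix
    $line = "$line|" unless $line =~ /\|$/;  # add a missing suffix
    return "$line\n";
}

# Read one record at a time and write the survivors to a second file.
# The input file is never modified.
sub filter_file {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot create $out_path: $!";
    while (my $line = <$in>) {
        my $kept = filter_line($line);
        print {$out} $kept if defined $kept;
    }
    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

Calling filter_file('myfile.txt', 'myfile_out.txt') would leave both files on disk, so the two can be compared with diff before anything is renamed.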

Replies are listed 'Best First'.
Re^2: How do I delete lines matching a certain condition and make sure each line has the right prefix and suffix?
by Laurent_R (Canon) on Sep 17, 2014 at 06:41 UTC
    I also prefer the line-by-line approach, especially because I mostly deal with files having tens or even hundreds of millions of lines. Writing to another file is also a must for me, because I may need to undo things if something goes wrong. As a side note, there is no need to escape the pipe ("|") character in the replacement values of the s/// operator.
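A small check of that side note, as a sketch: the pattern half of s/// is a regular expression, where a bare pipe means alternation and must be escaped, while the replacement half is interpolated like a double-quoted string, where the pipe has no special meaning. The variable names here are made up for the example.

```perl
use strict;
use warnings;

my $escaped   = "a||b";
my $unescaped = "a||b";

# Pattern side: pipes must be escaped because | is a regex metacharacter.
# Replacement side: an ordinary string, so the backslashes are optional.
$escaped   =~ s/\|\|/\| \|/g;
$unescaped =~ s/\|\|/| |/g;

print "same result\n" if $escaped eq $unescaped;
```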
