Search and delete lines based on string matching

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Search and delete lines based on string matching by imp (Priest) on Mar 13, 2007 at 14:20 UTC
In addition to davorg's advice above, you should always use both strict and warnings, as they can help you identify many common problems.

If you are searching for the words from file A in file B then you will need a different regex. The code you provided is using the entire file A as the regex.

Here's an example that uses one pattern file, one input file, and one output file:

    use strict;
    use warnings;

    if (@ARGV != 3) {
        print "Usage: $0 <pattern file> <input file> <output file>\n";
        exit;
    }

    my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

    open my $pattern_fh, '<', $pattern_filename
        or die "Failed to open $pattern_filename: $!";

    my @tokens = ();
    while (my $line = <$pattern_fh>) {
        # split ' ' splits on runs of whitespace and skips leading whitespace,
        # so no empty tokens end up in the pattern
        push @tokens, split ' ', $line;
    }
    close($pattern_fh);
    die "No tokens found in $pattern_filename\n" unless @tokens;

    # Create a pattern with alternation of tokens, wrapped in a non-capturing group,
    # and require a word break before and after the word to prevent matching pieces
    # of other words. quotemeta keeps regex metacharacters in a token literal.
    my $pattern = '\b(?:' . join('|', map { quotemeta($_) } @tokens) . ')\b';
    print "Search pattern: $pattern\n";

    open my $infile, '<', $source_filename
        or die "Failed to open $source_filename: $!";
    open my $outfile, '>>', $dest_filename
        or die "Failed to open $dest_filename: $!";

    while (my $line = <$infile>) {
        if ($line !~ /$pattern/) {
            print "adding: $line";
            print $outfile $line;
        }
    }

    close($infile);
    close($outfile);
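To illustrate why the word breaks around the alternation matter, here is a small self-contained sketch. The sub name `build_word_pattern` and the sample lines are made up for this illustration and are not part of the script above.

```perl
use strict;
use warnings;

# Build one regex that matches any of the given words as a whole word.
# quotemeta escapes regex metacharacters so each word is matched literally.
# (Hypothetical helper, named for this example only.)
sub build_word_pattern {
    my @words = @_;
    my $alternation = join '|', map { quotemeta($_) } @words;
    return qr/\b(?:$alternation)\b/;
}

my $pattern = build_word_pattern(qw(bin den mig));

# Made-up sample lines: only the first contains one of the words on its own.
for my $line ("the bin is full\n", "a cabin in the woods\n", "garden path\n") {
    my $verdict = $line =~ $pattern ? 'delete' : 'keep';
    print "$verdict: $line";
}
```

Only the first line is marked for deletion; "cabin" and "garden" contain "bin" and "den" but not as separate words. Note that `\b` assumes the words start and end with word characters, and that an empty word list would produce a pattern that matches at any word boundary, so it is worth checking for that case before using the pattern.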
Re^2: Search and delete lines based on string matching by brut (Initiate) on Mar 13, 2007 at 14:30 UTC
Thanks a lot, buddy. It did help.
Re^2: Search and delete lines based on string matching by brut (Initiate) on Mar 13, 2007 at 14:35 UTC
Just adding to the above problem: suppose that in the same code I have to add the matched strings instead of deleting them. That is, for strings that are in A but not in B, I want to mark them with some string like "This is new addition" before adding those strings to file B and writing the result out to C.
Re^3: Search and delete lines based on string matching by imp (Priest) on Mar 13, 2007 at 15:11 UTC
Here's an approach that initializes a hash of output tokens, prepopulated with marker text for new additions. The output data is overridden for existing entries, and a sorted list is appended to the specified output file (open with '>' instead of '>>' if you don't want this).

Note that it isn't fully compatible with the delete code I posted earlier, since that code didn't take comments into account.

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    if (@ARGV != 3) {
        print "Usage: $0 <pattern file> <input file> <output file>\n";
        exit;
    }

    my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

    open my $pattern_fh, '<', $pattern_filename
        or die "Failed to open $pattern_filename: $!";

    # Every token from the pattern file starts out marked as a new addition
    my %output_tokens = ();
    while (my $line = <$pattern_fh>) {
        chomp $line;
        $output_tokens{$line} = "$line # Added by script";
    }
    print "Expected tokens: ", join(', ', keys %output_tokens), "\n";

    open my $infile, '<', $source_filename
        or die "Failed to open $source_filename: $!";
    open my $outfile, '>>', $dest_filename
        or die "Failed to open $dest_filename: $!";

    # Entries already present in the input file replace the marked version
    while (my $line = <$infile>) {
        chomp $line;
        $output_tokens{$line} = $line;
    }

    # Append the sorted result to the output file
    for my $token (sort keys %output_tokens) {
        print {$outfile} $output_tokens{$token}, "\n";
    }

    close($infile);
    close($outfile);
Re^4: Search and delete lines based on string matching by brut (Initiate) on Mar 13, 2007 at 15:26 UTC
Re^5: Search and delete lines based on string matching by imp (Priest) on Mar 13, 2007 at 15:33 UTC
Re^3: Search and delete lines based on string matching by ww (Archbishop) on Mar 13, 2007 at 15:07 UTC
As suggested above, what code have you tried?
Re^3: Search and delete lines based on string matching by imp (Priest) on Mar 13, 2007 at 14:43 UTC
How is your input data formatted?

    bin den mig bin den mig bin deg mig
Re^4: Search and delete lines based on string matching by brut (Initiate) on Mar 13, 2007 at 14:46 UTC
Re^5: Search and delete lines based on string matching by imp (Priest) on Mar 13, 2007 at 14:57 UTC
Re^4: Search and delete lines based on string matching by brut (Initiate) on Mar 13, 2007 at 14:53 UTC
Re: Search and delete lines based on string matching by davorg (Chancellor) on Mar 13, 2007 at 13:54 UTC
It's really difficult to help you as your code is pretty much unreadable. You should edit your node to put <code> tags around your source code.

It's also a bad idea to say "But its giving me error" without telling us what the error says.

I assume that the words you are trying to filter end up in @lines (not the best name for that variable!), but it looks to me as tho' all of the elements in that array will still have newline characters on the end, which makes it harder for them to match other text.

But, like I say, it's hard to be sure what the problem is until you tidy up the node and give us some better information.

-- <http://dave.org.uk>
"The first rule of Perl club is you do not talk about Perl club." -- Chip Salzenberg
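A minimal sketch of the newline issue, assuming the words were read from a file one per line. The sample data and the sub name `is_listed` are made up for this illustration; the original poster's code is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data: each element still carries its trailing newline,
# as it would after reading a file with <$fh> in list context.
my @words = ("bin\n", "den\n", "mig\n");

# Returns a true value if $candidate is exactly equal to one of the listed words.
sub is_listed {
    my ($candidate, @list) = @_;
    return scalar grep { $_ eq $candidate } @list;
}

print "before chomp: ", (is_listed('den', @words) ? "match" : "no match"), "\n";

chomp @words;    # strip the trailing newline from every element

print "after chomp:  ", (is_listed('den', @words) ? "match" : "no match"), "\n";
```

Before `chomp` the stored element is "den\n", which is not equal to "den", so the comparison fails; after `chomp` it succeeds. The same thing happens when an unchomped element is interpolated into a regex: the pattern then requires a newline right after the word.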
Re: Search and delete lines based on string matching by ptum (Priest) on Mar 13, 2007 at 14:00 UTC
Since you posted as Anonymonk, you can't go back and edit your post, but next time, please use <code> tags. What error are you seeing? You didn't tell us. Is this a homework problem? We don't mind helping, but we're not particularly inclined to do your homework.

To solve a problem like this, I would generally read the contents of file A into a hash, since you just want to use those words as a lookup. Then I would open files B and C, step through the contents of file B a line at a time, and, whenever the line of B contains a word in my hash, drop it on the floor -- otherwise, write that line to file C. I don't think that opening the file handles inside your loop is a good idea.

You're not really clear as to whether file B contains single words or longer strings -- if longer strings, then you might want to split the line into individual tokens (which can then be individually compared to your hash from file A) or (if the number of words in file A is small enough) you may prefer to build a regular expression by which you evaluate each string. A little more detail might help us to help you more effectively.
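The hash-lookup approach described above could be sketched roughly like this. It works on in-memory lists instead of real files so that it runs on its own; the sub names `build_lookup` and `filter_lines` and the sample data are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Build a lookup hash from a list of words (stand-in for the contents of file A).
sub build_lookup {
    my @words = @_;
    my %lookup;
    $lookup{$_} = 1 for @words;
    return \%lookup;
}

# Keep only the lines in which no whitespace-separated token is in the lookup.
sub filter_lines {
    my ($lookup, @lines) = @_;
    my @kept;
    LINE: for my $line (@lines) {
        for my $token (split ' ', $line) {
            next LINE if exists $lookup->{$token};    # drop this line
        }
        push @kept, $line;
    }
    return @kept;
}

# Hypothetical data standing in for files A and B.
my $lookup = build_lookup(qw(bin den mig));
my @file_b = ("alpha beta\n", "gamma bin delta\n", "cabin fever\n");

# The surviving lines are what would be written to file C.
print for filter_lines($lookup, @file_b);
```

The second sample line is dropped because it contains the token "bin"; "cabin" is a different token, so the third line is kept. Tokens are compared exactly, so a word with punctuation attached (for example "bin,") would not be found in the hash; whether that matters depends on how file B is formatted.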
Re: Search and delete lines based on string matching by jdporter (Paladin) on Mar 13, 2007 at 15:39 UTC
Sounds like you're trying to reimplement `fgrep -v -F`. Here's a quick-and-dirty:

    use Getopt::Long;
    GetOptions( 'file=s' => \my $patfile );
    chomp( my @del = do { local @ARGV = ($patfile); <> } );
    my %del;
    @del{ @del } = ();
    $, = $\ = $/;
    print grep { chomp; not exists $del{$_} } <>;

call it like so: `perl this_script.pl -f A < B > C` (given A, B, C, per your root post)

A word spoken in Mind will reach its own level, in the objective world, by its own weight

