PerlMonks  

Re^2: Search and delete lines based on string matching

by brut (Initiate)
on Mar 13, 2007 at 14:35 UTC ( [id://604544] )


in reply to Re: Search and delete lines based on string matching
in thread Search and delete lines based on string matching

Just adding to the above problem: suppose that, in the same code, I want to add the matched strings instead of deleting them. That is, for strings that are in A but not in B, I want to mark them with some text like "This is new addition" before adding them to the contents of file B and writing the result out to C.

Replies are listed 'Best First'.
Re^3: Search and delete lines based on string matching
by imp (Priest) on Mar 13, 2007 at 15:11 UTC
    Here's an approach that initializes a hash of output tokens, prepopulated with marker text for new additions. The output data is overridden for existing entries, and a sorted list is appended to the specified output file (open with '>' instead of '>>' if you don't want this). Note that it isn't fully compatible with the delete code I posted earlier, since that code didn't take comments into account.
    #!/usr/local/bin/perl
    use strict;
    use warnings;

    if (@ARGV != 3) {
        print "Usage: $0 <pattern file> <input file> <output file>\n";
        exit;
    }
    my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

    open my $pattern_fh, '<', $pattern_filename
        or die "Failed to open $pattern_filename: $!";

    # Every expected token starts out marked as a new addition.
    my %output_tokens = ();
    while (my $line = <$pattern_fh>) {
        chomp $line;
        $output_tokens{$line} = "$line # Added by script";
    }
    close($pattern_fh);
    print "Expected tokens: ", join(', ', keys %output_tokens), "\n";

    open my $infile, "<", $source_filename
        or die "Failed to open $source_filename: $!";
    open my $outfile, ">>", $dest_filename
        or die "Failed to open $dest_filename: $!";

    # Tokens already present in the input override the marker text.
    while (my $line = <$infile>) {
        chomp $line;
        $output_tokens{$line} = $line;
    }

    # Write the sorted result to the output file (the original printed to STDOUT).
    for my $token (sort keys %output_tokens) {
        print $outfile $output_tokens{$token}, "\n";
    }

    close($infile);
    close($outfile);
      Hey, the very first code you provided works fine for normal strings without any non-alphabetic characters, but it does not work for strings like A[0] or B[1]. Please help resolve this. I tried the fix you gave, but it is not working, so the code needs to be modified for these strings as well. Everything else is fine. Many thanks for all the innovative solutions you have provided. You really are excellent.
        It seems to work for me, unless I misunderstood your specs. Here are the files I am using:
        patterns.txt
        A[0]
        C
        D
        infile.txt
        A[0]
        B
        C
        D[0]
        D1
        DA
        brut.pl
        #!/usr/local/bin/perl
        use strict;
        use warnings;

        if (@ARGV != 3) {
            print "Usage: $0 <pattern file> <input file> <output file>\n";
            exit;
        }
        my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

        open my $pattern_fh, '<', $pattern_filename
            or die "Failed to open $pattern_filename: $!";

        my @tokens = ();
        while (my $line = <$pattern_fh>) {
            chomp $line;
            push @tokens, quotemeta($line);
        }
        my $pattern = '^(?:' . join('|', @tokens) . ')[^a-zA-Z]*$';
        print "Search pattern: $pattern\n";

        open my $infile, "<", $source_filename
            or die "Failed to open $source_filename: $!";
        open my $outfile, ">>", $dest_filename
            or die "Failed to open $dest_filename: $!";

        while (my $line = <$infile>) {
            print "input : $line";
            if ($line =~ /$pattern/) {
                next;
            }
            print "output: $line";
            print $outfile $line;
        }

        close($infile);
        close($outfile);
        perl brut.pl patterns.txt infile.txt outfile.txt
        Search pattern: ^(?:A\[0\]|C|D)[^a-zA-Z]*$
        input : A[0]
        input : B
        output: B
        input : C
        input : D[0]
        input : D1
        input : DA
        output: DA
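        For anyone wondering why a token such as A[0] can fail to match when it is put into a pattern unescaped, here is a minimal sketch, not part of brut.pl. The helper name matches and the sample strings are made up for illustration. It builds the same pattern with and without quotemeta and tests both against a literal A[0].

```perl
use strict;
use warnings;

my $token = 'A[0]';

# Unescaped, "[0]" is a character class, so this means "A followed by 0".
my $raw     = '^(?:' . $token . ')[^a-zA-Z]*$';
# quotemeta backslash-escapes the brackets so they are matched literally.
my $escaped = '^(?:' . quotemeta($token) . ')[^a-zA-Z]*$';

# Hypothetical helper: returns 1 if $text matches $pattern, else 0.
sub matches {
    my ($pattern, $text) = @_;
    return $text =~ /$pattern/ ? 1 : 0;
}

for my $text ('A[0]', 'A0') {
    printf "%-4s raw: %d  escaped: %d\n",
        $text, matches($raw, $text), matches($escaped, $text);
}
```

        In this sketch the unescaped pattern matches A0 but not A[0], while the escaped one does the opposite, which is the behaviour you want for tokens that contain brackets.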
Re^3: Search and delete lines based on string matching
by ww (Archbishop) on Mar 13, 2007 at 15:07 UTC
    As suggested above, what code have you tried?
Re^3: Search and delete lines based on string matching
by imp (Priest) on Mar 13, 2007 at 14:43 UTC
    How is your input data formatted?
    1. bin den mig
    2. bin
      den
      mig
    3. bin deg
      mig
      It's as in option 2, that is, a newline character after each string. I am facing a problem with the code you provided: it is not able to delete strings like bin[0], bin 12 and bin234. Can you please help with this as well?
        Ah, you specified that words had to be removed, not tokens that could be part of a word. For the token 'bin', which of these should be removed?
        1. foobin
        2. binary
        3. bin1
        If all of those should be deleted then you can change that pattern from:
        my $pattern = '\b(?:' . join('|', @tokens) . ')\b';
        To:
        my $pattern = join('|', @tokens);
        If you only want to match words that start with 'bin', and are followed only by non-alpha characters, then this:
        my $pattern = '^(?:' . join('|', @tokens) . ')[^a-zA-Z]*$';
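        To see how the three patterns differ, here is a small sketch you could run on its own. The sample lines, the hash keys and the helper deleted_by are made up for illustration and are not from your files. It lists which lines each pattern would delete for the single token 'bin'.

```perl
use strict;
use warnings;

# Hypothetical sample data, one entry per input line.
my @tokens = map { quotemeta } ('bin');
my @lines  = ('foobin', 'binary', 'bin1', 'bin[0]', 'bin', 'den');

my $alt      = join('|', @tokens);
my %patterns = (
    word_only       => '\b(?:' . $alt . ')\b',           # whole words only
    anywhere        => $alt,                             # token anywhere in the line
    prefix_nonalpha => '^(?:' . $alt . ')[^a-zA-Z]*$',   # token, then only non-letters
);

# Return the candidate lines that the given pattern would delete.
sub deleted_by {
    my ($pattern, @candidates) = @_;
    my $re = qr/$pattern/;
    return grep { $_ =~ $re } @candidates;
}

for my $name (sort keys %patterns) {
    printf "%-16s deletes: %s\n",
        $name, join(', ', deleted_by($patterns{$name}, @lines));
}
```

        With this sample data the word-boundary pattern leaves foobin, binary and bin1 alone, the bare alternation deletes every line containing 'bin', and the anchored pattern deletes bin, bin1 and bin[0] but keeps foobin and binary.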
        A revised copy that handles the deletion of tokens with a purely line based input:
        #!/usr/local/bin/perl
        use strict;
        use warnings;

        if (@ARGV != 3) {
            print "Usage: $0 <pattern file> <input file> <output file>\n";
            exit;
        }
        my ($pattern_filename, $source_filename, $dest_filename) = @ARGV;

        open my $pattern_fh, '<', $pattern_filename
            or die "Failed to open $pattern_filename: $!";

        my @tokens = ();
        while (my $line = <$pattern_fh>) {
            chomp $line;
            # Escape regex metacharacters so tokens like A[0] match literally.
            push @tokens, quotemeta($line);
        }
        my $pattern = '^(?:' . join('|', @tokens) . ')[^a-zA-Z]*$';
        print "Search pattern: $pattern\n";

        open my $infile, "<", $source_filename
            or die "Failed to open $source_filename: $!";
        open my $outfile, ">>", $dest_filename
            or die "Failed to open $dest_filename: $!";

        while (my $line = <$infile>) {
            print "input : $line";
            if ($line =~ /$pattern/) {
                next;
            }
            print "output: $line";
            print $outfile $line;
        }

        close($infile);
        close($outfile);
      It's as in option 2, that is, a newline after every string in both A and B. Also, with your code, strings like bin[5], bin [43] and bin[123] (like array elements with the element number in square brackets) are not getting deleted from B. Can you help with this?
