PerlMonks  

Re: How to check lines that start with the same word then delete one of them

by kcott (Archbishop)
on Apr 10, 2020 at 18:31 UTC ([id://11115344])


in reply to How to check lines that start with the same word then delete one of them

G'day agnes00,

The following reads through the input file twice: once to gather information, and a second time to generate the output. The data that's collected is small (each record's first field plus some line numbers and flags); the input lines themselves are not copied into memory. There are no nested loops. The only regex is the /;/ in the split statement.
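Here is the two-pass idea on its own, as a minimal sketch. It assumes the data can be re-read from the same handle (true for a regular file; an in-memory file is used here so the example is self-contained). The point to note is that seek rewinds the handle but does not reset the line counter $., which is why the code in this reply sets $. = 0 by hand. The helper name drop_blank_lines_two_pass is hypothetical.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: drop empty lines using two passes over one handle.
sub drop_blank_lines_two_pass {
    my ($data) = @_;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";

    # Pass 1: remember only the numbers of the lines to skip.
    my %skip;
    while (<$fh>) {
        chomp;
        $skip{$.} = 1 unless length;
    }

    # Rewind the handle; seek leaves $. alone, so reset it explicitly.
    seek $fh, 0, 0 or die "Can't seek: $!";
    $. = 0;

    # Pass 2: keep every line whose number wasn't flagged in pass 1.
    my @kept;
    while (<$fh>) {
        push @kept, $_ unless $skip{$.};
    }
    close $fh;
    return @kept;
}

print for drop_blank_lines_two_pass("a\n\nb\n");   # prints "a" and "b", each on its own line
```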

You only showed two lines of sample data: this code will exclude the record with 'SU LI IR ST' in both its third and fifth fields, regardless of whether it appears before or after the other record with the same key. I've added other test data: I've no idea whether that's representative of your real data.
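Working through the code in this reply, the per-key rule it applies comes out as: if a key has any record whose third and fifth fields aren't both 'SU LI IR ST', all of that key's flagged records are dropped; if every record for the key is flagged, only the first one is kept. As a cross-check, here is a minimal sketch of that rule as a standalone function, assuming the whole file fits in memory. The name filter_records is hypothetical and not part of the script below.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use constant FLAG => 'SU LI IR ST';

# Hypothetical helper: apply the per-key exclusion rule to ';'-separated
# lines held in memory. Blank lines are dropped, as in the script.
sub filter_records {
    my @lines = @_;

    # A record is "flagged" when fields 2 and 4 (0-based) are both FLAG.
    my $is_flagged = sub {
        my @f = split /;/, $_[0], 6;
        return defined $f[4] && $f[2] eq FLAG && $f[4] eq FLAG;
    };

    # Pass 1: note which keys (field 0) have at least one unflagged record.
    my %has_unflagged;
    for my $line (@lines) {
        next unless length $line;
        my ($key) = split /;/, $line, 2;
        $has_unflagged{$key} = 1 unless $is_flagged->($line);
    }

    # Pass 2: keep unflagged records; keep a flagged record only if its key
    # has no unflagged record and no earlier flagged record was kept.
    my (%kept_flagged, @kept);
    for my $line (@lines) {
        next unless length $line;
        my ($key) = split /;/, $line, 2;
        if (!$is_flagged->($line)) {
            push @kept, $line;
        }
        elsif (!$has_unflagged{$key} && !$kept_flagged{$key}++) {
            push @kept, $line;
        }
    }
    return @kept;
}

# Prints only the first line: key K has an unflagged record.
print "$_\n" for filter_records(
    'K;A;ST;0;ST;1',
    'K;A;SU LI IR ST;0;SU LI IR ST;1',
);
```

Holding everything in memory makes the rule easy to read, but it gives up the small footprint that the file-based, line-number approach below keeps.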

#!/usr/bin/env perl

use strict;
use warnings;
use autodie ':all';

use constant {
    # Field indices (0-based) after splitting a record on ';'
    MATCH   => 0,   # the key used to group records
    CHECK1  => 2,
    CHECK2  => 4,
    FLAG    => 'SU LI IR ST',
    # Indices into the [line number, exclusion candidate] pairs in %seen
    RECORD  => 0,
    EXCLUDE => 1,
};

my $backup = 'pm_11115310_lines_BU.txt';
my $temp   = 'pm_11115310_lines_TMP.txt';
my $input  = 'pm_11115310_lines.txt';

# Restore the test input from the backup so each run starts fresh
`cp $backup $input`;

_print_data($input);

my (%seen, %exclude);

{
    open my $fh, '<', $input;

    # Pass 1: work out which line numbers to exclude
    while (<$fh>) {
        chomp;
        ++$exclude{$.} and next unless length;   # drop blank lines
        my @parts = split /;/, $_, 6;
        my $possible_exclude = $parts[CHECK1] eq FLAG && $parts[CHECK2] eq FLAG;

        if (exists $seen{$parts[MATCH]}) {
            if ($possible_exclude) {
                ++$exclude{$.};
            }
            else {
                # An unflagged record displaces an earlier flagged one
                if ($seen{$parts[MATCH]}[EXCLUDE]) {
                    ++$exclude{$seen{$parts[MATCH]}[RECORD]};
                    $seen{$parts[MATCH]} = [$., 0];
                }
            }
        }
        else {
            $seen{$parts[MATCH]} = [$., $possible_exclude];
        }
    }

    # Rewind; seek doesn't reset the line counter, so reset it by hand
    seek $fh, 0, 0;
    $. = 0;

    open my $tmp, '>', $temp;

    # Pass 2: copy every line not marked for exclusion
    while (<$fh>) {
        next if $exclude{$.};
        print $tmp $_;
    }
}   # $fh and $tmp go out of scope and are closed here, before the copy

`cp $temp $input`;

_print_data($input);

sub _print_data {
    my ($file) = @_;
    print '-' x 20, " $file ", '-' x 20, "\n";
    system cat => $file;
    print '-' x (42 + length $file), "\n";
}

Output:

-------------------- pm_11115310_lines.txt --------------------
S_FER_SCAM1_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0
S_FER_SCAM1_ARRESTO;ARRESTO;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;XU LI IR ST;0;SU LI IR ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;SU LI IR ST;0;XU LI IR ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;XU LI IR ST;0;XU LI IR ST;1;0;TS;0;0
S_FER_SCAM3_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0
S_FER_SCAM4_ARRESTO;ARRESTO;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0
S_FER_SCAM5_ARRESTO;ARRESTO;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0
S_FER_SCAM5_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0
---------------------------------------------------------------
-------------------- pm_11115310_lines.txt --------------------
S_FER_SCAM1_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;XU LI IR ST;0;SU LI IR ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;SU LI IR ST;0;XU LI IR ST;1;0;TS;0;0
S_FER_SCAM2_ARRESTO;ARRESTO;XU LI IR ST;0;XU LI IR ST;1;0;TS;0;0
S_FER_SCAM3_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0
S_FER_SCAM4_ARRESTO;ARRESTO;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0
S_FER_SCAM5_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0
---------------------------------------------------------------

Do note my use of a backup file. Simply overwriting your input data is not a good move at all!
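If you'd rather not shell out to cp, here is a minimal sketch of the same backup-then-replace idea in pure Perl, using the core File::Copy module and the built-in rename. The helper name rewrite_with_backup and the .bak/.tmp naming scheme are assumptions for illustration; the filtering decision is passed in as a code reference.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: back up $input, write the lines that $keep accepts
# to a temporary file, then rename that file over $input.
sub rewrite_with_backup {
    my ($input, $keep) = @_;
    my $backup = "$input.bak";   # assumed naming scheme
    my $temp   = "$input.tmp";   # assumed naming scheme

    copy($input, $backup) or die "Can't back up $input: $!";

    open my $in,  '<', $input or die "Can't read $input: $!";
    open my $out, '>', $temp  or die "Can't write $temp: $!";
    while (my $line = <$in>) {
        print {$out} $line if $keep->($line);
    }
    close $in;
    close $out or die "Can't close $temp: $!";

    # Replace the original only after the new file is fully written.
    rename $temp, $input or die "Can't rename $temp to $input: $!";
    return $backup;
}
```

On POSIX systems, rename within a single filesystem replaces the target atomically, so the input path should always hold either the old file or the complete new one; the .bak copy is there in case the filtering logic itself turns out to be wrong.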

— Ken
