Re^3: How to check lines that start with the same word then delete one of them

Re^4: How to check lines that start with the same word then delete one of them by LanX (Saint) on Apr 10, 2020 at 13:26 UTC
Pretty much what I meant, thanks! Minor nitpick: I'd assign the first match to a normal var. Special vars like $1 can easily get overwritten by "more code" before `seen` is set.

Cheers Rolf (addicted to the Perl Programming Language :) Wikisyntax for the Monastery
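A minimal sketch of Rolf's nitpick, using hypothetical sample lines (hippo's actual script is not quoted in this section): the id is copied out of the match into a lexical `$id` straight away, so a regex added later cannot change it before the `%seen` check.

```perl
use strict;
use warnings;

# Hypothetical sample lines: the id is the text before the first ';'.
my @in = (
    'A;first',
    'B;second',
    'A;duplicate of A',
);

my (%seen, @have);
for my $line (@in) {
    # Copy the capture into a lexical at once. $1 holds the result of
    # the most recent successful match, so any regex run by "more code"
    # added later could silently change it.
    my ($id) = $line =~ /^([^;]+);/ or next;   # skip lines without an id
    push @have, $line unless $seen{$id}++;     # keep the first line per id
}

print "$_\n" for @have;
```

The list assignment `my ($id) = ...` returns the number of captured values, so `or next` skips any line the pattern does not match instead of reusing a stale `$1`.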
Re^5: How to check lines that start with the same word then delete one of them by hippo (Bishop) on Apr 10, 2020 at 13:38 UTC
Cool, glad we're in agreement. You are quite right about limiting the chances of stomping on $1 and friends, of course. The test script could no doubt be polished, but I didn't want to spend/waste time on that before agnes00 confirmed that it actually solves the problem. The requirements as stated were a little wooly.
Re^6: How to check lines that start with the same word then delete one of them by LanX (Saint) on Apr 10, 2020 at 18:35 UTC
> I didn't want to spend/waste time on that ... The requirements as stated were a little wooly.

That's why I thanked you for implementing it. :)

Cheers Rolf (addicted to the Perl Programming Language :) Wikisyntax for the Monastery
Re^4: How to check lines that start with the same word then delete one of them by agnes00 (Novice) on Apr 10, 2020 at 13:01 UTC
Thank you for your answer. The problem is that I have a big file (40,000 lines), which is why I didn't include it in my post. Also, I can't specify all the wanted lines. I just want to check, for every line, whether there is another line with the same first word, and if so delete one of them. That's why I used two loops, but the algorithm is very slow.
Re^5: How to check lines that start with the same word then delete one of them by hippo (Bishop) on Apr 10, 2020 at 13:15 UTC
> The problem is that I've a big file (40000 line), that's why I did'nt bring it to my post.

Everyone is glad you didn't. "Sample data" means just that. Pick maybe 6 lines, enough to be illustrative and to cover the bases. My script above is a test only. It illustrates that the algorithm and the code work, given the sample data.

> I just want to check if for every line there is another line witch have the same first word then I check it to delete one of them,

This is precisely what my test shows. Would you not agree? To turn the test into a working script, just replace `@in` with the code you already have which reads the input data from the file, and similarly write `@have` to your file at the end.

> that's why I did two loops. but the algorithm is so slow

Your algorithm is O(n²) whereas mine is O(n). Mine should therefore be thousands of times faster for a 40,000-line dataset.

See also: Big O notation, SSCCE and Basic Testing Tutorial.

HTH.
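hippo's test script is not quoted in this section, so the following is only a hypothetical sketch of the suggestion above: read the lines from a file instead of `@in` and write the surviving lines out. It simply keeps the first line for each id (the text before the first `;`) and ignores the other fields, which may not be the exact rule agnes00 wants; `dedupe_by_id` and the file names are illustrative.

```perl
use strict;
use warnings;

# Keep the first line for each id (the text before the first ';') and
# drop later lines with the same id. Reads from $in_fh, prints the kept
# lines to $out_fh and returns how many lines were kept.
sub dedupe_by_id {
    my ($in_fh, $out_fh) = @_;
    my %seen;
    my $kept = 0;
    while (my $line = <$in_fh>) {
        my ($id) = $line =~ /^([^;\n]*)/;   # always matches; copy at once
        next if $seen{$id}++;               # one hash lookup per line
        print {$out_fh} $line;
        $kept++;
    }
    return $kept;
}

# Hypothetical usage: perl dedupe.pl input.txt output.txt
if (@ARGV == 2) {
    my ($in_file, $out_file) = @ARGV;
    open my $in,  '<', $in_file  or die "Cannot open $in_file: $!";
    open my $out, '>', $out_file or die "Cannot open $out_file: $!";
    my $kept = dedupe_by_id($in, $out);
    close $out or die "Cannot write $out_file: $!";
    print "Kept $kept lines\n";
}
```

Printing each kept line as it is read, rather than collecting `@have` and writing it at the end, is a small variation on the suggestion that avoids holding the output in memory; for 40,000 lines either approach is fine.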
Re^6: How to check lines that start with the same word then delete one of them by rsFalse (Chaplain) on Apr 10, 2020 at 15:13 UTC
I think your solution runs in O(n log n), because searching for an item in a hash takes log n time. Am I right? Still much, much faster than O(n²) :)
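For a sense of scale, here is the arithmetic behind "thousands of times faster", assuming the two-loop version compares each unordered pair of lines once. Perl hashes are hash tables, so a lookup or insert takes expected constant time rather than log n, which keeps a single pass over the lines at expected O(n). Constant factors differ between the two approaches, so the ratio is only a rough guide to the actual speedup.

```latex
\begin{aligned}
\text{two loops (each pair once):}\quad & \binom{n}{2} = \frac{n(n-1)}{2} = \frac{40\,000 \times 39\,999}{2} = 799\,980\,000 \text{ comparisons} \\
\text{single pass with a hash:}\quad & n = 40\,000 \text{ lookups, each expected } O(1) \\
\text{ratio:}\quad & \frac{799\,980\,000}{40\,000} \approx 20\,000
\end{aligned}
```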
Re^7: How to check lines that start with the same word then delete one of them by Laurent_R (Canon) on Apr 10, 2020 at 15:42 UTC
Re^4: How to check lines that start with the same word then delete one of them by agnes00 (Novice) on Apr 10, 2020 at 16:56 UTC
I've tested your code with this input:

    my @in = (
        'S_FER_SCAM1_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0',
        'S_VINREU_RLIP1_ALLARMEZONA3;ALLARME ZONA 3;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
        'S_VINREU_RLIP1_ANOMBAT;ANOMALIA BATTERIA;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
        'S_FER_VENT1_ERRCOLINV;ERRORE PROFIBUS COLL INVERTER;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
        'S_VINREU_RLIP1_CIRCZONE1;CIRCUITO ZONA 1 FUNZONANTE;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
        'S_VINREU_RLIP1_CIRCZONE2;CIRCUITO ZONA 2 FUNZONANTE;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
        'S_FER_SCAM1_ARRESTO;ARRESTO;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
        'S_FER_VENT1_ERRCOLINV;ERRORE PROFIBUS COLL INVERTER;ST;0;ST;1;0;TS;0;0'
    );

I printed the `@have` array and it shows 7 lines: `S_FER_VENT1_ERRCOLINV` is still duplicated (it should show only 6), so only one duplicate is hidden. In my data file, some lines have the same id ($1) twice and the others are normal (no duplicate id).
Re^5: How to check lines that start with the same word then delete one of them by hippo (Bishop) on Apr 10, 2020 at 22:27 UTC
That's because you have changed the criteria. `'S_FER_VENT1_ERRCOLINV;ERRORE PROFIBUS COLL INVERTER;ST;0;ST;1;0;TS;0;0'` does not match the check in your initial post of

    if($var eq $1 and $line2 =~ /(.*?);.*?;SU LI IR ST;.*?;SU LI IR ST;.*?;.*?;.*?;.*?;.*?(?:$)/)

... so it has not been removed. Did you not mean this? Was your initial post misleading?
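To see the two readings side by side, here is a hypothetical sketch using agnes00's eight sample lines. It first applies the pattern from the initial post (read as `.*?` throughout, an assumption) and lists the lines it rejects, then applies a different, purely illustrative rule: keep the first line for each id whatever the other fields say. It is not hippo's script, which is not quoted here.

```perl
use strict;
use warnings;

# agnes00's sample input from the thread.
my @in = (
    'S_FER_SCAM1_ARRESTO;ARRESTO;ST;0;ST;1;0;TS;0;0',
    'S_VINREU_RLIP1_ALLARMEZONA3;ALLARME ZONA 3;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
    'S_VINREU_RLIP1_ANOMBAT;ANOMALIA BATTERIA;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
    'S_FER_VENT1_ERRCOLINV;ERRORE PROFIBUS COLL INVERTER;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
    'S_VINREU_RLIP1_CIRCZONE1;CIRCUITO ZONA 1 FUNZONANTE;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
    'S_VINREU_RLIP1_CIRCZONE2;CIRCUITO ZONA 2 FUNZONANTE;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
    'S_FER_SCAM1_ARRESTO;ARRESTO;SU LI IR ST;0;SU LI IR ST;1;0;TS;0;0',
    'S_FER_VENT1_ERRCOLINV;ERRORE PROFIBUS COLL INVERTER;ST;0;ST;1;0;TS;0;0',
);

# Pattern from the initial post (assumed to use .*? throughout).
# It only matches lines whose 3rd and 5th fields are "SU LI IR ST".
my $criterion = qr/(.*?);.*?;SU LI IR ST;.*?;SU LI IR ST;.*?;.*?;.*?;.*?;.*?(?:$)/;

my @not_matching = grep { !/$criterion/ } @in;
print "No match: $_\n" for @not_matching;

# Illustrative alternative rule: keep the first line for each id
# (the text before the first ';'), whatever the other fields contain.
my (%seen, @have);
for my $line (@in) {
    my ($id) = split /;/, $line, 2;
    push @have, $line unless $seen{$id}++;
}
printf "%d lines kept\n", scalar @have;
```

Both `ST` lines fail the pattern, which is why the second `S_FER_VENT1_ERRCOLINV` line survives a check based on it, while the id-only rule keeps the 6 lines agnes00 expected.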


	PerlMonks