Dear monks,
I have a list of lines in a file. Each line is compared with every other line. If two lines share an element (a whitespace-separated word), the shorter of the two is deleted, so only the longer line is stored.
input file:
mylist_1 sublist_153 sublist_87 sublist_876 sublist_78
mylist_6 sublist_8
mylist_2 sublist_12 sublist_34 sublist_09
mylist_3 sublist_87 sublist_09
mylist_7 sublist_8 sublist_9
mylist_9 sublist_56
In the above example, line 2 shares an element with line 5 (sublist_8). Since line 5 is longer than line 2, only line 5 is taken for the results. Another example: line 4 shares an element with line 3 (sublist_09); since line 3 is longer, I take only line 3 for the results.
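A minimal sketch of the pairwise comparison, assuming that "a substring of the other line" means the two lines have at least one whitespace-separated element in common (which is what the examples suggest). The helper name `shares_element` is hypothetical; it loads one line's elements into a hash and then checks the other line's elements against it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return true if the two lines (given as array refs
# of their elements) have at least one element in common.
sub shares_element {
    my ($x, $y) = @_;
    my %seen = map { $_ => 1 } @$x;    # elements of the first line
    for my $e (@$y) {
        return 1 if $seen{$e};
    }
    return 0;
}

# Lines 2, 5 and 6 of the first input file.
my @line2 = split ' ', "mylist_6 sublist_8";
my @line5 = split ' ', "mylist_7 sublist_8 sublist_9";
my @line6 = split ' ', "mylist_9 sublist_56";

print shares_element(\@line2, \@line5) ? "yes\n" : "no\n";   # prints "yes" (both contain sublist_8)
print shares_element(\@line2, \@line6) ? "yes\n" : "no\n";   # prints "no"
```

Once two lines overlap, deciding which one to delete is just a comparison of their element counts (`scalar @$x` vs `scalar @$y`).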
Another example:
apple orange cake juice
apple fruits
car van bus jeep
sumo hat
people van car
In the above example, apple is found in two lines, but I keep only the line that has more elements. car is found in the last line and the 3rd line, but I take only the 3rd line as a hit because it has more elements than the last line.
So my result would be:
apple orange cake juice
car van bus jeep
sumo hat
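A sketch of the whole filter, under the reading that a line is dropped whenever some longer line shares an element with it. The description doesn't say what happens when two overlapping lines have the same number of elements, so this assumes the earlier line wins. `keep_longest` is a hypothetical name, and comparing every pair of lines is O(n²), which is fine for small files.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop every line that shares an element with a longer line.
# Assumption: when two overlapping lines have the same number of elements,
# the earlier one is kept and the later one is dropped.
sub keep_longest {
    my @lines = @_;
    my @rows  = map { [ split ' ', $_ ] } @lines;    # each line as an array ref of elements
    my @kept;
    ROW: for my $i (0 .. $#rows) {
        my %mine = map { $_ => 1 } @{ $rows[$i] };
        for my $j (0 .. $#rows) {
            next if $i == $j;
            my ($n_i, $n_j) = (scalar @{ $rows[$i] }, scalar @{ $rows[$j] });
            # Only a longer line (or an earlier line of equal length) can remove line $i.
            next unless $n_j > $n_i || ($n_j == $n_i && $j < $i);
            next ROW if grep { $mine{$_} } @{ $rows[$j] };
        }
        push @kept, $lines[$i];
    }
    return @kept;
}

my @input = (
    'apple orange cake juice',
    'apple fruits',
    'car van bus jeep',
    'sumo hat',
    'people van car',
);
print "$_\n" for keep_longest(@input);
```

Run on the second example, this prints "apple orange cake juice", "car van bus jeep" and "sumo hat", in input order. Note that a line is checked against every longer line, even one that is itself dropped; if you have chains of overlaps and want only surviving lines to count, the rule needs to be adjusted.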
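My program:
#!/usr/bin/perl
use strict;
use warnings;

# Read each line and store its whitespace-separated elements as an array ref.
open(my $fh, '<', 'input_file.txt') or die "can not open input file: $!";
my @aoa;
while (my $line = <$fh>) {
    my @collect = split ' ', $line;   # split ' ' also skips leading whitespace and the newline
    push @aoa, \@collect;
}
close $fh;

# Keep the first occurrence of each distinct line (this removes only exact duplicates).
my %h;
my @uaoa;
for (@aoa) {
    push(@uaoa, $_) if !$h{join $;, @$_}++;
}
foreach (@uaoa) {
    print "@$_\n";
}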
The desired output for the first input file:
mylist_1 sublist_153 sublist_87 sublist_876 sublist_78
mylist_7 sublist_8 sublist_9
mylist_2 sublist_12 sublist_34 sublist_09
mylist_9 sublist_56
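For larger files, one option (a sketch, not the only way) is to build an index from each element to the lines that contain it, so each line is only compared with lines it actually overlaps. This assumes the rule is "drop a line if a longer line shares an element with it", with ties going to the earlier line; `keep_longest_indexed` is a hypothetical name, and the sample data is the first input file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop every line that shares an element with a longer line,
# using an element => line-index map to find overlapping lines.
# Assumption: on equal length, the earlier line is kept.
sub keep_longest_indexed {
    my @lines = @_;
    my @rows  = map { [ split ' ', $_ ] } @lines;
    my %where;                                    # element => [ indexes of lines containing it ]
    for my $i (0 .. $#rows) {
        push @{ $where{$_} }, $i for @{ $rows[$i] };
    }
    my @kept;
    for my $i (0 .. $#rows) {
        my $n_i = @{ $rows[$i] };                 # number of elements in line $i
        my %seen;
        my $beaten = 0;
        for my $e (@{ $rows[$i] }) {
            for my $j (@{ $where{$e} }) {
                next if $j == $i || $seen{$j}++;  # skip itself and lines already checked
                my $n_j = @{ $rows[$j] };
                if ($n_j > $n_i || ($n_j == $n_i && $j < $i)) {
                    $beaten = 1;
                    last;
                }
            }
            last if $beaten;
        }
        push @kept, $lines[$i] unless $beaten;
    }
    return @kept;
}

# To read the real file instead:
#   open my $fh, '<', 'input_file.txt' or die "can not open input file: $!";
#   chomp(my @input = <$fh>);
my @input = (
    'mylist_1 sublist_153 sublist_87 sublist_876 sublist_78',
    'mylist_6 sublist_8',
    'mylist_2 sublist_12 sublist_34 sublist_09',
    'mylist_3 sublist_87 sublist_09',
    'mylist_7 sublist_8 sublist_9',
    'mylist_9 sublist_56',
);
print "$_\n" for keep_longest_indexed(@input);
```

This prints the same four lines as the desired output above, but in input order (the mylist_2 line comes before the mylist_7 line); sort the result if a different order is required.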
Please help :(