PerlMonks  

Re^5: compare lines within a file

by garyboyd (Acolyte)
on Mar 11, 2011 at 13:56 UTC


in reply to Re^4: compare lines within a file
in thread compare lines within a file

I tried the code but I get an error: Use of uninitialized value in printf at parse_result.txt.pl line 23, <$fh> line 45.

If I change the code to use '\t' in place of "\t", I still get the error.

The input file is:

HWUSI-EAS95L_0025_FC:3:1:2031:1075#0/1 + 2770970 2771005
HWUSI-EAS95L_0025_FC:3:1:2031:1075#0/2 + 2771158 2771190
HWUSI-EAS95L_0025_FC:3:1:2229:1075#0/1 - 1449587 1449620
HWUSI-EAS95L_0025_FC:3:1:2229:1075#0/2 - 1449425 1449460
HWUSI-EAS95L_0025_FC:3:1:5001:1079#0/1 - 1449311 1449346
HWUSI-EAS95L_0025_FC:3:1:5001:1079#0/2 - 1449301 1449336
HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1 - 1449586 1449619
HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/2 - 1449544 1449577
HWUSI-EAS95L_0025_FC:3:1:6417:1078#0/1 - 4744083 4744113
HWUSI-EAS95L_0025_FC:3:1:6417:1078#0/2 - 4744011 4744042
HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/1 - 4867122 4867157
HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/2 - 4866942 4866977
HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/1 + 1930232 1930266
HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/2 + 1930354 1930389
HWUSI-EAS95L_0025_FC:3:1:10916:1076#0/1 - 4874098 4874133
HWUSI-EAS95L_0025_FC:3:1:10916:1076#0/2 - 4874089 4874121
HWUSI-EAS95L_0025_FC:3:1:11022:1076#0/1 + 749842 749877
HWUSI-EAS95L_0025_FC:3:1:11022:1076#0/2 + 749905 749936
HWUSI-EAS95L_0025_FC:3:1:11305:1077#0/1 + 2083459 2083494
HWUSI-EAS95L_0025_FC:3:1:11305:1077#0/2 + 2083661 2083696
HWUSI-EAS95L_0025_FC:3:1:11824:1080#0/1 + 1930341 1930376
HWUSI-EAS95L_0025_FC:3:1:11824:1080#0/2 + 1930373 1930408
HWUSI-EAS95L_0025_FC:3:1:12409:1075#0/1 - 4359407 4359442
HWUSI-EAS95L_0025_FC:3:1:12409:1075#0/2 - 4359384 4359419
HWUSI-EAS95L_0025_FC:3:1:15014:1078#0/1 + 742090 742125
HWUSI-EAS95L_0025_FC:3:1:15014:1078#0/2 + 742134 742168
HWUSI-EAS95L_0025_FC:3:1:15074:1080#0/1 - 2697450 2697485
HWUSI-EAS95L_0025_FC:3:1:15074:1080#0/2 - 2697347 2697381
HWUSI-EAS95L_0025_FC:3:1:15895:1077#0/1 - 3870810 3870845
HWUSI-EAS95L_0025_FC:3:1:15895:1077#0/2 - 3870798 3870832
HWUSI-EAS95L_0025_FC:3:1:16241:1078#0/1 + 3726316 3726351
HWUSI-EAS95L_0025_FC:3:1:16241:1078#0/2 + 3726444 3726479
HWUSI-EAS95L_0025_FC:3:1:16990:1084#0/1 + 4485745 4485780
HWUSI-EAS95L_0025_FC:3:1:16990:1084#0/2 + 4485764 4485797
HWUSI-EAS95L_0025_FC:3:1:1360:1089#0/1 - 4848206 4848241
HWUSI-EAS95L_0025_FC:3:1:2719:1087#0/1 - 1449535 1449570
HWUSI-EAS95L_0025_FC:3:1:2719:1087#0/2 - 1449425 1449460
HWUSI-EAS95L_0025_FC:3:1:2763:1085#0/1 - 1449423 1449458
HWUSI-EAS95L_0025_FC:3:1:2763:1085#0/2 - 1449427 1449460
HWUSI-EAS95L_0025_FC:3:1:3151:1099#0/1 - 4867745 4867773
HWUSI-EAS95L_0025_FC:3:1:3151:1099#0/2 - 4867750 4867774
HWUSI-EAS95L_0025_FC:3:1:4137:1088#0/1 - 4359723 4359758
HWUSI-EAS95L_0025_FC:3:1:4137:1088#0/2 - 4359622 4359657
HWUSI-EAS95L_0025_FC:3:1:4196:1093#0/1 + 2145336 2145371
HWUSI-EAS95L_0025_FC:3:1:4196:1093#0/2 + 2145456 2145490

I was hoping to get output something like 2770970 and 2771190 for the first two entries, and so on.

Hope that makes sense!

Replies are listed 'Best First'.
Re^6: compare lines within a file
by kennethk (Abbot) on Mar 11, 2011 at 23:27 UTC
    The line

        if ($row->[0] =~ m{\QHWUSI-EAS95L_0025_FC:3:1:5232:1082#0//E}) {

    should have read

        if ($row->[0] =~ m{\QHWUSI-EAS95L_0025_FC:3:1:5232:1082#0/\E}) {

    The original code had a typo: a forward slash where the backslash of the trailing \E belongs. The escape pair \Q and \E tells Perl (when interpolating) to escape all characters that have special meaning in a regular expression; see Quote and Quote-like Operators in perlop. The typo was of course mine, and I have corrected the original post accordingly. With that change, I get the output: 1449586    1449577. Obviously, you should modify that matching condition to fit your requirements.
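
    To illustrate the \Q...\E point, here is a minimal sketch (not the code from the earlier post). The helper names matches_prefix and typo_matches are made up for the example. Without a closing \E, the quoting started by \Q runs to the end of the pattern, so the mistyped version looks for a literal "//E" that never occurs in the data and nothing matches; that would plausibly leave the values handed to printf undefined.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read name prefix taken from the thread; held in a variable so \Q...\E
# has something to interpolate and quote.
my $name = 'HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/';

# Hypothetical helper: true if $field is $name followed by mate number 1 or 2.
sub matches_prefix {
    my ($field) = @_;
    return $field =~ m{^\Q$name\E[12]$} ? 1 : 0;
}

# Hypothetical helper reproducing the typo: "//E" instead of "/\E".
# With no \E, everything after \Q is quoted, including the stray "/E".
sub typo_matches {
    my ($field) = @_;
    return $field =~ m{\QHWUSI-EAS95L_0025_FC:3:1:5232:1082#0//E} ? 1 : 0;
}

# quotemeta() applies the same escaping that \Q does.
print quotemeta($name), "\n";

print matches_prefix('HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1'), "\n";  # prints 1
print matches_prefix('HWUSI-EAS95L_0025_FC:3:1:5232:1083#0/1'), "\n";  # prints 0
print typo_matches('HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1'),   "\n";  # prints 0
```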

      Thanks kennethk, this works, but I cannot figure out how to incorporate it into a program that takes a list, checks each line to see whether a subsequent entry matches it, and prints the pair when it does.

      It is possible to do this if the name is specified in the code

      if ($row->[0] =~ m{\QHWUSI-EAS95L_0025_FC:3:1:5232:1082#0/\E}) {

      but how do I do this if I do not know in advance what each line is?

      I tried using the following regex, which will match each line:

               if ($header2 =~  m{HWUSI-EAS95L_0025_FC:3:1:\d*:\d*#0/}) {

      but I am not sure how to take it from there.

        My general approach in this type of situation is to build a hash, or more specifically a hash of lists. I am not familiar with your particular input, and figuring out the patterns and how to parse them is usually where most of the effort in writing a script of this type falls. Taking the input from your original post:

        HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1 - 1449586 1449619
        HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/2 - 1449544 1449577
        HWUSI-EAS95L_0025_FC:3:1:6417:1078#0/1 - 4744083 4744113
        HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/1 - 4867122 4867157
        HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/2 - 4866942 4866977
        HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/1 + 1930232 1930266
        HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/2 + 1930354 1930389

        And fitting the most general pattern that seems to match, I would use code like

        #!/usr/bin/perl
        use strict;
        use warnings;
        use Text::CSV;

        # Tab-separated input; the Text::CSV docs recommend setting binary.
        my $csv = Text::CSV->new({ binary => 1, sep_char => "\t" })
            or die "Cannot use CSV: " . Text::CSV->error_diag();

        # Hash of lists: read name (without /1 or /2) => [ value from /1, value from /2 ]
        my %result;
        while (my $row = $csv->getline(*DATA)) {
            my ($key, $index) = $row->[0] =~ m{^(.+)/([12])$}
                or die "Line did not match pattern: @$row";
            if ($index == 1) {
                $result{$key}[0] = $row->[2];
            }
            elsif ($index == 2) {
                $result{$key}[1] = $row->[2];
            }
            else {
                die "Index was not 1 or 2: @$row";
            }
        }
        $csv->eof or $csv->error_diag();

        # Output results, skipping names that have no /2 entry:
        for my $key (keys %result) {
            next unless $result{$key}[1];
            print "$key:\t$result{$key}[0]\t$result{$key}[1]\n";
        }

        __DATA__
        HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/1	-	1449586	1449619
        HWUSI-EAS95L_0025_FC:3:1:5232:1082#0/2	-	1449544	1449577
        HWUSI-EAS95L_0025_FC:3:1:6417:1078#0/1	-	4744083	4744113
        HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/1	-	4867122	4867157
        HWUSI-EAS95L_0025_FC:3:1:6539:1083#0/2	-	4866942	4866977
        HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/1	+	1930232	1930266
        HWUSI-EAS95L_0025_FC:3:1:10260:1083#0/2	+	1930354	1930389

        to get the output

        HWUSI-EAS95L_0025_FC:3:1:6539:1083#0: 4867122 4866942
        HWUSI-EAS95L_0025_FC:3:1:5232:1082#0: 1449586 1449544
        HWUSI-EAS95L_0025_FC:3:1:10260:1083#0: 1930232 1930354

        Note that because of how tabs get mangled on this site, you'll need to click the download link in order to get the proper formatting in your clipboard.
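
        As a possible variation, here is a rough sketch of the same pairing idea that does not depend on tabs or on Text::CSV. It makes a few assumptions that are not in the code above: fields are split on any whitespace, pairs are reported in input order, and the values kept are the start of the /1 entry and the end of the /2 entry (which seems to be what the original question asked for, e.g. 2770970 and 2771190). The name pair_reads is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes input lines, returns one "name<TAB>start<TAB>end"
# string for every read name that has both a /1 and a /2 entry.
sub pair_reads {
    my @lines = @_;
    my (%pair, @order);
    for my $line (@lines) {
        next unless $line =~ /\S/;    # skip blank lines
        # split ' ' splits on runs of whitespace, so tabs or spaces both work.
        my ($name, $strand, $start, $end) = split ' ', $line;
        my ($key, $mate) = $name =~ m{^(.+)/([12])$}
            or die "Line did not match pattern: $line";
        push @order, $key unless exists $pair{$key};   # remember first-seen order
        # Assumption: keep the start of mate 1 and the end of mate 2.
        $pair{$key}[ $mate - 1 ] = $mate == 1 ? $start : $end;
    }
    my @out;
    for my $key (@order) {
        my ($first, $second) = @{ $pair{$key} }[ 0, 1 ];
        next unless defined $first && defined $second;  # unpaired entry
        push @out, "$key\t$first\t$second";
    }
    return @out;
}

my @sample = (
    'HWUSI-EAS95L_0025_FC:3:1:2031:1075#0/1 + 2770970 2771005',
    'HWUSI-EAS95L_0025_FC:3:1:2031:1075#0/2 + 2771158 2771190',
    'HWUSI-EAS95L_0025_FC:3:1:1360:1089#0/1 - 4848206 4848241',
);
print "$_\n" for pair_reads(@sample);
```

        With the three sample lines, only the 2031:1075 pair is complete, so a single line with 2770970 and 2771190 is printed; the unpaired 1360:1089 entry is skipped. To run it on a real file, the lines could be read from a filehandle and passed to pair_reads in the same way.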
