PerlMonks  

Re: Compare three columns of one file with three columns of another file in perl

by aaron_baugher (Curate)
on May 26, 2015 at 01:50 UTC


in reply to Compare three columns of one file with three columns of another file in perl

Your requirements aren't clear to me. How do you want to compare the columns? Do all three have to match, and do they have to match respectively, or can they match in any order? (In other words, can "a, b, c" match "b, c, a"?) Or is it okay if only one matches? Can any line from one file match any line in the other file (I think this is your intention)? If so, what should it do when it finds a match? What if a line in one file matches multiple lines in the other file? Is that possible, and if so, what should be done?

Work out what your requirements actually are and explain them as clearly as you can, preferably with some sample input and output data, and it will be easier for people to help you.

Aaron B.
Available for small or large Perl jobs and *nix system administration; see my home node.

Re^2: Compare three columns of one file with three columns of another file in perl
by anonym (Acolyte) on May 26, 2015 at 02:45 UTC

    Hi Aaron, thanks. They have to match in order: all three of chr, start, and end in one file should match the chr, start, and end in the second file.

      Aaron, a few lines from the first input file are: chr10 40095550 40096075 chr10 40102275 40102575

      A few lines from the second input file are: chr1 mm10_knownGene exon 3205904 3207317 0.000000 - . gene_id "uc007aet.1"; transcript_id "uc007aet.1"; chr1 mm10_knownGene exon 3213439 3215632 0.000000 - . gene_id "uc007aet.1"; transcript_id "uc007aet.1";

      The output should be the lines where the chr, start, and end of the first file match those of the second file.

        If you edit your post and put <code> tags around your data, as you did with your code in your first post in this thread, we'll be able to read it.

        Okay, it sounds like you have a fairly typical "match lines from fileA to lines in fileB" problem, with the added feature of needing to match three fields instead of just one. So the standard solution is to go through one file (usually the smaller one to keep memory use lower), creating a hash of keys with the important fields from that file, then go through the other file line-by-line, checking each line to see if its fields are found in the hash. In pseudo-code:

        create a %hash
        open fileA
        while get a line from fileA
            parse out the three important fields from the line
            concat those fields into a single string
            put that string in the hash as a key, with the line as its value
        close fileA
        open fileB
        while get a line from fileB
            parse out the three important fields from this line
            concat those fields into a single string
            if that string exists as a key in the hash
                print out this line and the value of the matching hash key
        close fileB
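The pseudo-code above could be sketched in Perl roughly as follows. This is only a sketch under assumptions read off the sample lines in this thread: fileA is taken to hold chr, start, and end in whitespace-separated columns 1 to 3, and fileB to hold them in columns 1, 4, and 5. The sub names (make_key, index_file_a, match_file_b) and the file names in the usage comment are made up for illustration.

```perl
use strict;
use warnings;

# Build one lookup string from three fields.
# Assumption: a tab never occurs inside a field, so it is a safe separator.
sub make_key { return join "\t", @_ }

# Read fileA and index each line by its chr/start/end key.
# Assumption: chr, start, end are whitespace-separated columns 1-3.
sub index_file_a {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        my ($chr, $start, $end) = split ' ', $line;
        next unless defined $end;          # skip blank or short lines
        $seen{ make_key($chr, $start, $end) } = $line;
    }
    return \%seen;
}

# Read fileB and collect every line whose key exists in the index.
# Assumption: chr, start, end are columns 1, 4, and 5 of fileB.
sub match_file_b {
    my ($fh, $seen) = @_;
    my @matches;
    while (my $line = <$fh>) {
        chomp $line;
        my @f = split ' ', $line;
        next if @f < 5;                    # skip blank or short lines
        my $key = make_key(@f[0, 3, 4]);
        push @matches, "$seen->{$key}\t$line" if exists $seen->{$key};
    }
    return \@matches;
}

# Usage (file names are placeholders):
#   open my $fh_a, '<', 'fileA.txt' or die "fileA.txt: $!";
#   my $seen = index_file_a($fh_a);
#   close $fh_a;
#   open my $fh_b, '<', 'fileB.txt' or die "fileB.txt: $!";
#   print "$_\n" for @{ match_file_b($fh_b, $seen) };
#   close $fh_b;
```

Joining the three fields with a separator, rather than concatenating them directly, keeps keys such as ("chr1", "12", "3") and ("chr11", "2", "3") distinct.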

        One caveat: this only works if the keys created from fileA are unique. If it's possible for multiple lines to have the same key (the same first three column values), you'll have to get a bit more complicated.
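One common way to handle repeated keys, offered here as a sketch rather than the only option, is to store an array reference of lines under each key instead of a single line. The sub names (index_all_lines, pairs_for) are invented for this example, and fileA is again assumed to hold chr, start, and end in its first three whitespace-separated columns.

```perl
use strict;
use warnings;

# Index fileA so that each key maps to an array ref holding ALL lines
# that share that chr/start/end (assumed to be columns 1-3).
sub index_all_lines {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        my ($chr, $start, $end) = split ' ', $line;
        next unless defined $end;          # skip blank or short lines
        push @{ $seen{ join "\t", $chr, $start, $end } }, $line;
    }
    return \%seen;
}

# Pair one fileB line with every fileA line stored under the given key.
# Returns an empty list when the key is absent.
sub pairs_for {
    my ($seen, $key, $line_b) = @_;
    return map { "$_\t$line_b" } @{ $seen->{$key} || [] };
}
```

With this layout, a fileB line that matches a key seen three times in fileA yields three output pairs, at the cost of keeping every fileA line in memory.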

        Aaron B.
