http://qs321.pair.com?node_id=1106348


in reply to Print entire line

Hi,

Does your data actually have its fields separated by Tab characters, as your code suggests? If so, is there a tab between each and every gene? The exact input file format determines the solution.

If the input is actually separated by plain space characters, but you can count on the id you're searching for being the only field with a colon in it, this works:

my $match = "DOID:2055";
while (<DATA>) {
    my ($name, $id, $genes) = m/(.*?)\s+(\S+?:\S+?)\s+(.*)/;
    print "$genes\n" if $id eq $match;
}
__DATA__
Charcot-Marie-Tooth disease DOID:10595 KIF20A MTMR2 MTM1 LMNA HOXD10 PRX NEFL EGR2 LITAF GARS NDRG1 ERBB3 HSPB1 EMP2 MPZ ERBB2 PMP22 MFN2 GJB1
Post-traumatic stress disorder DOID:2055 APOE FKBP5 CRH IL2 SLC6A3 MAOB DBH IL8

Re^2: Print entire line
by pabla23 (Novice) on Nov 06, 2014 at 11:27 UTC
    OK, there is a tab between each field (post-traumatic stress disorder / DOID:2055 / APOE / FKBP5 and so on), and they are all on the same line:

    post-traumatic stress disorder DOID:2055 APOE FKBP5 CRH IL2 SLC6A3 MAOB DBH IL8

    My input is "DOID:2055" and my output should be:

    APOE

    FKBP5

    CRH

    IL2

    and the other genes. Sorry for my earlier explanation; is it clear now? Thanks, Paola

      ok,

      my $filename = '/Users/Pabli/Desktop/do_human_mapping.gmt';
      my $match = 'DOID:2055';
      open(my $file, '<', $filename) or die "open: $!";
      while (<$file>) {
          chomp;    # drop the newline so it does not stick to the last gene
          my ($name, $id, @genes) = split /\t/;
          print join("\n", @genes), "\n" if defined $id && $id eq $match;
      }

      The answer to your question, then, is to use the list-assignment idiom above: name the first two fields, and use an array to slurp up all the genes that follow on the line. Because the name and id never end up in the @genes array, you don't have to go through contortions when it comes time to print.
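
      To see the idiom in isolation, here is a minimal sketch that runs without the file. The helper name genes_for_id and the in-memory @lines are made up for illustration, and the records are assumed to be tab-separated as described earlier in the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: return the genes listed for one id in a
# tab-separated line of the form "name<TAB>id<TAB>gene<TAB>gene...".
sub genes_for_id {
    my ($line, $match) = @_;
    chomp $line;
    # The first two fields get names; @genes takes everything that is left.
    my ($name, $id, @genes) = split /\t/, $line;
    return unless defined $id && $id eq $match;   # empty list on no match
    return @genes;
}

# Sample records taken from this thread, with tabs assumed between fields.
my @lines = (
    "parasitic helminthiasis infectious disease\tDOID:883\tIL4\tIL5\n",
    "post-traumatic stress disorder\tDOID:2055\tAPOE\tFKBP5\tCRH\tIL2\tSLC6A3\tMAOB\tDBH\tIL8\n",
);

for my $line (@lines) {
    my @genes = genes_for_id($line, 'DOID:2055');
    print "$_\n" for @genes;    # one gene per line
}
```

      Returning an empty list for non-matching lines keeps the calling loop free of special cases.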

        Thanks so so so much!!! It works!!!!!!

        Paola

        Sorry, one more question: what if I want to search the other way round, for all the "DOID" entries associated with a given gene? My file is:

        parasitic helminthiasis infectious disease DOID:883 IL4 IL5

        female reproductive organ cancer DOID:120 BARD1 MAN1B1 SLC12A7 AMHR2 IL4 SLC12A6 SLC12A4

        My input is "IL4" and I want:

        DOID:883

        DOID:120

        Do I have to compare strings? Thanks a lot, Paola