Is your data actually tab-separated, as your code suggests? If so, is there a tab between each and every gene as well? The right solution depends on the exact input format, so please specify it.

If the fields are actually separated by plain spaces, but you can count on the ID you're searching for being the only field with a colon in it, this works:

my $match = "DOID:2055";
while (<DATA>){
    my ($name,$id,$genes) = m/(.*?)\s+(\S+?:\S+?)\s+(.*)/;
    print "$genes\n" if $id eq $match;
}
__DATA__
Charcot-Marie-Tooth disease DOID:10595 KIF20A MTMR2 MTM1 LMNA HOXD10 PRX NEFL EGR2 LITAF GARS NDRG1 ERBB3 HSPB1 EMP2 MPZ ERBB2 PMP22 MFN2 GJB1
Post-traumatic stress disorder DOID:2055 APOE FKBP5 CRH IL2 SLC6A3 MAOB DBH IL8
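
For comparison, if the file does turn out to be tab-separated, a regex isn't needed at all. The following is only a sketch under an assumed layout of name, tab, ID, tab, genes. The helper name genes_for_id is made up for illustration, and the sample records are shortened copies of the data above with tabs inserted by assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return the gene list for the record whose ID
# (second tab-separated field) equals $match, or nothing if none matches.
# Assumed layout per line: name <TAB> id <TAB> genes
sub genes_for_id {
    my ($match, @lines) = @_;
    for my $line (@lines) {
        chomp(my $copy = $line);
        # A limit of 3 keeps everything after the second tab in $genes,
        # whether the genes themselves are separated by tabs or by spaces.
        my ($name, $id, $genes) = split /\t/, $copy, 3;
        next unless defined $id && defined $genes;   # skip malformed lines
        return $genes if $id eq $match;
    }
    return;
}

# Sample records: assumed tab-separated, gene lists shortened.
my @records = (
    "Charcot-Marie-Tooth disease\tDOID:10595\tKIF20A MTMR2 MTM1\n",
    "Post-traumatic stress disorder\tDOID:2055\tAPOE FKBP5 CRH IL2\n",
);

my $genes = genes_for_id("DOID:2055", @records);
print "$genes\n" if defined $genes;
```

If the genes are themselves tab-separated, $genes will still contain those tabs, so split it again on /\t/ if you need the individual names.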