PerlMonks  

Re: Extracting string and numbers from a file

by Athanasius (Archbishop)
on Apr 29, 2020 at 13:48 UTC


in reply to Extracting string and numbers from a file

Hello shabird,

Your regex says: match one or more word characters, followed by one or more non-word characters, followed immediately by a newline; and return the characters matched minus the newline. This won’t work.
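To see why, here is a minimal sketch. The pattern below is only a reconstruction from the description above (an assumption, not necessarily the regex from the original question), and `described_match` is a hypothetical helper name:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# ASSUMPTION: this pattern is rebuilt from the prose description; it may
# differ from the regex in the original question.
sub described_match
{
    my ($line) = @_;
    # In list context a successful match returns the captured groups,
    # and a failed match returns the empty list.
    return $line =~ /(\w+\W+)\n/;
}

my @lines = (
    ">NM_030643.4 Homo sapiens apolipoprotein L4 (APOL4)\n",
    "GAGGTGCTGGGGAGCAGCGTGTTTGCTGTGCTTGATTGTGAGCTGCTGGGAAGTTGTGACTTTCATTTTA\n",
    ">NR_029834.1 Homo sapiens microRNA 200a (MIR200A), microRNA\n",
);

for my $line (@lines)
{
    my @found = described_match($line);
    print @found ? "matched: '$found[0]'\n" : "no match\n";
}
```

Under that assumption, the first line yields `APOL4)` (the last word plus the closing parenthesis just before the newline), and the other two lines do not match at all, because nothing in the pattern ties the match to the ID at the start of the line.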

What you need is a way to uniquely identify the IDs. From the file contents shown, it looks as though each ID is immediately preceded by a > character. If so, you could use something like this:

#! perl
use strict;
use warnings;
use Data::Dump;

my @matches;
push @matches, mysub($_) for <DATA>;
dd \@matches;

sub mysub
{
    return shift =~ / > (\S+) \s /gx;
}

__DATA__
>NM_030643.4 Homo sapiens apolipoprotein L4 (APOL4)
GAGGTGCTGGGGAGCAGCGTGTTTGCTGTGCTTGATTGTGAGCTGCTGGGAAGTTGTGACTTTCATTTTA
CCTTTCGAATTCCTGGGTATATCTTGGGGGCTGGAGGACGTGTCTGGTTATTATATAGGTGCACAGCTGG
>NM_001198855.1 Homo sapiens cytochrome P450 family 2 subfamily C member 8 (CYP2C8)
ACATGTCAAAGAGACACACAC
>NR_029834.1 Homo sapiens microRNA 200a (MIR200A), microRNA
CCGGGCCCCTGTGAGCATC
>AC067940.1 Homo sapiens clone RP11-818E9, LOW-PASS SEQUENCE SAMPLING
AAATACAACTTTAAATCAAAACGGTAAAAATTCCACTCTTTCATACTAACTTCAAAAGTATTTGCTTTAA
AAAAAAAGNNNNNNNNN

Output:

23:46 >perl 2038_SoPW.pl
["NM_030643.4", "NM_001198855.1", "NR_029834.1", "AC067940.1"]

23:46 >
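If a description could itself contain a > character, a slightly stricter variant may be safer. The sketch below assumes every ID line starts with > in column one; `header_id` is a hypothetical helper name, and the last record in the sample is invented purely for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the ID from a header line, or the empty list for any other line.
# ASSUMPTION: a header line starts with '>' and the ID runs up to the
# first whitespace character.
sub header_id
{
    my ($line) = @_;
    return $line =~ /^>(\S+)/;
}

my @sample = (
    ">NM_030643.4 Homo sapiens apolipoprotein L4 (APOL4)\n",
    "GAGGTGCTGG\n",
    ">NR_029834.1 Homo sapiens microRNA 200a (MIR200A), microRNA\n",
    "CCGGGCCCCTGTGAGCATC\n",
    # Hypothetical record: the description contains a '>' of its own.
    ">EXAMPLE_1.1 made-up record with an A->G change\n",
    "ACGT\n",
);

my @ids = map { header_id($_) } @sample;
print "$_\n" for @ids;
```

This prints the three IDs only. With the unanchored `/ > (\S+) \s /gx` pattern, the made-up third header would also contribute a stray `G`, since `>G ` fits the pattern as well.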

Hope that helps,

Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,

Replies are listed 'Best First'.
Re^2: Extracting string and numbers from a file
by shabird (Sexton) on Apr 29, 2020 at 17:27 UTC

    Works pretty fine.. Thank you!
