Re^6: First foray into Perl

by AnomalousMonk (Archbishop)
on Mar 25, 2014 at 20:35 UTC ( [id://1079734]=note )


in reply to Re^5: First foray into Perl
in thread First foray into Perl

Another Note: I just noticed that the latest example data here end with the following three lines:

TF Unknown
TF Name Unknown
Gene ENSG00000113916

These look like the start of another record and screw up parsing. Is this the intended and necessary ending of a full set of data records? It's a problem if so. (The absence of an unambiguous multi-line record delimiter is another problem, but can be finessed if necessary.)
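
To illustrate the problem, here is a minimal sketch. It assumes (my guess at the format, not a documented delimiter) that every record opens with a `TF` line other than `TF Name`, and that a complete record has at least one numeric matrix row. The sub names `split_records` and `is_complete` are made up for this example.

```perl
use strict;
use warnings;

# Split a flat list of lines into records. ASSUMPTION: a record opens with a
# "TF <id>" line (but not "TF Name ..."); this is a guess at the format.
sub split_records {
    my @lines = @_;
    my @records;
    for my $line (@lines) {
        next unless $line =~ /\S/;                   # skip blank lines
        if ($line =~ /^\s*TF\s+(?!Name\b)\S/) {      # looks like a record start
            push @records, [];
        }
        next unless @records;                        # ignore lines before the first record
        push @{ $records[-1] }, $line;
    }
    return @records;
}

# ASSUMPTION: a record counts as complete only if it has at least one matrix
# row, i.e. a position number followed by four numeric fields.
sub is_complete {
    my ($record) = @_;
    return scalar grep { /^\s*\d+(?:\s+[\d.]+){4}\s*$/ } @$record;
}

my @lines = (
    'TF Unknown', 'TF Name Unknown', 'Gene ENSG00000161940',
    'Pos A C G T',
    '1 0.614704 0.122914 0.125116 0.137266',
    'TF Unknown', 'TF Name Unknown', 'Gene ENSG00000113916',
);

for my $record (split_records(@lines)) {
    my ($gene) = map { /^\s*Gene\s+(\S+)/ ? $1 : () } @$record;
    printf "%s: %s\n", $gene // '(no gene)',
        is_complete($record) ? 'complete' : 'incomplete fragment';
}
```

With this check the trailing three lines are reported as an incomplete fragment rather than being parsed as a record, so they can be skipped or warned about.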

Also: Any word yet on the tab-versus-spaces field delimiter question posted above?

Replies are listed 'Best First'.
Re^7: First foray into Perl
by LostWeekender (Novice) on Mar 25, 2014 at 21:48 UTC

    Hi, thank you very much for looking at this.

    The spacing change was inadvertent, not quite sure what happened there.

    The last three lines are the start of another record and can be ignored or removed. I do have another version of the record file with records delimited by a blank line (double-spaced), thusly:

    TF Unknown
    TF Name Unknown
    Gene ENSG00000113916
    Motif ENSG00000113916___1|4x3
    Family C2H2 ZF
    Species Homo_sapiens
    Pos A C G T
    1 0.427379 0.0647991 0.288826 0.218996
    2 0.201974 0.139791 0.35254 0.305695
    3 0.11714 0.118042 0.143884 0.620934
    4 0.637331 0.0996546 0.228428 0.0345867
    5 0.0971289 0.591289 0.134781 0.176801
    6 0.0715039 0.0237142 0.0432674 0.861514
    7 0.73769 0.117011 0.059703 0.0855963
    8 0.0728444 0.00877167 0.877166 0.0412175
    9 0.959269 0.0131077 0.0159611 0.0116621
    10 0.612865 0.057845 0.0583267 0.270963

    TF Unknown
    TF Name Unknown
    Gene ENSG00000161940
    Motif ENSG00000161940___1|1x3
    Family C2H2 ZF
    Species Homo_sapiens
    Pos A C G T
    1 0.614704 0.122914 0.125116 0.137266
    2 0.0954267 0.010422 0.851317 0.0428343
    3 0.959146 0.00959146 0.0112618 0.0200008
    4 0.91149 0.0146678 0.0135794 0.0602625
    5 0.67464 0.0388388 0.13716 0.149361
    6 0.104655 0.0579394 0.804166 0.0332392
    7 0.789171 0.102902 0.0490883 0.0588389
    8 0.776513 0.0273768 0.144501 0.0516094
    9 0.130657 0.06051 0.0793659 0.729467
    10 0.626753 0.0648533 0.143976 0.164418

    Thanks again!

      Here's a version for double-newline record separators; still no tabs, only 1+ spaces separate fields. The new ENSG00000113916___1|4x3 motif is included in the test data. Most of the notes and caveats of Re: First foray into Perl still apply.
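
      A minimal sketch of the double-newline approach (not the script originally posted): Perl's paragraph mode (`$/ = ''`) reads one blank-line-delimited record per read, and `split ' '` breaks each line on runs of whitespace. The sub name `parse_records` and the hash keys are illustrative assumptions, as is the rule that a fragment without matrix rows is skipped.

```perl
use strict;
use warnings;

# Parse records separated by one or more blank lines. Fields within a line
# are assumed to be separated by 1+ spaces. Returns a list of hashrefs.
# The key names (gene, motif, family, species, matrix) are illustrative.
sub parse_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory data: $!";
    local $/ = '';                       # paragraph mode: blank lines end a record
    my @records;
    while (my $chunk = <$fh>) {
        my %rec = (matrix => []);
        for my $line (split /\n/, $chunk) {
            my @f = split ' ', $line;    # split on runs of whitespace
            next unless @f;
            if    ($f[0] =~ /^\d+$/   && @f == 5) { push @{ $rec{matrix} }, [ @f[1 .. 4] ] }
            elsif ($f[0] eq 'Gene'    && @f >= 2) { $rec{gene}    = $f[1] }
            elsif ($f[0] eq 'Motif'   && @f >= 2) { $rec{motif}   = $f[1] }
            elsif ($f[0] eq 'Family'  && @f >= 2) { $rec{family}  = join ' ', @f[1 .. $#f] }
            elsif ($f[0] eq 'Species' && @f >= 2) { $rec{species} = $f[1] }
        }
        # Skip fragments with no matrix rows (e.g. a truncated last record).
        push @records, \%rec if @{ $rec{matrix} };
    }
    close $fh;
    return @records;
}

# Abbreviated sample: two records taken from the posted data, blank line between.
my $data = join "\n",
    'TF Unknown',
    'TF Name Unknown',
    'Gene ENSG00000113916',
    'Motif ENSG00000113916___1|4x3',
    'Family C2H2 ZF',
    'Species Homo_sapiens',
    'Pos A C G T',
    '1 0.427379 0.0647991 0.288826 0.218996',
    '2 0.201974 0.139791 0.35254 0.305695',
    '',
    'TF Unknown',
    'TF Name Unknown',
    'Gene ENSG00000161940',
    'Motif ENSG00000161940___1|1x3',
    'Family C2H2 ZF',
    'Species Homo_sapiens',
    'Pos A C G T',
    '1 0.614704 0.122914 0.125116 0.137266',
    '';

for my $rec (parse_records($data)) {
    printf "%s (%s): %d matrix rows\n",
        $rec->{motif}, $rec->{gene}, scalar @{ $rec->{matrix} };
}
```

      For a real file, the same loop can read from a filehandle opened on the data file instead of the in-memory string; with 15,000 records this keeps only one record's text in memory at a time.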

        Wow - Thanks so much for your efforts on this. Really appreciated! I'm still working on using it to extract sequences from my 15,000 record file so can't state success quite yet but I'll let you know.

        Cheers!
