
How to convert the NCBI Gene ID to GenBank ID?

by supriyoch_2008 (Monk)
on Jun 22, 2018 at 06:50 UTC

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perl Monks,

I am interested in converting NCBI Gene IDs to GenBank IDs. In the NCBI Gene database, when I enter 7157 as the Gene ID in the search box, the page opens with the heading "TP53 tumor protein p53 Homo sapiens (human)". Near the bottom of that page, under the sub-heading "mRNA and Protein(s)", the GenBank ID "NM_000546.5" is shown as the first entry, with a hyperlink. Clicking it opens the GenBank page with the details. This is a cumbersome process when one has to get the GenBank IDs of many genes. I searched the web for a Perl script that can convert a Gene ID to a GenBank ID directly over the internet, but I did not find one. The site http://biodb.jp/ can perform this conversion, but only through a very lengthy procedure. Then I tried to get the sequence for Gene ID 7157 using a script:

Here goes the script for sequence:

#!/usr/bin/perl
use warnings;
use strict;
use Bio::DB::GenBank;
use Bio::SeqIO;
use Text::Wrap;

my $gb = new Bio::DB::GenBank;
my $id = '7157';
my $seq = $gb->get_Seq_by_gi($id);
print "\n seq: $seq\n";
exit;

But instead of the sequence I got the following wrong result in cmd:

C:\Users\x>cd d*
C:\Users\x\Desktop>g2.pl
 seq: Bio::Seq::RichSeq=HASH(0x780b234)
C:\Users\x\Desktop>

I need suggestions from PerlMonks to solve this ID-conversion problem, so that for Gene IDs 7157 and 7422 I get results in the following format in cmd:

GenBank ID
NM_000546.5
NM_001025366.2

Replies are listed 'Best First'.
Re: How to convert the NCBI Gene ID to GenBank ID?
by bliako (Monsignor) on Jun 22, 2018 at 15:37 UTC

    According to Bio::DB::GenBank's documentation, the method you are calling, get_Seq_by_gi(), returns a Bio::Seq object, which is documented under Bio::Seq. That documentation shows how to print the sequence, for example with: print $seq->seq()."\n";

    That said, be warned that your script passes the Gene ID '7157' to a remote service that expects GenBank identifiers (such as NM_000546.5); get_Seq_by_gi() treats it as a GI number. You do get a response, but unfortunately it is garbage: it relates to "Dictyostelium discoideum (Slime mold)" and not to "Homo sapiens". What's the difference, one can ask (sidenote: more evidence that the old GIGO effect is deep-rooted in the heart of the lite "sciences").

    To get the response you want, you must call get_Seq_by_id() with $id='NM_000546.5'. It then remains to explore the documentation in order to extract what you want from that large (>100KB) dataset you have just transferred from 4000km away across land and sea and possibly through aether.
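A minimal sketch of that suggestion, assuming BioPerl is installed and NCBI is reachable. The helper name `summarize` is hypothetical; `get_Seq_by_id()`, `accession_number`, `length`, `desc` and `seq` are the BioPerl methods it relies on. BioPerl is loaded only when an ID is given on the command line, so the script itself compiles without it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: one tab-separated summary line for any object that
# offers accession_number(), length() and desc(), as Bio::Seq objects do.
sub summarize {
    my ($seq) = @_;
    return join "\t", $seq->accession_number, $seq->length, $seq->desc;
}

# Network access happens only when IDs are passed, e.g.:
#   perl fetch_seq.pl NM_000546.5
if (@ARGV) {
    require Bio::DB::GenBank;    # loaded lazily, only when actually fetching
    my $gb = Bio::DB::GenBank->new;
    for my $id (@ARGV) {
        my $seq = $gb->get_Seq_by_id($id);
        unless ($seq) {
            warn "No record returned for $id\n";
            next;
        }
        print summarize($seq), "\n";
        print $seq->seq, "\n";   # the residues, not the object reference
    }
}
```

Calling `$seq->seq` instead of interpolating `$seq` is what replaces the `Bio::Seq::RichSeq=HASH(...)` output with the actual sequence string.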

    If your main aim is to programmatically convert the ID 7157 to an ID understood by GenBank, e.g. NM_000546.5, then welcome to the club of gene ID conversions. Probably a tenth of the net's transfers are to sites claiming to convert between the numerous ID standards imposed by bio-narcissi and fund-whores and desperate users who somewhere got lost in these standards or found that the mapping is not 1-1. I cannot help you with that, although, in addition to BioPerl, R (Bioconductor) may offer you another lifeline.
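One possible route for the conversion itself, offered as a sketch and not as an established answer from this thread, is NCBI's E-utilities. ELink maps a Gene ID to linked nuccore records, and EFetch with `rettype=acc` returns their accession.version strings. The link name `gene_nuccore_refseqrna`, all sub names, and the regex-based XML handling are assumptions made for illustration; a proper XML parser would be more robust. Because the mapping is not 1-1, a gene may yield several NM_ accessions, and their order need not match the Gene page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Tiny;    # core module; HTTPS additionally needs IO::Socket::SSL

my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# ELink URL: Gene ID -> linked RefSeq RNA records in nuccore.
# The linkname is an assumption; verify it against NCBI's list of links.
sub elink_url {
    my ($gene_id) = @_;
    return "$EUTILS/elink.fcgi?dbfrom=gene&db=nuccore"
         . "&linkname=gene_nuccore_refseqrna&id=$gene_id";
}

# EFetch URL: nuccore UIDs -> plain-text list of accession.version.
sub efetch_acc_url {
    my (@uids) = @_;
    return "$EUTILS/efetch.fcgi?db=nuccore&rettype=acc&retmode=text&id="
         . join(',', @uids);
}

# Extract linked UIDs from ELink XML (regex shortcut, not a real XML parser).
sub linked_uids {
    my ($xml) = @_;
    my ($linkset) = $xml =~ m{<LinkSetDb>(.*?)</LinkSetDb>}s or return;
    return $linkset =~ m{<Link>\s*<Id>(\d+)</Id>}g;
}

# Keep only NM_ (mRNA RefSeq) accessions from EFetch's text output.
sub parse_accessions {
    my ($text) = @_;
    return grep { /^NM_\d+\.\d+$/ } split /\s+/, $text;
}

# GET a URL and return the body, or die with the HTTP status.
sub fetch {
    my ($http, $url) = @_;
    my $res = $http->get($url);
    die "Request failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Network access happens only when Gene IDs are given, e.g.:
#   perl gene2refseq.pl 7157 7422
if (@ARGV) {
    my $http = HTTP::Tiny->new(timeout => 30);
    print "Gene ID\tGenBank ID\n";
    for my $gene_id (@ARGV) {
        my @uids = linked_uids(fetch($http, elink_url($gene_id)));
        next unless @uids;
        print "$gene_id\t$_\n"
            for parse_accessions(fetch($http, efetch_acc_url(@uids)));
        sleep 1;    # pause between genes to stay under NCBI's rate limit
    }
}
```

The URL builders and parsers are kept separate from the network loop so they can be checked offline against a saved response before any request is sent.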

      Thanks much bliako for mentioning the method to get the sequence from Bio::Seq. It occurred to me only much later that the module may already have a method (or several) to do that; the advice about Data::Dumper may turn out to be of no use.

        Glad. If you hear of a method to do what you want then let me know and I can assist you with the low-level details if need be. I would like to put all these converters in one place one day.

        btw the returned object is of type Bio::Seq::RichSeq and not, as I initially said, Bio::Seq. It is a superset of it, so to speak. I was citing the doc and overlooked the returned value you posted.

        Anonymous Monk,

        Thanks for your comments.

        With regards,

      Hi Bliako,

      Thank you very much for your valuable comments and suggestions.

      I am sorry for late reply as the internet connectivity was not available this morning.

      With regards,

Re: How to convert the NCBI Gene ID to GenBank ID?
by Anonymous Monk on Jun 22, 2018 at 07:02 UTC

    Well, what actually is there in that HASH reference? See Data::Dumper or similar module to see the details (if possible) underneath the reference.
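As a small illustration of that advice, here is a sketch using only core modules. The class `My::ToySeq` and its fields are made up for the example and are not BioPerl's; a real Bio::Seq::RichSeq object is much larger, which is why the dump depth is limited here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for an object returned by some library call.
my $obj = bless {
    display_id => 'toy_record',
    seq        => 'ATGCATGC',
    features   => [ { type => 'CDS', start => 1, end => 6 } ],
}, 'My::ToySeq';

$Data::Dumper::Sortkeys = 1;    # stable key order makes dumps easier to compare
$Data::Dumper::Maxdepth = 2;    # do not descend far into nested structures

my $class = blessed($obj);      # the class name the reference is blessed into
my $dump  = Dumper($obj);       # the underlying hash, as Perl source text

print "class: $class\n";
print $dump;
```

The class name reported by `blessed` is also what tells you which module's documentation to read for the proper accessor methods.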
