[Bioperl-l] Bioperl question - please help!!!

Stefan Kirov skirov at utk.edu
Thu Sep 29 10:16:25 EDT 2005


Olena,
I assume you have Entrez Gene IDs, which are very different from GIs (a
GI is a unique sequence identifier). You first need to fetch the RefSeq
GI or accession based on the EG ID, which may be a one-to-many
relationship. There are a number of ways to do that. First, you can use
the ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2refseq.gz file. You can
also take the appropriate file from
ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/ASN_BINARY/ (I don't know what
organism you are interested in), parse the data with the
Bio::SeqIO::entrezgene parser and extract the RefSeq identifiers.
Finally, you can use a database such as EnsMART or GeneKeyDB. If this is
a one-time thing you can use a web-based conversion tool, for example:
go to genereg.ornl.gov/gkdb, click on 'gateway to example scripts',
select the id converter and convert from LocusLink (the older name for
the EG ID) to RefSeq, copy the output, save it to a text file and
then use Bio::DB::RefSeq to fetch those accession numbers. I am sure
there are other services like that; you can google for them.
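
For the gene2refseq route, here is a rough sketch of the lookup, written
in Python rather than Perl only to keep it short. It assumes the file is
tab-delimited, that comment/header lines start with '#', that column 2
holds the GeneID and column 4 the RNA accession.version, and that '-'
marks a missing value. Please check the header line of your own copy
before relying on those positions. The function names are made up for
this example.

```python
import gzip
from collections import defaultdict

# Assumed column positions in gene2refseq (0-based); verify against the
# header line of the file you downloaded.
GENE_ID_COL = 1   # GeneID
RNA_ACC_COL = 3   # RNA_nucleotide_accession.version


def refseq_rna_for_genes(lines, gene_ids):
    """Map each requested GeneID to a sorted list of its RefSeq RNA accessions.

    One GeneID may map to several accessions (one-to-many), so the values
    are lists. GeneIDs with no RNA accession are left out of the result.
    """
    wanted = {str(g) for g in gene_ids}
    found = defaultdict(set)
    for line in lines:
        if line.startswith("#"):
            continue  # header or comment line
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= RNA_ACC_COL:
            continue  # malformed or truncated row
        gene_id = fields[GENE_ID_COL]
        rna_acc = fields[RNA_ACC_COL]
        if gene_id in wanted and rna_acc != "-":
            found[gene_id].add(rna_acc)
    return {g: sorted(accs) for g, accs in found.items()}


def refseq_rna_from_file(path, gene_ids):
    """Same lookup, reading a local gzip-compressed copy of gene2refseq."""
    with gzip.open(path, "rt") as handle:
        return refseq_rna_for_genes(handle, gene_ids)
```

The accessions returned can be written one per line to a text file and
then fetched with Bio::DB::RefSeq, as in the web-based route.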
Hope this helps. Let me know if you have more questions.
Stefan


Olena Morozova wrote:

>Hello,
>
>Could you please help me out with the following: I have GeneIDs and
>gene names of sequences from the Entrez Gene database, and I need to
>retrieve the corresponding coding sequences from RefSeq. I have tried
>using Bio::DB::GenBank to get the accession numbers, which I could
>then use to retrieve the sequences from RefSeq, but apparently Entrez
>Gene GIs are not the same as GenBank GIs and I am getting sequences of
>totally different genes by this method. I would really appreciate any
>comments or suggestions.
>
>Thank you very much
>Best Regards, Olena
>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov

"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"
