[Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
Cui, Wenwu (NIH/NLM/NCBI) [C]
cuiw at ncbi.nlm.nih.gov
Thu Feb 1 14:47:38 UTC 2007
This is a simple test going from gene ID 3632373 (protein GI 46100068) to
contig coordinates:
perl -MLWP::Simple -e 'for (split /\n/,
get(q{http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=3632373&retmode=xml}))
{ print "$_\n" if m{<(Gene-source_src[^>]*)>.*</\1>} }'
You need to translate the protein ID to a gene ID first, though.
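One way that translation might be scripted is through the E-utilities ELink service, asking for gene records linked to a protein GI. The following is only a sketch: the subroutine names are made up for illustration, the base URL is assumed to match the efetch one used above, and the regex-based parsing assumes the usual LinkSetDb/Link/Id layout of the ELink XML.

```perl
use strict;
use warnings;

# Build an ELink request that maps a protein GI to linked Entrez Gene records.
# elink_url() is a hypothetical helper name; dbfrom, db and id are the
# standard ELink query parameters.
sub elink_url {
    my ($protein_gi) = @_;
    return 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi'
         . "?dbfrom=protein&db=gene&id=$protein_gi";
}

# Pull the linked gene IDs out of the ELink XML (hypothetical helper).
# A regex is enough for a quick test; a real script would be better off
# with a proper XML parser.
sub gene_ids_from_elink {
    my ($xml) = @_;
    my @ids;
    while ($xml =~ m{<LinkSetDb>(.*?)</LinkSetDb>}gs) {
        my $block = $1;
        next unless $block =~ m{<DbTo>gene</DbTo>};
        push @ids, $block =~ m{<Link>\s*<Id>(\d+)</Id>}g;
    }
    return @ids;
}

# Only touch the network when a protein GI is given on the command line.
if (!caller && @ARGV) {
    require LWP::Simple;
    my $xml = LWP::Simple::get(elink_url($ARGV[0]));
    die "no response from ELink\n" unless defined $xml;
    print "$_\n" for gene_ids_from_elink($xml);
}
```

Saved as a script and run with 46100068 as its argument, this should print the linked gene ID(s), which can then be fed to the efetch call above.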
If the genome is available in Map Viewer, try the following (the contig name,
NW_101115, comes from the previous step):
http://www.ncbi.nlm.nih.gov/mapview/maps.cgi?taxid=5270&gnl=NW_101115&MAPS=genes&cmd=txt
Wenwu Cui, PhD
-----Original Message-----
From: Rainer Machne [mailto:raim at tbi.univie.ac.at]
Sent: Wednesday, January 31, 2007 4:10 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
Dear Bioperl list,
hoping I am not on the wrong mailing list, I have a short question:
Is there a standard way, or are there nice (Bioperl) tools, to get from a
gene ID (gi) or other IDs (see below) to the genomic coordinates of the
respective gene?
We have Fasta files retrieved from NCBI protein Blast in fungal genomes:
>gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
or
>gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]
(We only have gi, ref and gb identifiers in our set.)
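For reference, headers of this shape can be split into their identifier fields before any lookup. This is a minimal sketch in plain Perl; parse_ncbi_header is a name chosen for illustration, and the pattern assumes every header starts with gi|number|db|accession| as in the two examples above.

```perl
use strict;
use warnings;

# Split an NCBI-style FASTA header into its identifier fields.
# Returns a hash with the gi number, the database tag (gb, ref, ...),
# the accession.version and the free-text description, or an empty
# list if the header does not follow the gi|...|db|...| layout.
sub parse_ncbi_header {
    my ($header) = @_;
    return unless $header =~ m{^>?gi\|(\d+)\|(\w+)\|([^|\s]+)\|\s*(.*)$}s;
    my ($gi, $db, $acc, $desc) = ($1, $2, $3, $4);
    $desc =~ s/\s+/ /g;   # undo any line wrapping inside the description
    return (gi => $gi, db => $db, accession => $acc, description => $desc);
}

for my $h ('>gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]',
           '>gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]') {
    my %id = parse_ncbi_header($h) or next;
    print join("\t", @id{qw(gi db accession)}), "\n";
}
```

The gi number is the value that E-utilities calls take as the id parameter, so it is probably the most convenient field to carry forward.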
I retrieved all my fasta files from whole fungal genomes with available
protein sequences at
http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi
As I only searched whole finished genomes (not shotgun), I thought it
would then be easy to get the genomic coordinates and retrieve upstream
sequences, but we have failed so far to find a consistent way to do this
automatically. Many of the gi entries refer to mRNAs or partial mRNAs
and the way to the coordinates seems to differ for each case.
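Once contig coordinates and strand are known for a gene, the upstream retrieval itself is the easy part. The sketch below uses plain Perl strings rather than Bioperl objects. The subroutine names are illustrative, and it assumes 1-based inclusive coordinates with start <= end and a strand of 1 or -1.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string (IUPAC ambiguity codes are ignored here).
sub revcomp {
    my ($seq) = @_;
    (my $rc = reverse $seq) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return up to $len bases upstream of a gene on a contig.
# $start and $end are 1-based inclusive contig coordinates ($start <= $end);
# $strand is 1 (plus) or -1 (minus). The result is clipped at the contig ends.
sub upstream {
    my ($contig, $start, $end, $strand, $len) = @_;
    if ($strand >= 0) {
        my $from = $start - $len;          # 1-based first upstream base
        $from = 1 if $from < 1;            # clip at the contig start
        return substr($contig, $from - 1, $start - $from);
    }
    my $avail = length($contig) - $end;    # bases left after the gene
    $len = $avail if $len > $avail;        # clip at the contig end
    return revcomp(substr($contig, $end, $len));
}

# Toy contig: a plus-strand gene at 7..9 and a minus-strand gene at 4..6.
print upstream('AAACCCGGGTTT', 7, 9,  1, 3), "\n";   # prints CCC
print upstream('AAACCCGGGTTT', 4, 6, -1, 3), "\n";   # prints CCC
```

For a minus-strand gene the upstream region lies after the gene's end coordinate on the contig, which is why that branch reverse-complements the slice.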
Any suggestions would be appreciated.
with kind regards,
Rainer Machne
University of Vienna
Department for Theoretical Chemistry
Theoretical Biochemistry Group