[Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?

Barry Moore barry.moore at genetics.utah.edu
Wed Jan 31 23:27:30 UTC 2007


Rainer,

We use a Perl library called CGL, written by Mark Yandell and
colleagues, for this type of task. (CGL in turn uses Chris Mungall's
BioChaos and the Unflattener.pm module that Jason referred to.) The
basic pipeline is to convert the GenBank files to Chaos XML, then use
CGL on those XML files to get object-oriented access to exons,
transcripts, proteins, coordinates and more for those genes. I am
currently using this with good success on most GenBank genomes
(unfortunately I haven't been working with the fungal genomes, but it
should work fine). The Ensembl API provides similar functionality
for Ensembl genomes, but there are not very many fungi there.

http://www.yandell-lab.org/cgl/
http://www.ensembl.org/info/software/core/core_tutorial.html
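
Whichever annotation source is used, the first step is to pull the
identifiers out of the FASTA headers. Below is a minimal Python sketch
of that step only, assuming the old-style NCBI header layout seen in
the examples quoted below (gi|<number>|<db>|<accession>| description
[organism]). The function name parse_defline and the returned field
names are made up for illustration; they are not part of CGL, BioPerl
or any NCBI tool.

```python
import re

# Assumed layout: >gi|<number>|<db>|<accession>| <description> [<organism>]
# (db is e.g. "gb" or "ref"; this is an illustration, not an NCBI parser).
DEFLINE_RE = re.compile(
    r"^>?\s*gi\|(?P<gi>\d+)\|(?P<db>[a-z]+)\|(?P<accession>[^|\s]+)\|\s*(?P<rest>.*)$"
)

# Trailing "[Genus species strain]" block at the end of the header.
ORGANISM_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def parse_defline(line):
    """Split one FASTA header into its identifiers.

    Returns a dict, or None if the line does not match the assumed layout.
    """
    match = DEFLINE_RE.match(line.strip())
    if match is None:
        return None

    rest = match.group("rest").strip()
    organism = None
    description = rest

    organism_match = ORGANISM_RE.search(rest)
    if organism_match:
        organism = organism_match.group(1)
        description = rest[:organism_match.start()].strip()

    return {
        "gi": match.group("gi"),
        "db": match.group("db"),
        "accession": match.group("accession"),
        "description": description,
        "organism": organism,
    }


if __name__ == "__main__":
    header = (">gi|46100068|gb|EAK85301.1| hypothetical protein "
              "UM04252.1 [Ustilago maydis 521]")
    print(parse_defline(header))
```

The accession (e.g. EAK85301.1) is probably the more useful key, since
GenBank CDS features normally record it in their /protein_id
qualifier; treat that as an assumption to check against your own
files.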

Feel free to contact Mark or me directly if you are interested
in using CGL.

Barry

On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote:

> Dear Bioperl list,
>
> Hoping not to be on the wrong email list, I have a short question:
>
> Is there a standard way, or are there nice (Bioperl) tools, to get
> from a gene id (gi) or other ids (see below) to the genomic
> coordinates of the respective gene?
>
> We have FASTA files retrieved from NCBI protein BLAST in fungal
> genomes:
>
> >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
> or
> >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]
>
> (we only have gi, ref and gb in my set).
>
> I retrieved all my fasta files from whole fungal genomes with
> available protein sequences at
> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi
>
> As I only searched whole finished genomes (not shotgun), I thought it
> would then be easy to get the genomic coordinates and retrieve
> upstream sequences, but we have failed so far to find a consistent
> way to do this automatically. Many of the gi entries refer to mRNAs
> or partial mRNAs, and the way to the coordinates seems to differ for
> each case.
>
> Any suggestions would be appreciated.
>
> with kind regards,
> Rainer Machne
>
> University of Vienna
> Department for Theoretical Chemistry
> Theoretical Biochemistry Group
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



