[Bioperl-l] How to get from gi/ref/gb to genomic coordinates ?
Rainer Machne
raim at tbi.univie.ac.at
Thu Feb 1 12:54:21 UTC 2007
Barry and Jason,
thanks for your quick and very helpful replies.
I guess we should have done (or should repeat) our BLAST search at
http://fungal.genome.duke.edu/
to get a better mapping from proteins to genomes?
As I retrieved all my proteins via whole-genome BLAST searches, we should
find (most of) them in the GenBank files ... a good opportunity for me to
learn some Bioperl and the other packages you mentioned, in case we want
to do more complex analyses later :-)
Thank you very much!
Rainer
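
For looking proteins up in the GenBank files directly, a minimal sketch
(plain Python with the standard library only, not Bioperl or CGL) might scan
the feature table for the CDS whose /protein_id matches the accession from
the BLAST header, then report its span and an upstream window. The function
names, the `upstream` parameter, and the sample record with its
`XX_000001.1` accession are made up for illustration. Remote references
inside locations and clipping at the sequence end are not handled.

```python
import re

# Made-up record for illustration only; real files come from NCBI.
SAMPLE = """\
LOCUS       DEMO        3000 bp    DNA     linear   PLN 01-JAN-2007
FEATURES             Location/Qualifiers
     source          1..3000
                     /organism="Demo organism"
     CDS             complement(join(1201..1400,
                     1500..1800))
                     /locus_tag="DEMO_0001"
                     /protein_id="XX_000001.1"
ORIGIN
//
"""


def parse_features(genbank_text):
    """Collect feature-table entries as dicts with key, location, qualifiers."""
    features, in_features = [], False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if in_features and line[:1].strip():
            in_features = False  # next top-level section such as ORIGIN
        if not in_features or not line.strip():
            continue
        if line.startswith("     ") and line[5:6].strip():
            # A feature key sits at column 6, followed by its location.
            key, _, value = line.strip().partition(" ")
            features.append(
                {"key": key, "location": value.strip(), "qualifiers": []}
            )
        elif features:
            value = line.strip()
            feat = features[-1]
            if value.startswith("/") or feat["qualifiers"]:
                feat["qualifiers"].append(value)
            else:
                feat["location"] += value  # location wrapped onto a new line
    return features


def find_cds(genbank_text, protein_id, upstream=500):
    """Return span, strand and upstream window of the matching CDS, or None."""
    for feat in parse_features(genbank_text):
        if feat["key"] != "CDS":
            continue
        match = re.search(r'/protein_id="([^"]+)"', " ".join(feat["qualifiers"]))
        if not match or match.group(1) != protein_id:
            continue
        # Outermost coordinates of the (possibly spliced) location.
        positions = [int(n) for n in re.findall(r"\d+", feat["location"])]
        start, end = min(positions), max(positions)
        if "complement(" in feat["location"]:
            # Minus strand: upstream lies beyond the highest coordinate.
            strand, window = -1, (end + 1, end + upstream)
        else:
            strand, window = 1, (max(1, start - upstream), start - 1)
        return {"start": start, "end": end, "strand": strand, "upstream": window}
    return None
```

Calling `find_cds(SAMPLE, "XX_000001.1")` gives the outer coordinates of the
spliced CDS and, because the feature is on the complement strand, an
upstream window to the right of it. For real work a proper parser such as
Bio::SeqIO in Bioperl would be the safer choice; this only shows the idea.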
Barry Moore wrote:
> Rainer,
>
> We use a Perl library called CGL, written by Mark Yandell and colleagues
> (which in turn uses Chris Mungall's BioChaos and the Unflattener.pm
> referred to by Jason), for this type of task. The basic pipeline is to
> convert GenBank files to Chaos XML, then use CGL with those XML files to
> get nice object-oriented access to exons, transcripts, proteins,
> coordinates and more for those genes. I am currently using this
> with good success on most GenBank genomes (unfortunately I haven't been
> working with the fungal genomes, but it should work fine). The Ensembl
> API provides similar functionality for Ensembl genomes, but there are
> not very many fungi there.
>
> http://www.yandell-lab.org/cgl/
> http://www.ensembl.org/info/software/core/core_tutorial.html
>
> Feel free to contact Mark or myself directly if you are interested in
> using CGL.
>
> Barry
>
> On Jan 31, 2007, at 2:09 PM, Rainer Machne wrote:
>
>> Dear Bioperl list,
>>
>> hoping not to be on the wrong email list, I have a short question:
>>
>> Is there a standard way, or are there nice (Bioperl) tools, to get from a
>> gi (GenInfo identifier) or other ids (see below) to the genomic
>> coordinates of the respective gene?
>>
>> We have Fasta files retrieved from NCBI protein Blast in fungal genomes:
>>
>> >gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 [Ustilago maydis 521]
>>
>> or
>>
>> >gi|50292953|ref|XP_448909.1| unnamed protein product [Candida glabrata]
>>
>>
>> (we only have gi, ref and gb in my set).
>>
>> I retrieved all my fasta files from whole fungal genomes with available
>> protein sequences at
>> http://www.ncbi.nlm.nih.gov/sutils/genom_table.cgi?organism=fungi
>>
>> As I only searched whole finished genomes (not shotgun), I thought it
>> would then be easy to get the genomic coordinates and retrieve upstream
>> sequences, but we have failed so far to find a consistent way to do this
>> automatically. Many of the gi entries refer to mRNAs or partial mRNAs
>> and the way to the coordinates seems to differ for each case.
>>
>> Any suggestions would be appreciated.
>>
>> with kind regards,
>> Rainer Machne
>>
>> University of Vienna
>> Department for Theoretical Chemistry
>> Theoretical Biochemistry Group
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
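
As a first step towards any mapping, the accessions have to be pulled out of
the FASTA headers quoted above. The following is a rough sketch in plain
Python (standard library only), assuming the headers follow the
`db|id|db|id| description [organism]` pattern of the two examples. The
function name `parse_defline` and the returned dictionary layout are
illustrative choices, and databases whose entries use more than one field
(for example pdb) are not handled.

```python
import re

# Organism name in square brackets at the end of the header line.
ORGANISM = re.compile(r"\[([^\]]+)\]\s*$")


def parse_defline(defline):
    """Split an NCBI-style FASTA header into ids, description and organism."""
    header = defline.lstrip(">").strip()
    tokens = header.split("|")
    rest = tokens[-1].strip()        # free text after the last "|"
    id_tokens = tokens[:-1]          # alternating database tag and identifier
    ids = dict(zip(id_tokens[0::2], id_tokens[1::2]))
    match = ORGANISM.search(rest)
    organism = match.group(1) if match else None
    description = rest[:match.start()].strip() if match else rest
    return {"ids": ids, "description": description, "organism": organism}


if __name__ == "__main__":
    headers = [
        ">gi|46100068|gb|EAK85301.1| hypothetical protein UM04252.1 "
        "[Ustilago maydis 521]",
        ">gi|50292953|ref|XP_448909.1| unnamed protein product "
        "[Candida glabrata]",
    ]
    for h in headers:
        print(parse_defline(h))
```

The `gb` or `ref` accession obtained this way is the value that appears as
/protein_id on the CDS features of the genome's GenBank records, so it is
the natural key for the lookup.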