[Biopython-dev] ENSEMBL 3' UTR

Michael Hoffman hoffman at ebi.ac.uk
Wed Feb 16 06:18:13 EST 2005


On Wed, 16 Feb 2005, Vineeth S wrote:

> I have looked for some way to get the co-ordinates for
> the UTR regions of ENSEMBL genes, quite unsuccessfully.
> The Genbank dumps from ENSEMBL aren't as
> straightforward as the NCBI Genbank dumps, and to get
> ENSEMBL 3' regions as FASTA from the Mart, one has to
> click through each chromosome.

You should be able to do this with MartShell, but it is not working
for me right now for some reason. It might be user error on my part.

You can use Ensembl/Jython (part of Ensj) until MartShell is fixed:

$ jython
Jython 2.1 on java1.4.2 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> import ensembl
>>> gene = ensembl.fetch("ENSG00000139618")
>>> transcript = gene.transcripts[0] # this gene only has one transcript
>>> transcript.translation.fivePrimeUTR
[chromosome_NCBI35:13:31787617-31787804:1, chromosome_NCBI35:13:31788559-31788597:1]
>>> transcript.translation.threePrimeUTR
[chromosome_NCBI35:13:31870908-31871347:1, chromosome_NCBI35:13:31871746-31871805:1]
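
The UTR pieces in the output above print as strings of the form `coordinate_system:seq_region:start-end:strand`. That layout is inferred from this session rather than from the Ensj documentation, and the sketch below also assumes Ensembl's usual 1-based, inclusive coordinates. Under those assumptions, a few lines of plain Python (written for a modern interpreter rather than Jython 2.1; the helper names `parse_location` and `utr_lengths` are hypothetical) turn a printed UTR list into numbers and compare its spliced length with the genomic span it covers:

```python
def parse_location(text):
    """Split 'coord_system:seq_region:start-end:strand' into its fields.

    The layout is inferred from the Ensj session output, not from Ensj docs.
    Strand and range are taken from the end, so a colon inside the region
    name would still be kept in seq_region.
    """
    parts = text.split(":")
    if len(parts) < 4:
        raise ValueError("unexpected location string: %r" % text)
    coord_system = parts[0]
    seq_region = ":".join(parts[1:-2])
    start, end = [int(x) for x in parts[-2].split("-")]
    strand = int(parts[-1])
    return coord_system, seq_region, start, end, strand


def utr_lengths(pieces):
    """Return (spliced_length, genomic_span) for a list of UTR pieces.

    Assumes 1-based inclusive coordinates, so each piece covers
    end - start + 1 bases. Any difference between the two numbers is
    sequence (an intron) lying between the pieces.
    """
    locs = [parse_location(str(p)) for p in pieces]
    starts = [loc[2] for loc in locs]
    ends = [loc[3] for loc in locs]
    spliced = sum(e - s + 1 for s, e in zip(starts, ends))
    span = max(ends) - min(starts) + 1
    return spliced, span


# The two UTR lists printed in the session above, copied as strings.
five_prime = ["chromosome_NCBI35:13:31787617-31787804:1",
              "chromosome_NCBI35:13:31788559-31788597:1"]
three_prime = ["chromosome_NCBI35:13:31870908-31871347:1",
               "chromosome_NCBI35:13:31871746-31871805:1"]
print(utr_lengths(five_prime))   # (227, 981)
print(utr_lengths(three_prime))  # (500, 898)
```

For this gene the spliced 3' UTR is 500 bases, while its first and last bases are 898 bases apart; the 398-base gap is the intron.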

Note that the UTRs here are lists because there is an intron in the
middle of each one. You can iterate through all the human genes
using ensembl.human.all_genes(), and the same method exists for
other species as well.
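
The attributes used in the session above (`transcripts`, `translation`, `threePrimeUTR`) could drive a genome-wide pass over `ensembl.human.all_genes()`. The sketch below is hypothetical: the function name `iter_three_prime_utrs` is made up, and it assumes that `all_genes()` returns an iterable, that a non-coding transcript exposes `translation` as `None`, and that the UTR attribute behaves like a Python sequence (as its printed form suggests). It uses `yield`, which Jython 2.1 predates (generators arrived in Python 2.2), so under that interpreter you would append results to a list instead.

```python
def iter_three_prime_utrs(genes):
    """Yield (gene, transcript, utr_pieces) for each transcript with a 3' UTR.

    Only attribute names seen in the Ensj session are used. Treating a
    missing translation as "non-coding" is an assumption, not Ensj docs.
    """
    for gene in genes:
        for transcript in gene.transcripts:
            translation = getattr(transcript, "translation", None)
            if translation is None:
                continue  # assumed: no translation means a non-coding transcript
            pieces = translation.threePrimeUTR
            if pieces:  # skip transcripts whose 3' UTR is empty
                yield gene, transcript, list(pieces)


# Hypothetical usage inside an Ensj/Jython session (not run here):
#   import ensembl
#   for gene, transcript, pieces in iter_three_prime_utrs(ensembl.human.all_genes()):
#       print([str(p) for p in pieces])
```

Because this is a generator, genes are processed one at a time instead of holding the whole genome's results in memory at once.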

> Is there any way to get around this? Or would it be
> useful to write a module to do tblastn of the protein
> against the corresponding gene and then get the 3' UTR
> region from that ?

Running a computationally expensive alignment does not seem like a good
way of extracting information that already exists, especially if you
have to do a lot of them.

> PS: Someday I wish the twain will meet and make life for anybody
> doing sequence analysis easier.

You wish what would meet?

Also, unless you are further discussing a new Biopython module, this
discussion is probably not on-topic for this mailing list
anymore. Feel free to send Ensembl/Jython queries directly to me and
other queries to helpdesk at ensembl.org.
-- 
Michael Hoffman <hoffman at ebi.ac.uk>
European Bioinformatics Institute
