[Biojava-l] Ensembl gene parsing

Ewan Birney birney at ebi.ac.uk
Wed Jan 29 09:58:18 EST 2003


On Wed, 29 Jan 2003, Stein Aerts wrote:

> Hi Ewan,
> I know of Mart (and I like it) but it is not suited for automated 
> sequence retrieval using gene_stable_id's (a SOAP web service for the 
> export data function would be nice). Anyway, the Mart output would have 
> currently the same faults I guess. Do you reckon that the fixing of the 
> Ensembl bugs is a short term matter?


(a) We know that people want to script against Mart and we are working 
towards this - Arek might be able to fill you in (over to Arek)


(b) Scripting against the core is probably best done with a dedicated
Ensembl script (Perl) that doesn't bounce through GenBank format - tell us
what you want and I suspect Arne or Graham can whip up a (Perl) script
quickly


(c) If you don't like Perl ( ... this is the BioJava mailing list...) then 
there is a pretty complete and stable Java binding to Ensembl - it doesn't 
use BioJava - it is more of a vanilla Java binding to Ensembl. Craig 
Melsopp is the lead on that. The web page is

http://www.ensembl.org/java/


(d) Almost certainly, parsing GenBank/EMBL format is one of the worst ways 
to get information out of Ensembl - there is lots of stuff inside Ensembl 
which we can't dump due to format and/or space issues; we don't consider 
it to be a primary route of information...


... it doesn't change the fact there are bugs on our side ;) and we will 
fix those.


k


