[EMBOSS] About bulky taxonomy and gene ontology data in the EMBOSS package.

Peter Rice ricepeterm at yahoo.co.uk
Thu Jul 19 07:07:21 UTC 2012


On 19/07/2012 05:17, Charles Plessy wrote:
> Dear EMBOSS developers,
>
> today I received the following bug report about the quantity of data shipped in
> the Debian package for EMBOSS.
>
>> emboss-data recently grew from a slim 5 megabytes to a massive 305.
>> Closer inspection reveals the primary culprits to be large taxonomy
>> and gene ontology databases:
>
> In Debian, one solution would be to move this data into a separate optional
> package.  But before doing so, I would like to ask whether this data really
> ought to be distributed with EMBOSS.  After all, for many other databases,
> there are scripts to download and index the data after installation.  Will
> EMBOSS 6.5 ship the taxonomy and gene ontology databases as well?

They are included in the release which appeared on 15th July 
(announcement in preparation).

For developers the data is updated by rsync. We could provide scripts 
to download and index the data at the end of installation, though I found 
in preparing the release that two ontologies had moved in the last year, 
so that approach is error-prone.

Some EMBOSS applications assume these databases are installed, 
particularly EDAM and the NCBI taxonomy. EDAM is used for all metadata, 
and the taxonomy for organism searches in data retrieval. The Gene 
Ontology is used when analysing GO terms in metadata.

So if they are moved to an optional package, some things will not work 
when it is not installed. EDAM, I would say, is essential.
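
As a rough illustration of the dependencies described above, a packager 
could run a small check after installation to report which data sets are 
absent. This is only a sketch: the directory names (EDAM, TAXONOMY, GO), 
the layout under the data root, and the function name check_datasets are 
assumptions made for the example, not the actual EMBOSS layout or API.

```python
import os
import tempfile

# Hypothetical layout: one subdirectory per data set under the data root.
# The subdirectory names are assumptions for illustration only; the
# "essential" flag mirrors the view that EDAM is essential while the
# other two could live in an optional package.
DATASETS = {
    "EDAM": {"subdir": "EDAM", "essential": True,
             "used_for": "all metadata"},
    "NCBI taxonomy": {"subdir": "TAXONOMY", "essential": False,
                      "used_for": "organism searches in data retrieval"},
    "Gene Ontology": {"subdir": "GO", "essential": False,
                      "used_for": "analysis of GO terms in metadata"},
}


def check_datasets(data_root):
    """Return (missing_essential, missing_optional) lists of data set names."""
    missing_essential, missing_optional = [], []
    for name, info in DATASETS.items():
        path = os.path.join(data_root, info["subdir"])
        # Treat a data set as installed only if its directory exists
        # and contains at least one entry.
        installed = os.path.isdir(path) and bool(os.listdir(path))
        if not installed:
            target = missing_essential if info["essential"] else missing_optional
            target.append(name)
    return missing_essential, missing_optional


if __name__ == "__main__":
    # Demonstrate on an empty temporary directory standing in for the data root.
    with tempfile.TemporaryDirectory() as root:
        essential, optional = check_datasets(root)
        for name in essential:
            print(f"ERROR: {name} missing (needed for {DATASETS[name]['used_for']})")
        for name in optional:
            print(f"WARNING: {name} missing (needed for {DATASETS[name]['used_for']})")
```

A check like this would let an optional data package be left out while 
still warning the user about which features are affected.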

regards,

Peter Rice
EMBOSS Team



