[EMBOSS] About bulky taxonomy and gene ontology data in the EMBOSS package.

Charles Plessy charles-listes-emboss at plessy.org
Thu Jul 19 04:17:53 UTC 2012


Dear EMBOSS developers,

Today I received the following bug report about the amount of data shipped in
the Debian package for EMBOSS.

> emboss-data recently grew from a slim 5 megabytes to a massive 305.
> Closer inspection reveals the primary culprits to be large taxonomy
> and gene ontology databases:
> 
>  63927095 ./usr/share/EMBOSS/data/TAXONOMY/names.dmp
>  58897689 ./usr/share/EMBOSS/index/taxon.xtax
>  53221137 ./usr/share/EMBOSS/data/TAXONOMY/nodes.dmp
>  28885180 ./usr/share/EMBOSS/index/taxon.xid
>  23876286 ./usr/share/EMBOSS/index/taxon.xup
>  20110641 ./usr/share/EMBOSS/data/OBO/gene_ontology.1_2.obo
>  13816644 ./usr/share/EMBOSS/index/go.xde
>   8964472 ./usr/share/EMBOSS/index/taxon.xrnk
>   8963308 ./usr/share/EMBOSS/index/taxon.xgc
>   6047464 ./usr/share/EMBOSS/index/go.xnm
>   5642511 ./usr/share/EMBOSS/index/taxon.xmgc
>   2504535 ./usr/share/EMBOSS/index/go.xac
>   2220728 ./usr/share/EMBOSS/index/go.xis
>   1292180 ./usr/share/EMBOSS/index/go.xid
> [...]
>    437512 ./usr/share/EMBOSS/index/go.xns

http://bugs.debian.org/682042
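
A listing like the one quoted above can be reproduced with a short script.
What follows is only a minimal sketch in Python, not the command used in the
bug report: the function name largest_files and its limit parameter are
hypothetical, and the /usr/share/EMBOSS prefix is simply taken from the paths
in the report.

```python
import os


def largest_files(root, limit=15):
    """Return (size_in_bytes, path) pairs for the largest regular files under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files so only data stored here is counted.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            sizes.append((os.path.getsize(path), path))
    # Largest first, in the same order as the listing in the bug report.
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    # Assumed install prefix, taken from the paths quoted in the bug report.
    for size, path in largest_files("/usr/share/EMBOSS"):
        print(f"{size:>10} {path}")
```

Run against an installed emboss-data package, this should show which files
account for most of the installed size.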

In Debian, one solution would be to move this data into a separate optional
package.  But before doing so, I would like to ask whether this data really
ought to be distributed with EMBOSS.  After all, for many other databases,
there are scripts to download and index the data after installation.  Will
EMBOSS 6.5 ship the taxonomy and gene ontology databases as well?

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan


