[EMBOSS] About bulky taxonomy and gene ontology data in the EMBOSS package.
Charles Plessy
charles-listes-emboss at plessy.org
Thu Jul 19 04:17:53 UTC 2012
Dear EMBOSS developers,
today I received the following bug report about the quantity of data shipped in
the Debian package for EMBOSS.
> emboss-data recently grew from a slim 5 megabytes to a massive 305.
> Closer inspection reveals the primary culprits to be large taxonomy
> and gene ontology databases:
>
> 63927095 ./usr/share/EMBOSS/data/TAXONOMY/names.dmp
> 58897689 ./usr/share/EMBOSS/index/taxon.xtax
> 53221137 ./usr/share/EMBOSS/data/TAXONOMY/nodes.dmp
> 28885180 ./usr/share/EMBOSS/index/taxon.xid
> 23876286 ./usr/share/EMBOSS/index/taxon.xup
> 20110641 ./usr/share/EMBOSS/data/OBO/gene_ontology.1_2.obo
> 13816644 ./usr/share/EMBOSS/index/go.xde
> 8964472 ./usr/share/EMBOSS/index/taxon.xrnk
> 8963308 ./usr/share/EMBOSS/index/taxon.xgc
> 6047464 ./usr/share/EMBOSS/index/go.xnm
> 5642511 ./usr/share/EMBOSS/index/taxon.xmgc
> 2504535 ./usr/share/EMBOSS/index/go.xac
> 2220728 ./usr/share/EMBOSS/index/go.xis
> 1292180 ./usr/share/EMBOSS/index/go.xid
> [...]
> 437512 ./usr/share/EMBOSS/index/go.xns
http://bugs.debian.org/682042
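
A listing like the one quoted above can be reproduced on an installed system with a short script. The following is only a minimal sketch in Python, not the method used in the bug report: the function name `largest_files` is hypothetical, and the path `/usr/share/EMBOSS` is assumed from the listing.

```python
import os


def largest_files(root, limit=15):
    """Return (size_in_bytes, path) pairs for the largest files under root.

    Hypothetical helper for illustration; symbolic links are skipped so
    that no file is counted twice.
    """
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue
            sizes.append((os.path.getsize(path), path))
    # Largest first, matching the order of the quoted listing.
    sizes.sort(reverse=True)
    return sizes[:limit]


if __name__ == "__main__":
    # Assumed install location, taken from the paths in the bug report.
    for size, path in largest_files("/usr/share/EMBOSS"):
        print(size, path)
```

If the directory does not exist, the script simply prints nothing.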
In Debian, one solution would be to move this data into a separate optional
package. But before doing so, I would like to ask whether this data really
ought to be distributed with EMBOSS. After all, for many other databases,
there are scripts to download and index the data after installation. Will
EMBOSS 6.5 ship the taxonomy and gene ontology databases as well?
Have a nice day,
--
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan