[Open-bio-l] Common Sample Data Collection, was: SCF files (Staden)

Peter Rice pmr at ebi.ac.uk
Wed Nov 30 11:38:30 UTC 2011


On 11/30/2011 11:32 AM, Pjotr Prins wrote:

> Git is not very good for storing large data files, which we would want
> to fetch partially. My suggestion would be to have a plain old file
> repo, e.g. on S3, which can be mirrored by others.

We had issues with large files in the EMBOSS release, and now make them
available via rsync to add to a developer's CVS checkout. They include
the NCBI taxonomy source and index files and the ontology source and 
index files.

The next EMBOSS release will include http and ftp URLs as valid inputs 
for any data type, so EMBOSS could use remote files for format tests. I'll
look into how other repositories could be added.

I had to add some extra qualifiers to allow queries and offsets to be 
specified, and rewrote the query language parsing to merge very similar 
code segments.
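
To illustrate the general idea of fetching part of a remote file by offset,
here is a minimal Python sketch. It is not EMBOSS code and does not show the
EMBOSS qualifiers; the function names and the example URL are hypothetical.
It assumes an HTTP server that honours the standard Range header (ftp URLs
would need a different mechanism).

```python
import urllib.request


def build_range_request(url, offset, length):
    """Build a GET request for `length` bytes starting at byte `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    last = offset + length - 1  # the end of an HTTP byte range is inclusive
    return urllib.request.Request(url, headers={"Range": f"bytes={offset}-{last}"})


def fetch_range(url, offset, length, timeout=30):
    """Return the requested slice of a remote file as bytes."""
    req = build_range_request(url, offset, length)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
        # 206 means the server sent only the range; 200 means it ignored the
        # Range header and sent the whole file, so slice locally instead.
        if resp.status == 200:
            data = data[offset:offset + length]
        return data


# Hypothetical usage: read 64 bytes starting at byte 128 of a test file.
# chunk = fetch_range("http://example.org/data/sample.scf", 128, 64)
```

The fallback to local slicing means a mirror without range support still
works, just without the bandwidth saving.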

regards,

Peter Rice
EMBOSS Team


