[EMBOSS] Common Sample Data Collection, was: SCF files (Staden)

Wed Nov 30 10:41:37 UTC 2011

On Wed, Nov 30, 2011 at 10:30 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
> On 11/29/2011 07:09 PM, Fields, Christopher J wrote:
>>
>> On Nov 29, 2011, at 12:35 PM, Peter Cock wrote:
>>>
>>> Doesn't BioPerl just use the Staden libraries for this internally?
>>
>> Yes, and it uses an old version as well (via bioperl-ext).  Much of this
>> effort was to go into the biolib initiative for creating cross-lang bindings
>> using swig, but that seems to be silent at the moment.  I'm surprised Python
>> doesn't have io_lib bindings.
>
> BioLib is just swig wrappers around the existing Bio* interfaces and
> code, so it will not help in this case if the projects are too divergent.
>
> Could we set up a Bio* collection of data formats with examples and
> note which projects can handle each one?
>
> We do not need any one project to cover everything - we can reasonably
> expect users to use some other project to interconvert formats if there are
> gaps.
>
> regards,
>
> Peter Rice
> EMBOSS Team

Good plan. I suggest we make a repository on github, perhaps
bio-data or something like that, under the recently created OBF
account, https://github.com/OBF

Peter R - do you have a GitHub account yet? If so we (me,
Chris Fields, etc) can give you access to the OBF org account.

For licensing, where we are free to choose the licence, I would
like to go with something as liberal as possible (e.g. Public
Domain, CC0, MIT/BSD) to allow the files to be used by any OSS
project (or closed source project), rather than something more
principled but restrictive like CC-BY or CC-BY-ND.

However, as we know from the recent Debian packaging
discussion about test cases taken from UniProt, the licensing
and copyright of samples from a database are complicated.
Here we must at least keep careful records of where the
data came from.

Peter