[Open-bio-l] Common Sample Data Collection, was: SCF files (Staden)

Peter Cock p.j.a.cock at googlemail.com
Wed Nov 30 11:14:44 UTC 2011


On Wed, Nov 30, 2011 at 11:04 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
> On 11/30/2011 10:41 AM, Peter Cock wrote:
>>
>> On Wed, Nov 30, 2011 at 10:30 AM, Peter Rice <pmr at ebi.ac.uk> wrote:
>>>
>>> BioLib is just swig wrappers around the existing Bio* interfaces and
>>> code, so it will not help in this case if the projects are too divergent.
>>>
>>> Could we set up a Bio* collection of data formats with examples and
>>> note which projects can handle each one?
>>>
>>> We do not need any one project to cover everything - we can reasonably
>>> expect users to use some other project to interconvert formats if there
>>> are gaps.
>>
>> Good plan. I suggest we make a repository on github, perhaps
>> bio-data or something like that, under the recently created OBF
>> account, https://github.com/OBF
>>
>> Peter R - do you have a GitHub account yet? If so we (me,
>> Chris Field, etc) can give you access to the OBF org account.
>
> No ... rather a pain that the name EMBOSS was already taken. I've
> registered under another name, EMBOSSTEAM, and created an EMBOSS
> project under it.
>
> Looks like git import requires subversion for any automation.
> Presumably I need a fresh EMBOSS checkout from CVS and
> then commit everything by hand ... best done after the release
> 6.5.0 code freeze.

If you are talking about converting the EMBOSS CVS into git,
we can help with that, having done it for Biopython. As part of
this, it is possible to map CVS user names to GitHub users.

I meant: do you personally have a GitHub account?

>> For licensing, where we are free to choose the licence, I would
>> like to go with something as liberal as possible to allow the
>> files to be used by any OSS project (or closed source project),
>> (e.g. Public Domain, CC0, MIT/BSD) rather than something
>> more principled but restricted like CC-BY or CC-BY-ND.
>
> Public domain would be my choice - we don't want to cause
> conflicts if any data is imported into other projects (e.g. as
> test cases)

Yes, public domain would be simplest where possible.

>> However, as we know from recent Debian packaging
>> discussion about test cases taken from UniProt, licensing
>> and copyright of samples from a database is complicated.
>> Here we must at least keep careful records about where
>> data came from.
>
> For that reason we probably should fake all the files for the
> public database formats.

Yes, a practical solution - although it has downsides of course.

Peter



