[EMBOSS] Files included in EMBOSS but licensed ...

Peter Rice pmr at ebi.ac.uk
Sat Jul 30 08:58:07 UTC 2011

Quoted in full for the benefit of the debian-med list, which missed the
original posting.

On 29/07/2011 21:35, Adam Sjøgren wrote:
> On Fri, 29 Jul 2011 09:39:46 +0100, Peter wrote:
>> It might make things clearer if someone from Debian could explain:
> (I am not from Debian, but here is my take on it anyway:)
>> (a) why a Creative Commons licence is an issue for you
> One of the fundamental software freedoms is the freedom to change the
> software¹.
> The Debian Free Software Guidelines' definition of free software
> includes this freedom².
> So the "No Derivatives" variants of the Creative Commons licenses aren't
> free by the DFSG definition.
> (The GNU Free Documentation License on documents with invariant sections
> is considered non-free by DFSG-standards as well, even if the invariant
> sections are things that nobody would want to change.)
> When a project of volunteers packages 29000+ packages, I think
> making a judgement call on whether it is okay that the license of a
> couple of files does not live up to the guidelines is nigh impossible.

> The answer to "Why would you want to?" is, because you might need to.
> It is more obvious with programs and code than it is with database
> entries, granted - but I guess the equivalent problem would be that the
> licensor didn't want to fix a problem in such a database, and that
> problem made the programs using it malfunction. It would be a pain if
> you weren't allowed to fix the problem and distribute the fixed data
> yourself, say, if "upstream" didn't want to include the fix for some
> reason or another; maybe they happened to turn sour on the world/you -
> stranger things have happened.
> So, nobody is probably ever going to exercise that freedom in this
> specific case, I think, but ignoring some of the freedoms in special
> cases is infeasible for a project such as Debian.
> This is just me trying to explain how I understand it, so take it with a
> grain of salt, and swing by debian-legal³ for the experts.

A specific example might help. About 5 years ago a release of the 
UniProt database (as plain text files) broke the Wisconsin (GCG) 
sequence analysis package. The release introduced extremely long lines in a
data file that everyone had assumed was limited to 80 characters per line.

As GCG was closed source, the fix required a change to the UniProt files 
to either wrap or truncate the 'offending' records.

The fix was not to distribute changed data, of course, but to write and
distribute a simple Perl script that wrapped the long records.

That was not a licensing issue - the content stays the same, the format 
is changed, no changed data is distributed. But it does illustrate that 
the database licensing does not prevent 'fixing' a database.
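
For illustration only, here is a minimal sketch of that kind of wrapping
step. It is written in Python rather than Perl and is not the original
script. The function name, the 80-character limit as a parameter, and the
idea of repeating a 5-character line prefix (a two-letter line code plus
padding) on continuation lines are all assumptions made for this example.

```python
import textwrap


def wrap_record_line(line, width=80, prefix_len=5):
    """Wrap one flat-file line so no output line exceeds `width` characters.

    Assumption for this sketch: the first `prefix_len` characters are a
    line code plus padding, and are repeated on every continuation line.
    """
    line = line.rstrip("\n")
    if len(line) <= width:
        # Short lines pass through unchanged.
        return [line]
    prefix, body = line[:prefix_len], line[prefix_len:]
    # Break the body at whitespace; words longer than the available
    # width are split so the limit is always respected.
    chunks = textwrap.wrap(
        body,
        width=width - prefix_len,
        break_long_words=True,
        break_on_hyphens=False,
    )
    return [prefix + chunk for chunk in chunks]


def wrap_lines(lines, width=80, prefix_len=5):
    """Yield wrapped lines for every input line, in order."""
    for line in lines:
        yield from wrap_record_line(line, width, prefix_len)
```

The wrapped copy is produced locally from the unmodified distribution, so
only the script needs to be passed around, not the data.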

>> (b) why you appear to consider a copy of a whole or part of a public
>> biological database as part of an "operating system"
> They are part of a package which is included in the Debian GNU/Linux
> free operating system.

I expect there are many problems that arise if data ... and 
documentation ... are considered to be software. For EMBOSS we didn't 
officially specify a license for the documentation but other packages 
probably do. It still worries me that some of our documentation files 
officially include GPL licensed (EMBOSS) source code but I did not like 
any of the alternative documentation licenses.

> (I personally think it would make sense to change to a Creative Commons
> license that allows derivative works - Uniprot and others are going to
> be the canonical source for the data anyway, so nothing will be lost by
> them by doing that, as far as I can see.)

Unlikely. The no-derivatives version is specifically there to prevent 
derivatives - for example Debian distributing a modified UniProt without 

The ontologies are similar, but do allow for the use case of importing 
terms from one ontology into another if the ontology name is changed 
(and preferably if cross-references to the original are provided). 
Again, the need is to protect the integrity of the original ontology 
content so references to a GO term or a UniProt entry are clearly defined.

This is essential for many of the public bioinformatics databases. Data 
and software are not the same in this context. I am curious whether 
documentation licensing raises any issues.

Just my 2c worth

Peter Rice
