From katel@worldpath.net Fri Dec 1 07:00:43 2000 Date: Thu, 30 Nov 2000 23:00:43 -0800 From: Cayte katel@worldpath.net Subject: [BioPython] plans for next release
----- Original Message -----
From: "Brad Chapman" <chapmanb@arches.uga.edu>
To: "Cayte" <katel@worldpath.net>
Cc: "Jeffrey Chang" <jchang@smi.stanford.edu>; <biopython@biopython.org>
Sent: Wednesday, November 29, 2000 5:20 PM
Subject: Re: [BioPython] plans for next release


> Jeff:
> > > Thomas Sicheritz-Ponten is working on visualization with his xbbtools,
> > > using Tk instead of wx, though.
> > >
>
> Cayte:
> >    I'm not sure it's supported on Windows.  The web description mentioned
> > Linux but didn't mention Windows.  It imports posix.
>
> Hi Cayte -- I just looked at this using my little bit of Windows/python
> knowledge. It looks like Thomas is importing posix/posixpath in all of
> the different modules in xbbtools, but never actually using them. I
> commented out all of the imports, installed Pmw, and xbbtools seems to
> run okay on the machine I was playing on (vanilla Windows 98).
>
> I'm not sure if there is a good reason to import the posix stuff
> (Thomas?) but maybe if not, it might be good to get rid of the imports
> so it'll run nicely on Windows as well.
>
  I ran it and it displayed a Fasta file.  But when I selected
Translations->6 Frames, it caused this trace.

C:\biopython-0.90d04\Scripts\xbbtools>python xbbtools.py lupine.nu
Exception in Tkinter callback
Traceback (most recent call last):
  File "c:\python20\lib\lib-tk\Tkinter.py", line 1287, in __call__
    return apply(self.func, args)
  File ".\xbb_widget.py", line 333, in gcframe
    np = NotePad()
  File ".\xbb_utils.py", line 24, in __init__
    self.tid = Pmw.ScrolledText(self)
AttributeError: ScrolledText

  The menu items barely show up because, in Windows, they're dark green on
black unless they're highlighted.

  It would be efficient to coordinate the different GUIs, if that's feasible.
On the other hand, if I experiment, I don't want to mess up someone else's
code.

                                             Cayte




From thomas@cbs.dtu.dk Sat Dec 2 12:26:57 2000 Date: 02 Dec 2000 13:26:57 +0100 From: Thomas Sicheritz-Ponten thomas@cbs.dtu.dk Subject: [BioPython] plans for next release
> 
> > Jeff:
> > > > Thomas Sicheritz-Ponten is working on visualization with his xbbtools,
> > > > using Tk instead of wx, though.
> > > >
> >
> > Cayte:
> > >    I'm not sure it's supported on Windows.  The web description mentioned
> > > Linux but didn't mention Windows.  It imports posix.
> >
> > Hi Cayte -- I just looked at this using my little bit of Windows/python
> > knowledge. It looks like Thomas is importing posix/posixpath in all of
> > the different modules in xbbtools, but never actually using them. I
> > commented out all of the imports, installed Pmw, and xbbtools seems to
> > run okay on the machine I was playing on (vanilla Windows 98).
> >
> > I'm not sure if there is a good reason to import the posix stuff
> > (Thomas?) but maybe if not, it might be good to get rid of the imports
> > so it'll run nicely on Windows as well.
> >
>   I ran it and it displayed a Fasta file.  But when I selected
> Translations->6 Frames, it caused this trace.
> 
> C:\biopython-0.90d04\Scripts\xbbtools>python xbbtools.py lupine.nu
> Exception in Tkinter callback
> Traceback (most recent call last):
>   File "c:\python20\lib\lib-tk\Tkinter.py", line 1287, in __call__
>     return apply(self.func, args)
>   File ".\xbb_widget.py", line 333, in gcframe
>     np = NotePad()
>   File ".\xbb_utils.py", line 24, in __init__
>     self.tid = Pmw.ScrolledText(self)
> AttributeError: ScrolledText
> 
>   The menu items barely show up because, in Windows, they're dark green on
> black unless they're highlighted.
> 
>   It would be efficient to coordinate the different GUIs, if that's feasible.
> On the other hand, if I experiment, I don't want to mess up someone else's
> code.

OK - I just came back from Egypt. Of course there is no need at all for
using posix.posixpath - that's left over from my novice days :-)
I'll fix that and try to remove all Pmw widgets (it's easy to implement the
scrolled things in pure Tk).

I do not run Python on Windows, so could anybody send me correct color
configurations for Windows?
I have not worked on xbbtools for a while because of the question of how
to start one or more blast searches (threaded?) without freezing the whole
Tk mainloop until the blast run is finished. Starting a search (Blast, Fasta,
Clustal etc.) from within biopython is definitely a frequent task for which
there should be a separate module/class ... IMHO
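
One common way to keep the Tk mainloop responsive while an external search
runs is to start the program in a worker thread and let the GUI poll a queue
with `after`. The sketch below is only an illustration under stated
assumptions: `start_search` and `poll_results` are hypothetical names, not
part of xbbtools or biopython, the command line is whatever the caller
passes in, and it relies on the Python 3 standard library.

```python
import queue
import subprocess
import threading


def start_search(cmd, results):
    """Run an external command (for example a blast binary) in a worker thread.

    `cmd` is an argument list. When the command finishes, the pair
    (returncode, stdout) is put on the `results` queue.
    Hypothetical helper, not an existing biopython function.
    """
    def worker():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        results.put((proc.returncode, proc.stdout))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_results(widget, results, on_done, interval_ms=100):
    """Check the queue from the GUI thread; reschedule until a result arrives.

    `widget` is any Tk widget (only its `after` method is used), so the
    mainloop keeps handling events between polls.
    """
    try:
        returncode, output = results.get_nowait()
    except queue.Empty:
        widget.after(interval_ms, poll_results, widget, results, on_done, interval_ms)
    else:
        on_done(returncode, output)
```

In a GUI callback one would create a `queue.Queue()`, call `start_search`
with the search command, and then call `poll_results(root, q, show_hits)`,
where `show_hits` is whatever function displays the output. Several searches
can run at once by giving each its own queue.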

c ya
-thomas

-- 
Sicheritz Ponten Thomas E.  CBS, Department of Biotechnology
thomas@biopython.org        The Technical University of Denmark
CBS:  +45 45 252489         Building 208, DK-2800 Lyngby
Fax   +45 45 931585         http://www.cbs.dtu.dk/thomas

	De Chelonian Mobile ... The Turtle Moves ...


From katel@worldpath.net Sun Dec 3 05:48:20 2000 Date: Sat, 2 Dec 2000 21:48:20 -0800 From: Cayte katel@worldpath.net Subject: [BioPython] plans for next release
----- Original Message -----
From: "Thomas Sicheritz-Ponten" <thomas@cbs.dtu.dk>
To: "Cayte" <katel@worldpath.net>
Cc: "Brad Chapman" <chapmanb@arches.uga.edu>; "Jeffrey Chang"
<jchang@smi.stanford.edu>; <biopython@biopython.org>
Sent: Saturday, December 02, 2000 4:26 AM
Subject: Re: [BioPython] plans for next release




From katel@worldpath.net Sun Dec 3 05:56:59 2000 Date: Sat, 2 Dec 2000 21:56:59 -0800 From: Cayte katel@worldpath.net Subject: [BioPython] plans for next release
> OK - I just came back from Egypt. Of course there is no need at all for
> using posix.posixpath - that's left over from my novice days :-)
> I'll fix that and try to remove all Pmw widgets (it's easy to implement the
> scrolled things in pure Tk).
>

  Egypt sounds like a fascinating place to visit!  Late fall sounds like the
right time, too, with cooler weather.

  Have you ever considered wxPython?  I have a tool, SeqGui.py, in wxPython,
that's sort of like xbbtools.py.  I find it easier to work with than Tkinter.
Back in May, we had a thread about GUI support.  It's in the archives.

  I'd like the GUI to eventually support color highlighting of features, for
example, regions of high consensus.

                                   Cayte


From thomas@cbs.dtu.dk Tue Dec 5 07:53:58 2000 Date: Tue, 5 Dec 2000 08:53:58 +0100 (CET) From: thomas@cbs.dtu.dk thomas@cbs.dtu.dk Subject: [BioPython] Re: plans for next release
> thomas wrote
> > OK - I just came back from Egypt. Of course there is no need at all for
> > using posix.posixpath - that's left over from my novice days :-)
> > I'll fix that and try to remove all Pmw widgets (it's easy to implement the
> > scrolled things in pure Tk).
> >

> Cayte wrote
> 
>   Egypt sounds like a fascinating place to visit!  Late fall sounds like the
> right time, too, with cooler weather.

and nice snorkeling too :-)
> 
>   Have you ever considered wxPython?  I have a tool, SeqGui.py, in wxPython,
> that's sort of like xbbtools.py.  I find it easier to work with than Tkinter.
> Back in May, we had a thread about GUI support.  It's in the archives.

#################
# It seems that my reply didn't make it through sendmail :-( - I'll try to
# reconstruct it ...
#########

The main reason for my sticking to Tkinter is that I used Tcl/Tk a lot
before I discovered Python - I have tons of Tk snippets from my previous
bioinformatics work (Biowish, GRS, XBbtools, CapDB etc.) which are very
easy to convert into shorter, cleaner and more efficient Python Tk code.
Maybe the biggest advantage of using Tkinter is the powerful Tk Canvas;
as far as I know, neither wxPython nor Gtk Python has anything close to
the canvas widget.

> 
>   I'd like the gui to eventually support color highlighting of features, for
> example, regions of high consensus.
> 

I don't know how this works in wxPython, but in Tkinter it has been
there from the beginning. Every line, rectangle etc. you draw in the
canvas is a unique object and gets an id. You can very easily bind any event
(e.g. MouseOver, DoubleClickButton1 etc.) to any function. To highlight
different genes or sequence regions, you just group the corresponding ids and
bind a color change to a MouseOver event.
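
The id/tag mechanism described above can be sketched roughly as follows.
This is an illustration only, not code from xbbtools or the phylome viewer:
`add_hover_highlight`, `demo`, the tag names "geneA"/"geneB" and the colors
are all made up for the example. In Tk the MouseOver event is spelled
`<Enter>` (with `<Leave>` for the mouse moving away), and the module is
called `tkinter` in Python 3.

```python
def add_hover_highlight(canvas, tag, normal="black", hot="red"):
    """Recolor every canvas item carrying `tag` while the mouse is over one of them."""
    canvas.itemconfigure(tag, fill=normal)
    canvas.tag_bind(tag, "<Enter>",
                    lambda event: canvas.itemconfigure(tag, fill=hot))
    canvas.tag_bind(tag, "<Leave>",
                    lambda event: canvas.itemconfigure(tag, fill=normal))


def demo():
    """Draw hypothetical 'genes' as lines and make each tag group hover-sensitive."""
    import tkinter as tk  # imported here so the helper above has no GUI dependency

    root = tk.Tk()
    canvas = tk.Canvas(root, width=300, height=100)
    canvas.pack()
    # Each create_line call returns an integer item id; tags group items together.
    canvas.create_line(20, 30, 120, 30, width=4, tags=("geneA",))
    canvas.create_line(140, 30, 280, 30, width=4, tags=("geneB",))
    canvas.create_line(20, 60, 90, 60, width=4, tags=("geneA",))
    for tag in ("geneA", "geneB"):
        add_hover_highlight(canvas, tag)
    root.mainloop()
```

Calling `demo()` opens a small window; moving the mouse over either "geneA"
line recolors both of them, because the binding is attached to the tag
rather than to a single item id.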

E.g. my recently accepted paper about phylogenomics with Python (NAR nr2
2001) deals with the interactive display of all genes, phylogenetic
trees and blast results for a microbial genome (between 1000 and 5000 times
3).
I have no fancy web page yet, but you can check a screenshot of the
phylome of the bacterium Thermotoga maritima
at http://www.cbs.dtu.dk/thomas/pyphy/pyphy.png
(Phylome = set of all phylogenetic trees for a genome. 
 color coding for the kingdom of the closest neighbor in the phylogenetic
 tree: blue = Bacteria, yellow = Archaea, red = Eukarya)

Here the phylome map is an interactive display of all phylogenetic trees
and genes (colored lines in the circle), where each line/gene is sensitive
to mouse movement. A MouseOver event displays gene information in the top
Entry, Button1Click shows the phylogenetic tree, and Button3 shows a
gene-specific popup menu for blast results, alignments etc.
Each gene can be a member of a metabolic pathway; selecting a pathway
in the right listbox changes the width and the arrow shape of each gene
associated (by canvas tag) with the pathway.

The advantage here is that zooming, resizing, moving and event grabbing are
part of the canvas widget, so we only need to redraw single objects.

I have never worked with wxPython - what exactly is the strength of
wxWindows? I guess it is faster than Tkinter; are there any special
features not found in the rest of the GUI family?


c ya
-thomas

Sicheritz Ponten Thomas E.  CBS, Department of Biotechnology
thomas@biopython.org        The Technical University of Denmark
CBS:  +45 45 252485         Building 208, DK-2800 Lyngby
Fax   +45 45 931585         http://www.cbs.dtu.dk/thomas/index.html

        De Chelonian Mobile ... The Turtle Moves ...

From katel@worldpath.net Wed Dec 6 06:40:50 2000 Date: Tue, 5 Dec 2000 22:40:50 -0800 From: Cayte katel@worldpath.net Subject: [BioPython] Re: [Biopython-dev] Re: plans for next release
> and nice snorkeling too :-)
> >

   Do you have pictures of Egypt to post on the web?

> >   Have you ever considered wxPython?  I have a tool, SeqGui.py, in wxPython,
> > that's sort of like xbbtools.py.  I find it easier to work with than Tkinter.
> > Back in May, we had a thread about GUI support.  It's in the archives.
>
>
> The main reason for my sticking to Tkinter is that I used Tcl/Tk a lot
> before I discovered Python - I have tons of Tk snippets from my previous
> bioinformatics work (Biowish, GRS, XBbtools, CapDB etc.) which are very
> easy to convert into shorter, cleaner and more efficient Python Tk code.
> Maybe the biggest advantage of using Tkinter is the powerful Tk Canvas;
> as far as I know, neither wxPython nor Gtk Python has anything close to
> the canvas widget.
>

  I think the Windows version is a wrapper around the Windows Gui and that
wxPython attempts to provide equivalent functionality in Linux.
> >
> >   I'd like the gui to eventually support color highlighting of features, for
> > example, regions of high consensus.
> >
>
> I don't know how this works in wxPython, but in Tkinter it has been
> there from the beginning. Every line, rectangle etc. you draw in the
> canvas is a unique object and gets an id. You can very easily bind any event
> (e.g. MouseOver, DoubleClickButton1 etc.) to any function. To highlight
> different genes or sequence regions, you just group the corresponding ids and
> bind a color change to a MouseOver event.

> E.g. my recently accepted paper about phylogenomics with Python (NAR nr2
> 2001) deals with the interactive display of all genes, phylogenetic
> trees and blast results for a microbial genome (between 1000 and 5000 times
> 3).
> I have no fancy web page yet, but you can check a screenshot of the
> phylome of the bacterium Thermotoga maritima
> at http://www.cbs.dtu.dk/thomas/pyphy/pyphy.png
> (Phylome = set of all phylogenetic trees for a genome.
>  color coding for the kingdom of the closest neighbor in the phylogenetic
>  tree: blue = Bacteria, yellow = Archaea, red = Eukarya)
>
> Here the phylome map is an interactive display of all phylogenetic trees
> and genes (colored lines in the circle), where each line/gene is sensitive
> to mouse movement. A MouseOver event displays gene information in the top
> Entry, Button1Click shows the phylogenetic tree, and Button3 shows a
> gene-specific popup menu for blast results, alignments etc.
> Each gene can be a member of a metabolic pathway; selecting a pathway
> in the right listbox changes the width and the arrow shape of each gene
> associated (by canvas tag) with the pathway.
>
> The advantage here is that zooming, resizing, moving and event grabbing are
> part of the canvas widget, so we only need to redraw single objects.
>
Does it support colorization with enough flexibility to support research on
the fly, as in this scenario?

USER STORY:
   Ed Enzyme is doing some detective work on an alignment.  First he
highlights the start and stop codons in red and green.  Then Ed zooms in on
an interesting sequence.  He first highlights the hydrophilic regions in
magenta.  Then Ed backtracks and highlights the acidic regions.
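
The first step of the story (marking start and stop codons) could be
separated from the GUI by computing colored spans first and letting the
widget draw them afterwards. The sketch below is only an illustration:
`codon_spans` is a hypothetical helper, the codon sets are those of the
standard genetic code, and the choice of green for starts and red for stops
is an assumption, since the story does not say which color goes with which.

```python
START_CODONS = {"ATG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_spans(seq, frame=0):
    """Return (start, end, color) spans for start/stop codons in one reading frame.

    Positions are 0-based, end-exclusive offsets into `seq`.
    Color assignment (green = start, red = stop) is an assumption.
    """
    seq = seq.upper()
    spans = []
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in START_CODONS:
            spans.append((i, i + 3, "green"))
        elif codon in STOP_CODONS:
            spans.append((i, i + 3, "red"))
    return spans
```

With a Tk Text widget holding the sequence on a single line, each span could
then be applied with `text.tag_add(color, "1.%d" % start, "1.%d" % end)`
after a `text.tag_configure(color, foreground=color)`. Later steps of the
story (hydrophilic or acidic regions) would just be further functions
returning spans with other colors.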
>

> I have never worked with wxPython - what exactly is the strength of
> wxWindows? I guess it is faster than Tkinter; are there any special
> features not found in the rest of the GUI family?
>
>

  I found it was easier to work with.  With wxPython I could write more code
in the same time, with fewer problems like panels that don't quite line up.


Cayte


From birney@ebi.ac.uk Sat Dec 9 19:03:10 2000 Date: Sat, 9 Dec 2000 19:03:10 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [BioPython] An open letter to bioinformatcis researchers
Dear fellow bioinformatics developers:

By now you have probably heard that Celera Genomics has submitted
their human genome paper to the journal Science. Science and Celera
have agreed to special terms for the release of the human genome
sequence data.  It will be made available through the Celera website,
and will not be submitted to the international DNA database consortium
(GenBank, EMBL and DDBJ). Science's statement regarding the agreement
is at:
http://www.sciencemag.org/feature/data/announcement/genomesequenceplan.shl

All major journals, including Science, have a policy of deposition of
sequence data with the "appropriate data bank". The accepted community
standard is submission to GenBank/EMBL/DDBJ. The reason for this
deposition is to make the results of the work openly available for
future research. This principle was specifically mentioned in the
Clinton/Blair statement on human genome sequencing -
 http://www.usinfo.state.gov/topical/global/biotech/00031401.htm
- which strongly upheld the view that "unencumbered access" to genome
data was critical.

The terms of the Celera/Science agreement will give us access to the
genome sequence, but not unencumbered access.  Celera is suggesting
publishing their data under an MTA (Material Transfer Agreement) which
would prevent large scale downloads and incorporation of this data
into GenBank/EMBL/DDBJ. In order to download the data, you and your
institution will have to sign a contract guaranteeing that you will
not "redistribute" the Celera data.

Science believes that the deal is an adequate compromise because it
provides us the right to download the data and publish our results.
We believe Science is thinking in terms of single gene biology, not
large scale bioinformatics. It is probably not hard for you to imagine
scenarios in bioinformatics in which "publication" and
"redistribution" are virtually the same thing; we cannot imagine
Celera allowing us to incorporate data into Pfam, for example,
nor into Ensembl.

We are asking for your support in writing to Science to politely
insist that genome sequence papers should be accompanied by
unencumbered deposition to GenBank/EMBL/DDBJ. Please note that we have
no issue with Celera keeping this data unpublished for
commercial reasons, nor with them combining their data with freely
available data from the public genome projects. We would defend their
right to do either. Our view is simply that the genome community has
established a clear principle that published genome data must be
deposited in the international databases, that bioinformatics is
fueled by this principle, and that Science therefore threatens to set
a precedent that undermines our research.

We encourage you to express your views on this matter to Donald
Kennedy (kennedyd@kennedyd.pobox.stanford.edu), the Editor-in-Chief of
Science, and/or to Barbara Jasny (bjasny@aaas.org), the managing
editor in charge of genomics papers at Science.


Here is a Q/A about some points.   

* Why does this matter?

A classic example of how our field began to have an impact on
molecular biology was Russ Doolittle's discovery of a significant
sequence similarity between a viral oncogene and a cellular growth
factor receptor. Russ could not have found that result if he did not
have an aggregate database of previously published sequences. We have
come a long way from Russ and his son typing data into the NEWAT
protein sequence database by hand.

Throughout the 80's the international database community fought hard
to insist that DNA sequence data be deposited into the public domain
databases. Journals now generally require deposition as a condition of
accepting a paper. The forming of these databases and the
international agreements on data sharing between the European,
American and Japanese databases fostered the rapid development of
bioinformatics research. We now all take for granted the fact that
large DNA databases are accessible from a single point of contact, and
the identifiers are coordinated worldwide.

Bioinformatics research relies on open data with minimal legal
encumbrances submitted to public databases. Without these databases
there is no real substrate for bioinformatics research.


* What would happen if this precedent was set?

There are a number of consequences if Science set a precedent that
allowed people to publish DNA data under a variety of MTAs.

- One would not be able to form a single DNA database on which to
  do bioinformatics research, and the derivative databases (Swissprot,
  PIR, Pfam, PROSITE, etc.) would not be legal.

- Bench biologists would have to visit a number of websites and
  possibly enter into a number of different contracts for access to DNA
  data. Unexpected informative homologies could become prohibitively
  difficult to find.

- You may need to get a legal review before you can publish
  the results of an analysis, if your analysis is large-scale and
  detailed enough that it could be reasonably interpreted as a 
  "redistribution" of the primary sequence data. You could
  be sued for breach of contract for a Web Supplement page
  that discloses extensive sequence data supporting your results.

- Scientific openness will be undermined. Efforts to engage the
  community in cooperative annotation of large genomes, for instance,
  would be blocked -- we can't usefully annotate a genome we can't freely
  redistribute.


* Celera paid for it. Can't they set their own access terms?

Absolutely. We have no issue with Celera's commercial data gathering,
and their right to set their own access terms to their data.  We do
feel, though, that scientific publications carry a certain ethical
responsibility. The purpose of a paper is to enable the community to
efficiently build on your work. There is always a tension between
disclosing your work to your competitors (this is not unique to
private companies!) and receiving scientific credit for your work via
publication.  This tension is natural, and maintaining a consistent
and acceptable balance is the reason that scientists and journals
establish community standards that dictate how data are required to be
disclosed. In this case, the clearly accepted community standard is
that DNA sequence data are deposited in GenBank/EMBL/DDBJ upon
publication.

We certainly do not blame Celera (much) for seeking a special deal
that lets them have their cake and eat it too -- they would
understandably like scientific credit for their terrific and important
work in human sequencing, and they would also like a profitable
business model.

We do blame Science for failing to take a strong stand in upholding
accepted scientific publication practices. We cannot accept that it is
necessary to sacrifice ethics for expediency.

* Science claims they are honouring their own policy. What gives?
  
Science now claims that all their policy really requires is that
archival data be available via a publicly accessible database.  We
think this is a conveniently revisionist view of their own policy,
which states (in Instructions to Authors):
  
"archival data sets (such as sequence and structural data) must be 
deposited with the appropriate data bank and the identifier code should be 
sent to Science for inclusion in the published manuscript (coordinates
must be released at the time of publication)"

Notice the use of the definite article "THE appropriate data bank",
the notion of "deposition", and the additional rider that the
identifier code should be sent.

The spirit of this statement seems clear to us. Science's statement
anticipates that there is an appropriate, single, aggregate community
database for each sort of archival data, whether DNA sequence, protein
structure coordinates, or something else. Sensibly, they don't name
every possible database for every possible archival data set.  They
expect that recognized community standards exist. In no way does
Science's statement seem consistent with the view that an individual
lab could start its own "public" DNA sequence database and send a
meaningless internal database identifier; to try to read it that way
is a post hoc rationalisation.


*  What can Science do? This is a done deal.

It's true that this is a done deal. Science and Celera have mutually
agreed to the general terms of data release. But there are two ways
that we can minimize the damage.

First, the details of the agreement are not set. In particular, there
is no definition of allowed "publication" versus prohibited
"redistribution". Science could specify definitions that did not
interfere with noncommercial uses of the data in bioinformatics,
allowing us redistribution rights if it made sense in the context of
our project (for example, a genome annotation project like Ensembl).
 
Second, and preferably, Science -- or even the peer reviewers -- can
uphold Science's own data access policy, and reject the paper.

Incidentally, they might also choose to enforce Science's policy on
prior publication, which states "...the main findings of a paper
should not have been reported in the mass media. Authors are, however,
permitted to present their data at open meetings but should not
overtly seek media attention." If I issued a press release upon
submission of a manuscript to Science, like Celera did, Science would
rightly fire it back to me without review.

* What can I do?

Agitate. Let Science know that you care. They consider this deal to be
a trial balloon for future genome papers. Even if we can't change the
deal with Celera, we can try to make sure it's a one-time-only deal
that's viewed as a Big Mistake. Write a letter to Science and tell
them how their actions would impact your research, both in the long
term and in the short term. Also, you can pass on this open letter to
other bioinformatics researchers you know.


Dr Sean Eddy, 
Alvin Goldfarb Professor of Computational Biology,
Howard Hughes Medical Institute, Washington University in St. Louis, USA

Dr Ewan Birney
Team Leader, Genomic Annotation
European Bioinformatics Institute, UK



From birney@ebi.ac.uk Sun Dec 10 13:44:31 2000 Date: Sun, 10 Dec 2000 13:44:31 +0000 (GMT) From: Ewan Birney birney@ebi.ac.uk Subject: [BioPython] Update on Don Kennedy's address.
The address for Don Kennedy that we gave out in our letter

kennedyd@kennedyd.pobox.stanford.edu

seems to bounce.

kennedyd@stanford.edu


seems not to bounce (hopefully because it is getting delivered)



-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.  
-----------------------------------------------------------------


From jmh.neefs@pandora.be Sun Dec 10 21:25:13 2000 Date: Sun, 10 Dec 2000 22:25:13 +0100 From: Jean-Marc Neefs jmh.neefs@pandora.be Subject: [BioPython] RE: An open letter to bioinformatcis researchers
Dear Ewan and Sean,

I would like to add my 2 cents.

The biggest trouble with hiding the genome is keeping information away from 
the scientists, and slowing down research.

On the other hand, playing devil's advocate, this could be another call to 
the public effort for quicker finishing.


Anyway, Celera only has a small window of opportunity before the public 
data become available, and we all will have enough laboratory work to 
analyse and confirm the coming data deluge.

To end on a positive note: keep up the good work on Ensembl. I learn more 
and more about it each day and find it more and more useful.

I will contact Science. Kind Regards,

Jean-Marc Neefs
Senior Bioinformatics Scientist


-----Original Message-----
From:	Ewan Birney [SMTP:birney@ebi.ac.uk]
Sent:	Saturday, December 09, 2000 8:03 PM
To:	bioperl-l@bioperl.org; biojava-l@biojava.org; biopython@biopython.org; 
bioxml-dev@bioxml.org; ensembl-dev@ebi.ac.uk; apollo@ebi.ac.uk
Subject:	An open letter to bioinformatcis researchers



Dear fellow bioinformatics developers:

By now you have probably heard that Celera Genomics has submitted
their human genome paper to the journal Science. Science and Celera
have agreed to special terms for the release of the human genome
sequence data.  It will be made available through the Celera website,
and will not be submitted to the international DNA database consortium
(GenBank, EMBL and DDBJ). Science's statement regarding the agreement
is at:
http://www.sciencemag.org/feature/data/announcement/genomesequenceplan.shl

All major journals, including Science, have a policy of deposition of
sequence data with the "appropriate data bank". The accepted community
standard is submission to GenBank/EMBL/DDBJ. The reason for this
deposition is to make the results of the work openly available for
future research. This principle was specifically mentioned in the
Clinton/Blair statement on human genome sequencing -
 http://www.usinfo.state.gov/topical/global/biotech/00031401.htm
- which strongly upheld the view that "unencumbered access" to genome
data was critical.

The terms of the Celera/Science agreement will give us access to the
genome sequence, but not unencumbered access.  Celera is suggesting
publishing their data under an MTA (Material Transfer Agreement) which
would prevent large scale downloads and incorporation of this data
into GenBank/EMBL/DDBJ. In order to download the data, you and your
institution will have to sign a contract guaranteeing that you will
not "redistribute" the Celera data.

Science believes that the deal is an adequate compromise because it
provides us the right to download the data and publish our results.
We believe Science is thinking in terms of single gene biology, not
large scale bioinformatics. It is probably not hard for you to imagine
scenarios in bioinformatics in which "publication" and
"redistribution" are virtually the same thing; we cannot imagine
Celera allowing us to incorporate data into Pfam, for example,
nor into Ensembl.

We are asking for your support in writing to Science to politely
insist that genome sequence papers should be accompanied by
unencumbered deposition to GenBank/EMBL/DDBJ. Please note that we have
no issue with Celera keeping this data unpublished for
commercial reasons, nor with them combining their data with freely
available data from the public genome projects. We would defend their
right to do either. Our view is simply that the genome community has
established a clear principle that published genome data must be
deposited in the international databases, that bioinformatics is
fueled by this principle, and that Science therefore threatens to set
a precedent that undermines our research.

We encourage you to express your views on this matter to Donald
Kennedy (kennedyd@kennedyd.pobox.stanford.edu), the Editor-in-Chief of
Science, and/or to Barbara Jasny (bjasny@aaas.org), the managing
editor in charge of genomics papers at Science.


Here is a Q/A about some points.

* Why does this matter?

A classic example of how our field began to have an impact on
molecular biology was Russ Doolittle's discovery of a significant
sequence similarity between a viral oncogene and a cellular growth
factor receptor. Russ could not have found that result if he did not
have an aggregate database of previously published sequences. We have
come a long way from Russ and his son typing data into the NEWAT
protein sequence database by hand.

Throughout the 80's the international database community fought hard
to insist that DNA sequence data be deposited into the public domain
databases. Journals now generally require deposition as a condition of
accepting a paper. The forming of these databases and the
international agreements on data sharing between the European,
American and Japanese databases fostered the rapid development of
bioinformatics research. We now all take for granted the fact that
large DNA databases are accessible from a single point of contact, and
the identifiers are coordinated worldwide.

Bioinformatics research relies on open data with minimal legal
encumbrances submitted to public databases. Without these databases
there is no real substrate for bioinformatics research.


* What would happen if this precedent was set?

There are a number of consequences if Science set a precedent that
allowed people to publish DNA data under a variety of MTAs.

- One would not be able to form a single DNA database on which to
  do bioinformatics research, and the derivative databases (Swissprot,
  PIR, Pfam, PROSITE, etc.) would not be legal.

- Bench biologists would have to visit a number of websites and
  possibly enter into a number of different contracts for access to DNA
  data. Unexpected informative homologies could become prohibitively
  difficult to find.

- You may need to get a legal review before you can publish
  the results of an analysis, if your analysis is large-scale and
  detailed enough that it could be reasonably interpreted as a
  "redistribution" of the primary sequence data. You could
  be sued for breach of contract for a Web Supplement page
  that discloses extensive sequence data supporting your results.

- Scientific openness will be undermined. Efforts to engage the
  community in cooperative annotation of large genomes, for instance,
  would be blocked -- we can't usefully annotate a genome we can't freely
  redistribute.


* Celera paid for it. Can't they set their own access terms?

Absolutely. We have no issue with Celera's commercial data gathering,
and their right to set their own access terms to their data.  We do
feel, though, that scientific publications carry a certain ethical
responsibility. The purpose of a paper is to enable the community to
efficiently build on your work. There is always a tension between
disclosing your work to your competitors (this is not unique to
private companies!) and receiving scientific credit for your work via
publication.  This tension is natural, and maintaining a consistent
and acceptable balance is the reason that scientists and journals
establish community standards that dictate how data are required to be
disclosed. In this case, the clearly accepted community standard is
that DNA sequence data are deposited in GenBank/EMBL/DDBJ upon
publication.

We certainly do not blame Celera (much) for seeking a special deal
that lets them have their cake and eat it too -- they would
understandably like scientific credit for their terrific and important
work in human sequencing, and they would also like a profitable
business model.

We do blame Science for failing to take a strong stand in upholding
accepted scientific publication practices. We cannot accept that it is
necessary to sacrifice ethics for expediency.

* Science claims they are honouring their own policy. What gives?

Science now claims that all their policy really requires is that
archival data be available via a publicly accessible database.  We
think this is a conveniently revisionist view of their own policy,
which states (in Instructions to Authors):

"archival data sets (such as sequence and structural data) must be
deposited with the appropriate data bank and the identifier code should be
sent to Science for inclusion in the published manuscript (coordinates
must be released at the time of publication)"

Notice the use of the definite article "THE appropriate data bank",
the notion of "deposition", and the additional rider that the
identifier code should be sent.

The spirit of this statement seems clear to us. Science's statement
anticipates that there is an appropriate, single, aggregate community
database for each sort of archival data, whether DNA sequence, protein
structure coordinates, or something else. Sensibly, they don't name
every possible database for every possible archival data set.  They
expect that recognized community standards exist. In no way does
Science's statement seem consistent with the view that an individual
lab could start its own "public" DNA sequence database and send a
meaningless internal database identifier; to try to read it that way
is a post hoc rationalisation.


*  What can Science do? This is a done deal.

It's true that this is a done deal. Science and Celera have mutually
agreed to the general terms of data release. But there are two ways
that we can minimize the damage.

First, the details of the agreement are not set. In particular, there
is no definition of allowed "publication" versus prohibited
"redistribution". Science could specify definitions that did not
interfere with noncommercial uses of the data in bioinformatics,
allowing us redistribution rights if it made sense in the context of
our project (for example, a genome annotation project like Ensembl).

Second, and preferably, Science -- or even the peer reviewers -- can
uphold Science's own data access policy, and reject the paper.

Incidentally, they might also choose to enforce Science's policy on
prior publication, which states "...the main findings of a paper
should not have been reported in the mass media. Authors are, however,
permitted to present their data at open meetings but should not
overtly seek media attention." If I issued a press release upon
submission of a manuscript to Science, like Celera did, Science would
rightly fire it back to me without review.

* What can I do?

Agitate. Let Science know that you care. They consider this deal to be
a trial balloon for future genome papers. Even if we can't change the
deal with Celera, we can try to make sure it's a one-time-only deal
that's viewed as a Big Mistake. Write a letter to Science and tell
them how their actions would impact your research, both in the long
term and in the short term. Also, you can pass on this open letter to
other bioinformatics researchers you know.


Dr Sean Eddy,
Alvin Goldfarb Professor of Computational Biology,
Howard Hughes Medical Institute, Washington University in St. Louis, USA

Dr Ewan Birney
Team Leader, Genomic Annotation
European Bioinformatics Institute, UK



From auffray@infobiogen.fr Mon Dec 18 12:19:48 2000 Date: Mon, 18 Dec 2000 13:19:48 +0100 From: Charles Auffray auffray@infobiogen.fr Subject: [BioPython] Human genome sequence
Ewan,

Let me express my reaction to your mail.

The Science-Celera deal is not, on many accounts, a precedent. In 1995,
Nature published in its Genome Directory a paper by Adams et al. under an
agreement which already was breaching many of the commonly accepted rules
for publication and sequence data deposition. The TIGR group, led by Craig
Venter, had included data released in public databases by other groups
without releasing much of their own data, which at that time was only
accessible through an MTA (based on their relationship with Human Genome
Sciences). My Genexpress team was denied the possibility of publishing our
interpretation of our own data in the same issue of Nature. The silence of
the scientific community at that time was astounding (notwithstanding the
fact that by some irony, Genome Research published our paper the very same
day as the Genome Directory).

What is happening now with publication of the human genome sequence papers
seems to indicate that the lessons of such past events have not been
learned, and that people have short memories. The sort of work that will
lead to a full description of genomes, transcriptomes and proteomes is the
result of the contributions of a large number of individuals over several
decades. In an attempt to evaluate how many people should be cited as
co-authors of an overview paper describing the state of knowledge on the
human transcriptome, I ended up with a figure of 44,444 (including Venter
and his co-workers), which is of the same order of magnitude as the
estimated number of human genes. I believe it would be appropriate for
those seeking to publish milestone papers on the current knowledge of the
human genome, whether from the public or the private sector, to acknowledge
all those who laid the ground for this work by citing them as co-authors. As
a first indication, there are 7846 papers registered in PubMed containing
"human genome" in their title or abstract. As many scientists know, and
despite all media coverage, the work is not yet completely finished, and
even if it were, it would only be the end of the beginning.

Such an action would have several advantages. First it would convey to the
public the idea that science is a collective as well as an individual
endeavour. Second, it would make clear that the sequence of the human
genome is common knowledge which can be shared by all to advance human
health, in line with the United Nations Declaration on the Human Genome and
Human Rights which was adopted unanimously by the 186 nations represented
in 1998 (http://www1.umn.edu/humanrts/instree/Udhrhg.htm). Part of the
process is, as you rightly point out, the development of large-scale
analyses using informatics which require unencumbered access to the primary
data in the established international electronic data repositories
(EMBL/NCBI/DDBJ). We also need to ensure that useful applications can be
developed with appropriate financial investment and reach the end user
through the healthcare system. In this respect, some level of balanced and
fair competition, which occurs both within the academic and industrial
sectors and between them, is desirable. The balance and fairness can
only be achieved if we all recognize the contributions of all and provide
the incentive for the required public and private investments. The
fuzziness of the intellectual property status of inventions based on
knowledge of the human genome sequence vs the human genome sequence itself
does not help.

In the eight years since I wrote a letter to Nature on this subject (DNA
sequences. Nature. 1992 355:292), it seems to me that we have witnessed
some progress in this regard, but a lot more effort is needed by the law
and policy makers to clarify the situation. It is their responsibility to
enforce regulations disallowing attempts at monopolization, were it through
media coverage, as seems currently fashionable.  The sooner the better.
There is so much to do ahead of us.

Charles Auffray


Unite de Genetique Moleculaire
et Biologie du Developpement
CNRS ERS 1984  - 7-19 rue Guy Moquet
BP 8 - 94801 VILLEJUIF CEDEX - FRANCE
Tel : 33 (0)1 49 58 34 98 - Fax : 33 (0)1 49 58 35 09
E-mail : auffray@infobiogen.fr