From katel@worldpath.net Fri Dec 1 07:00:43 2000
Date: Thu, 30 Nov 2000 23:00:43 -0800
From: Cayte katel@worldpath.net
Subject: [BioPython] plans for next release
----- Original Message -----
From: "Brad Chapman" <chapmanb@arches.uga.edu>
To: "Cayte" <katel@worldpath.net>
Cc: "Jeffrey Chang" <jchang@smi.stanford.edu>; <biopython@biopython.org>
Sent: Wednesday, November 29, 2000 5:20 PM
Subject: Re: [BioPython] plans for next release

> Jeff:
> > > Thomas Sicheritz-Ponten is working on visualization with his xbbtools,
> > > using Tk instead of wx, though.
> > >
>
> Cayte:
> > I'm not sure it's supported on Windows. The web description mentioned
> > Linux but didn't mention Windows. It imports posix.
>
> Hi Cayte -- I just looked at this using my little bit of Windows/python
> knowledge. It looks like Thomas is importing posix/posixpath in all of
> the different modules in xbbtools, but never actually using them. I
> commented out all of the imports, installed Pmw, and xbbtools seems to
> run okay on the machine I was playing on (vanilla Windows 98).
>
> I'm not sure if there is a good reason to import the posix stuff
> (Thomas?) but maybe if not, it might be good to get rid of the imports
> so it'll run nicely on Windows as well.
>

I ran it and it displayed a Fasta file. But when I selected
Translations->6 Frames, it caused this trace.

C:\biopython-0.90d04\Scripts\xbbtools>python xbbtools.py lupine.nu
Exception in Tkinter callback
Traceback (most recent call last):
  File "c:\python20\lib\lib-tk\Tkinter.py", line 1287, in __call__
    return apply(self.func, args)
  File ".\xbb_widget.py", line 333, in gcframe
    np = NotePad()
  File ".\xbb_utils.py", line 24, in __init__
    self.tid = Pmw.ScrolledText(self)
AttributeError: ScrolledText

The menu items barely show up, because in Windows they're dark green on
black unless they're highlighted.

Would it be efficient to coordinate the different GUIs, if that's feasible? On
the other hand, if I experiment, I don't want to mess up someone else's
code.
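The `AttributeError: ScrolledText` in the trace means the Pmw module loaded on that Windows machine did not provide `ScrolledText`. Since xbbtools only needs a scrollable text area here, a minimal sketch of an equivalent widget in plain Tk follows, assuming Python 3's `tkinter` (the module was called `Tkinter` in the Python 2.0 era of this thread). `make_scrolled_text` is a hypothetical helper, not part of xbbtools, and it returns the frame plus the bare `Text` widget rather than imitating Pmw's component API. Python 3 also ships a ready-made `tkinter.scrolledtext.ScrolledText`.

```python
def make_scrolled_text(master, **text_options):
    """Build a Text widget with vertical and horizontal scrollbars using plain Tk.

    Returns (frame, text): place the frame with pack/grid, use `text` as a
    normal Text widget. Hypothetical stand-in for Pmw.ScrolledText.
    """
    import tkinter as tk  # imported lazily so this module loads even where Tk is absent

    # Long sequence lines scroll sideways instead of wrapping, unless overridden.
    text_options.setdefault("wrap", "none")
    frame = tk.Frame(master)
    text = tk.Text(frame, **text_options)
    yscroll = tk.Scrollbar(frame, orient="vertical", command=text.yview)
    xscroll = tk.Scrollbar(frame, orient="horizontal", command=text.xview)
    # Keep both scrollbars in sync with the visible part of the text.
    text.configure(yscrollcommand=yscroll.set, xscrollcommand=xscroll.set)

    text.grid(row=0, column=0, sticky="nsew")
    yscroll.grid(row=0, column=1, sticky="ns")
    xscroll.grid(row=1, column=0, sticky="ew")
    # Let the text area absorb any extra space when the window is resized.
    frame.rowconfigure(0, weight=1)
    frame.columnconfigure(0, weight=1)
    return frame, text
```

Inside `NotePad.__init__`, the failing line might become `frame, self.tid = make_scrolled_text(self)` followed by `frame.pack(fill="both", expand=True)`, assuming `NotePad` is itself a Tk container and the rest of its code only calls ordinary `Text` methods on `self.tid`.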
Cayte

From thomas@cbs.dtu.dk Sat Dec 2 12:26:57 2000
Date: 02 Dec 2000 13:26:57 +0100
From: Thomas Sicheritz-Ponten thomas@cbs.dtu.dk
Subject: [BioPython] plans for next release
> > Jeff:
> > > > Thomas Sicheritz-Ponten is working on visualization with his xbbtools,
> > > > using Tk instead of wx, though.
> > > >
> >
> > Cayte:
> > > I'm not sure it's supported on Windows. The web description mentioned
> > > Linux but didn't mention Windows. It imports posix.
> >
> > Hi Cayte -- I just looked at this using my little bit of Windows/python
> > knowledge. It looks like Thomas is importing posix/posixpath in all of
> > the different modules in xbbtools, but never actually using them. I
> > commented out all of the imports, installed Pmw, and xbbtools seems to
> > run okay on the machine I was playing on (vanilla Windows 98).
> >
> > I'm not sure if there is a good reason to import the posix stuff
> > (Thomas?) but maybe if not, it might be good to get rid of the imports
> > so it'll run nicely on Windows as well.
> >
> I ran it and it displayed a Fasta file. But when I selected
> Translations->6 Frames, it caused this trace.
>
> C:\biopython-0.90d04\Scripts\xbbtools>python xbbtools.py lupine.nu
> Exception in Tkinter callback
> Traceback (most recent call last):
>   File "c:\python20\lib\lib-tk\Tkinter.py", line 1287, in __call__
>     return apply(self.func, args)
>   File ".\xbb_widget.py", line 333, in gcframe
>     np = NotePad()
>   File ".\xbb_utils.py", line 24, in __init__
>     self.tid = Pmw.ScrolledText(self)
> AttributeError: ScrolledText
>
> The menu items barely show up, because in Windows they're dark green on
> black unless they're highlighted.
>
> Would it be efficient to coordinate the different GUIs, if that's feasible? On
> the other hand, if I experiment, I don't want to mess up someone else's
> code.

Ok - I just came back from Egypt. Of course there is no need at all for
using posix/posixpath - that's still left from my novice days :-)
I'll fix that and try to remove all Pmw widgets (it's easy to implement the
scrolled things in pure Tk)

I do not run Python on Windows, so could anybody send me the correct color
configurations for Windows?
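Thomas's question about starting one or more BLAST searches (threaded?) without freezing the Tk mainloop has a common answer: run the external program in a worker thread, hand the result back through a thread-safe queue, and let the GUI thread poll that queue with Tk's `after()` timer, so no Tk call ever happens off the main thread. A minimal sketch, assuming Python 3.7+ (`subprocess.run` with `capture_output`); `BackgroundSearch` and `watch_in_tk` are hypothetical names, not an existing Biopython class, and the command line is whatever BLAST, FASTA or Clustal invocation the caller builds.

```python
import queue
import subprocess
import threading


class BackgroundSearch:
    """Run one external search program (BLAST, FASTA, ...) in a worker thread.

    Hypothetical helper for this sketch. The GUI thread never blocks: it calls
    poll(), which returns None until the program has finished.
    """

    def __init__(self, command):
        self.command = list(command)   # program name followed by its arguments
        self._queue = queue.Queue()    # thread-safe hand-off from worker to GUI
        self._result = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        return self

    def _run(self):
        # Worker thread: run the program and queue the outcome; no Tk calls here.
        try:
            done = subprocess.run(self.command, capture_output=True, text=True)
            self._queue.put((done.returncode, done.stdout, done.stderr))
        except OSError as exc:         # e.g. the program is not installed
            self._queue.put((None, "", str(exc)))

    def poll(self):
        """Return (returncode, stdout, stderr) once finished, otherwise None."""
        if self._result is None:
            try:
                self._result = self._queue.get_nowait()
            except queue.Empty:
                pass
        return self._result


def watch_in_tk(widget, search, on_done, interval_ms=200):
    """Poll `search` from the Tk event loop and call on_done(result) when ready.

    widget.after() schedules each check on the main thread, so the mainloop
    keeps handling events while the search runs.
    """
    def check():
        result = search.poll()
        if result is None:
            widget.after(interval_ms, check)  # not finished yet: look again later
        else:
            on_done(result)

    widget.after(interval_ms, check)
```

A typical call would be `watch_in_tk(root, BackgroundSearch(cmd).start(), show_report)`, where `cmd` is the program and its arguments as a list and `show_report` is a hypothetical callback that displays the output. Several searches can run at once by creating one `BackgroundSearch` per run.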
I have not worked on xbbtools for a while because of the question of how
to start one or more BLAST searches (threaded?) without freezing the whole
Tk mainloop until the BLAST run is finished. Starting a search (Blast, Fasta,
Clustal etc.) from within Biopython is definitely a frequent task for which
there should be a separate module/class ... IMHO

c ya
-thomas

--
Sicheritz Ponten Thomas E.  CBS, Department of Biotechnology
thomas@biopython.org        The Technical University of Denmark
CBS: +45 45 252489          Building 208, DK-2800 Lyngby
Fax +45 45 931585           http://www.cbs.dtu.dk/thomas

De Chelonian Mobile ... The Turtle Moves ...

From katel@worldpath.net Sun Dec 3 05:48:20 2000
Date: Sat, 2 Dec 2000 21:48:20 -0800
From: Cayte katel@worldpath.net
Subject: [BioPython] plans for next release
----- Original Message -----
From: "Thomas Sicheritz-Ponten" <thomas@cbs.dtu.dk>
To: "Cayte" <katel@worldpath.net>
Cc: "Brad Chapman" <chapmanb@arches.uga.edu>; "Jeffrey Chang" <jchang@smi.stanford.edu>; <biopython@biopython.org>
Sent: Saturday, December 02, 2000 4:26 AM
Subject: Re: [BioPython] plans for next release

> _______________________________________________
> BioPython mailing list - BioPython@biopython.org
> http://biopython.org/mailman/listinfo/biopython
>

From katel@worldpath.net Sun Dec 3 05:56:59 2000
Date: Sat, 2 Dec 2000 21:56:59 -0800
From: Cayte katel@worldpath.net
Subject: [BioPython] plans for next release
> Ok - I just came back from Egypt. Of course there is no need at all for
> using posix/posixpath - that's still left from my novice days :-)
> I'll fix that and try to remove all Pmw widgets (it's easy to implement the
> scrolled things in pure Tk)
>
Egypt sounds like a fascinating place to visit! Late fall sounds like the
right time, too, with cooler weather.

Have you ever considered wxPython? I have a tool, SeqGui.py, in wxPython,
that's sort of like xbbtools.py. I find it easier to work with than Tkinter.
Back in May, we had a thread about GUI support. It's in the archives.

I'd like the GUI to eventually support color highlighting of features, for
example, regions of high consensus.

Cayte

From thomas@cbs.dtu.dk Tue Dec 5 07:53:58 2000
Date: Tue, 5 Dec 2000 08:53:58 +0100 (CET)
From: thomas@cbs.dtu.dk thomas@cbs.dtu.dk
Subject: [BioPython] Re: plans for next release
> thomas wrote
> > Ok - I just came back from Egypt. Of course there is no need at all for
> > using posix/posixpath - that's still left from my novice days :-)
> > I'll fix that and try to remove all Pmw widgets (it's easy to implement the
> > scrolled things in pure Tk)
> >
> Cayte wrote
>
> Egypt sounds like a fascinating place to visit! Late fall sounds like the
> right time, too, with cooler weather.

and nice snorkeling too :-)

>
> Have you ever considered wxPython? I have a tool, SeqGui.py, in wxPython,
> that's sort of like xbbtools.py. I find it easier to work with than Tkinter.
> Back in May, we had a thread about GUI support. It's in the archives.

#################
# It seems that my reply didn't make it through sendmail :-( - I'll try to
# reconstruct it ...
#########

The main reasons for my sticking to Tkinter are that I used
Tcl/Tk a lot before I discovered Python - I have tons of Tk snippets from
my previous bioinformatics work (Biowish, GRS, XBbtools, CapDB etc.) which
are very easy to convert into shorter, cleaner and more efficient Python Tk
code. Maybe the biggest advantage of Tkinter is the powerful Tk
Canvas; as far as I know neither wxPython nor Gtk Python has anything close
to the canvas widget.

>
> I'd like the GUI to eventually support color highlighting of features, for
> example, regions of high consensus.
>

I don't know how this works in wxPython, but in Tkinter it is already
there from the beginning. Every line, rectangle etc. you draw in the
canvas is a unique object and gets an id. You can very easily bind any event
(e.g. MouseOver, DoubleClickButton1 etc.) to any function. To highlight
different genes or sequence regions, you just group the corresponding ids and
bind a color change to a MouseOver event.
E.g. my recently accepted paper about phylogenomics with Python (NAR no. 2,
2001) deals with the interactive display of all genes, phylogenetic
trees and BLAST results for a microbial genome (between 1000 and 5000 times
3).
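The canvas mechanism described above maps onto a handful of `tkinter.Canvas` methods: `create_rectangle` returns the new item's id, the `tags` option groups items, `tag_bind` attaches a callback to every item carrying a tag, and `find_withtag`, `itemcget` and `itemconfigure` look up and change item options. A minimal sketch, assuming Python 3's `tkinter`; the `(name, start, end, group, colour)` gene tuples, the function names and the colours are hypothetical. Tk's `<Enter>`/`<Leave>` events stand in for "MouseOver" here; `<Double-Button-1>` or `<Button-3>` could be bound the same way. The functions only use methods of the canvas object passed in, so they need no import of their own.

```python
def draw_genes(canvas, genes, y=20, height=10):
    """Draw one rectangle per gene and return {canvas item id: gene name}.

    genes: iterable of (name, start, end, group, colour) tuples, a layout
    assumed for this sketch; `group` could be a pathway name.
    """
    names = {}
    for name, start, end, group, colour in genes:
        # Each canvas item gets its own integer id; the tags group items together.
        item = canvas.create_rectangle(start, y, end, y + height,
                                       fill=colour, tags=("gene", group))
        names[item] = name
    return names


def highlight_on_hover(canvas, group, colour="magenta"):
    """Recolour every item tagged `group` while the pointer is over any of them."""
    saved = {}  # item id -> fill colour before highlighting

    def enter(_event):
        for item in canvas.find_withtag(group):
            if item not in saved:          # never overwrite the true original colour
                saved[item] = canvas.itemcget(item, "fill")
            canvas.itemconfigure(item, fill=colour)

    def leave(_event):
        for item, original in saved.items():
            canvas.itemconfigure(item, fill=original)
        saved.clear()

    canvas.tag_bind(group, "<Enter>", enter)
    canvas.tag_bind(group, "<Leave>", leave)
```

With a real window this would be used roughly as `canvas = tkinter.Canvas(root, width=400, height=60)`, `canvas.pack()`, `draw_genes(canvas, genes)` and `highlight_on_hover(canvas, "pathway-1")`, where `"pathway-1"` is a hypothetical group name. Because the binding is attached to the tag rather than to individual ids, items given that tag later should respond as well.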
I have no fancy webpage yet, but you can check a screenshot of the
phylome of the bacterium Thermotoga maritima
at http://www.cbs.dtu.dk/thomas/pyphy/pyphy.png
(Phylome = set of all phylogenetic trees for a genome.
Color coding for the kingdom of the closest neighbor in the phylogenetic
tree: blue = Bacteria, yellow = Archaea, red = Eukarya)

Here the phylome map is an interactive display of all phylogenetic trees
and genes (colored lines in the circle), where each line/gene is sensitive
to mouse movement. A MouseOver event displays gene information in the top
Entry, Button1Click shows the phylogenetic tree, Button3 shows a gene-specific
popup menu for BLAST results, alignments etc.
Each gene can be a member of a metabolic pathway, where selecting a pathway
in the right listbox changes the width and the arrow shape of each gene
associated (canvas tag) with the pathway.

The advantage here is that zooming, resizing, moving and event grabbing are part
of the canvas widget, so we only need to redraw single objects.

I have never worked with wxPython - what exactly is the strength of
wxWindows? I guess it is faster than Tkinter; are there any special
features not found in the rest of the GUI family?

c ya
-thomas

Sicheritz Ponten Thomas E.  CBS, Department of Biotechnology
thomas@biopython.org        The Technical University of Denmark
CBS: +45 45 252485          Building 208, DK-2800 Lyngby
Fax +45 45 931585           http://www.cbs.dtu.dk/thomas/index.html

De Chelonian Mobile ... The Turtle Moves ...

From katel@worldpath.net Wed Dec 6 06:40:50 2000
Date: Tue, 5 Dec 2000 22:40:50 -0800
From: Cayte katel@worldpath.net
Subject: [BioPython] Re: [Biopython-dev] Re: plans for next release
> and nice snorkeling too :-)
>
Do you have pictures of Egypt to post on the web?

> > Have you ever considered wxPython? I have a tool, SeqGui.py, in wxPython,
> > that's sort of like xbbtools.py. I find it easier to work with than Tkinter.
> > Back in May, we had a thread about GUI support. It's in the archives.
> >
>
> The main reasons for my sticking to Tkinter are that I used
> Tcl/Tk a lot before I discovered Python - I have tons of Tk snippets from
> my previous bioinformatics work (Biowish, GRS, XBbtools, CapDB etc.) which
> are very easy to convert into shorter, cleaner and more efficient Python Tk
> code. Maybe the biggest advantage of Tkinter is the powerful Tk
> Canvas; as far as I know neither wxPython nor Gtk Python has anything close
> to the canvas widget.
>
I think the Windows version of wxWindows is a wrapper around the native
Windows GUI, and that wxPython attempts to provide equivalent functionality
on Linux.

> >
> > I'd like the GUI to eventually support color highlighting of features, for
> > example, regions of high consensus.
> >
>
> I don't know how this works in wxPython, but in Tkinter it is already
> there from the beginning. Every line, rectangle etc. you draw in the
> canvas is a unique object and gets an id. You can very easily bind any event
> (e.g. MouseOver, DoubleClickButton1 etc.) to any function. To highlight
> different genes or sequence regions, you just group the corresponding ids and
> bind a color change to a MouseOver event.
> E.g. my recently accepted paper about phylogenomics with Python (NAR no. 2,
> 2001) deals with the interactive display of all genes, phylogenetic
> trees and BLAST results for a microbial genome (between 1000 and 5000 times
> 3).
> I have no fancy webpage yet, but you can check a screenshot of the
> phylome of the bacterium Thermotoga maritima
> at http://www.cbs.dtu.dk/thomas/pyphy/pyphy.png
> (Phylome = set of all phylogenetic trees for a genome.
> Color coding for the kingdom of the closest neighbor in the phylogenetic
> tree: blue = Bacteria, yellow = Archaea, red = Eukarya)
>
> Here the phylome map is an interactive display of all phylogenetic trees
> and genes (colored lines in the circle), where each line/gene is sensitive
> to mouse movement. A MouseOver event displays gene information in the top
> Entry, Button1Click shows the phylogenetic tree, Button3 shows a gene-specific
> popup menu for BLAST results, alignments etc.
> Each gene can be a member of a metabolic pathway, where selecting a pathway
> in the right listbox changes the width and the arrow shape of each gene
> associated (canvas tag) with the pathway.
>
> The advantage here is that zooming, resizing, moving and event grabbing are part
> of the canvas widget, so we only need to redraw single objects.
>
Does it support colorization with enough flexibility to support research on
the fly, as in this scenario?

USER STORY: Ed Enzyme is doing some detective work on an alignment. First he
highlights the start and stop codons in red and green. Then Ed zooms in on an
interesting sequence. He first highlights the hydrophilic regions in magenta.
Then Ed backtracks and highlights the acidic regions.

>
> I have never worked with wxPython - what exactly is the strength of
> wxWindows? I guess it is faster than Tkinter; are there any special
> features not found in the rest of the GUI family?
>
I found it easier to work with. With wxPython I could write more code in the
same time, with fewer problems such as panels that don't quite line up.

Cayte

From birney@ebi.ac.uk Sat Dec 9 19:03:10 2000
Date: Sat, 9 Dec 2000 19:03:10 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [BioPython] An open letter to bioinformatcis researchers
Dear fellow bioinformatics developers:

By now you have probably heard that Celera Genomics has submitted their human genome paper to the journal Science. Science and Celera have agreed to special terms for the release of the human genome sequence data. It will be made available through the Celera website, and will not be submitted to the international DNA database consortium (GenBank, EMBL and DDBJ). Science's statement regarding the agreement is at:

http://www.sciencemag.org/feature/data/announcement/genomesequenceplan.shl

All major journals, including Science, have a policy of deposition of sequence data with the "appropriate data bank". The accepted community standard is submission to GenBank/EMBL/DDBJ. The reason for this deposition is to make the results of the work openly available for future research. This principle was specifically mentioned in the Clinton/Blair statement on human genome sequencing (http://www.usinfo.state.gov/topical/global/biotech/00031401.htm), which strongly upheld the view that "unencumbered access" to genome data was critical.

The terms of the Celera/Science agreement will give us access to the genome sequence, but not unencumbered access. Celera is suggesting publishing their data under an MTA (Material Transfer Agreement) which would prevent large-scale downloads and incorporation of this data into GenBank/EMBL/DDBJ. In order to download the data, you and your institution will have to sign a contract guaranteeing that you will not "redistribute" the Celera data. Science believes that the deal is an adequate compromise because it provides us the right to download the data and publish our results. We believe Science is thinking in terms of single-gene biology, not large-scale bioinformatics. It is probably not hard for you to imagine scenarios in bioinformatics in which "publication" and "redistribution" are virtually the same thing; we cannot imagine Celera allowing us to incorporate data into Pfam, for example, nor into Ensembl.

We are asking for your support in writing to Science to politely insist that genome sequence papers should be accompanied by unencumbered deposition to GenBank/EMBL/DDBJ. Please note that we have no issue with Celera either keeping this data unpublished for commercial reasons, or combining their data with freely available data from the public genome projects. We would defend their right to do either. Our view is simply that the genome community has established a clear principle that published genome data must be deposited in the international databases, that bioinformatics is fueled by this principle, and that Science therefore threatens to set a precedent that undermines our research.

We encourage you to express your views on this matter to Donald Kennedy (kennedyd@kennedyd.pobox.stanford.edu), the Editor-in-Chief of Science, and/or to Barbara Jasny (bjasny@aaas.org), the managing editor in charge of genomics papers at Science.

Here is a Q/A about some points.

* Why does this matter?

A classic example of how our field began to have an impact on molecular biology was Russ Doolittle's discovery of a significant sequence similarity between a viral oncogene and a cellular growth factor receptor. Russ could not have found that result if he did not have an aggregate database of previously published sequences. We have come a long way from Russ and his son typing data into the NEWAT protein sequence database by hand.

Throughout the 1980s the international database community fought hard to insist that DNA sequence data be deposited into the public domain databases. Journals now generally require deposition as a condition of accepting a paper. The formation of these databases and the international agreements on data sharing between the European, American and Japanese databases fostered the rapid development of bioinformatics research.
We now all take for granted the fact that large DNA databases are accessible from a single point of contact, and the identifiers are coordinated worldwide. Bioinformatics research relies on open data, submitted to public databases with minimal legal encumbrances. Without these databases there is no real substrate for bioinformatics research.

* What would happen if this precedent was set?

There are a number of consequences if Science set a precedent that allowed people to publish DNA data under a variety of MTAs.

- One would not be able to form a single DNA database on which to do bioinformatics research, and the derivative databases (Swissprot, PIR, Pfam, PROSITE, etc.) would not be legal.

- Bench biologists would have to visit a number of websites and possibly enter into a number of different contracts for access to DNA data. Unexpected informative homologies could become prohibitively difficult to find.

- You may need to get a legal review before you can publish the results of an analysis, if your analysis is large-scale and detailed enough that it could be reasonably interpreted as a "redistribution" of the primary sequence data. You could be sued for breach of contract for a Web Supplement page that discloses extensive sequence data supporting your results.

- Scientific openness will be undermined. Efforts to engage the community in cooperative annotation of large genomes, for instance, would be blocked -- we can't usefully annotate a genome we can't freely redistribute.

* Celera paid for it. Can't they set their own access terms?

Absolutely. We have no issue with Celera's commercial data gathering, and their right to set their own access terms to their data. We do feel, though, that scientific publications carry a certain ethical responsibility. The purpose of a paper is to enable the community to efficiently build on your work. There is always a tension between disclosing your work to your competitors (this is not unique to private companies!)
and receiving scientific credit for your work via publication. This tension is natural, and maintaining a consistent and acceptable balance is the reason that scientists and journals establish community standards that dictate how data are required to be disclosed. In this case, the clearly accepted community standard is that DNA sequence data are deposited in GenBank/EMBL/DDBJ upon publication.

We certainly do not blame Celera (much) for seeking a special deal that lets them have their cake and eat it too -- they would understandably like scientific credit for their terrific and important work in human sequencing, and they would also like a profitable business model. We do blame Science for failing to take a strong stand in upholding accepted scientific publication practices. We cannot accept that it is necessary to sacrifice ethics for expediency.

* Science claims they are honouring their own policy. What gives?

Science now claims that all their policy really requires is that archival data be available via a publicly accessible database. We think this is a conveniently revisionist view of their own policy, which states (in Instructions to Authors):

"archival data sets (such as sequence and structural data) must be deposited with the appropriate data bank and the identifier code should be sent to Science for inclusion in the published manuscript (coordinates must be released at the time of publication)"

Notice the use of the definite article ("THE appropriate data bank"), the notion of "deposition", and the additional rider that the identifier code should be sent. The spirit of this statement seems clear to us. Science's statement anticipates that there is an appropriate, single, aggregate community database for each sort of archival data, whether DNA sequence, protein structure coordinates, or something else. Sensibly, they don't name every possible database for every possible archival data set. They expect that recognized community standards exist.
In no way does Science's statement seem consistent with the view that an individual lab could start its own "public" DNA sequence database and send a meaningless internal database identifier; to try to read it that way is a post hoc rationalisation.

* What can Science do? This is a done deal.

It's true that this is a done deal. Science and Celera have mutually agreed to the general terms of data release. But there are two ways that we can minimize the damage.

First, the details of the agreement are not set. In particular, there is no definition of allowed "publication" versus prohibited "redistribution". Science could specify definitions that did not interfere with noncommercial uses of the data in bioinformatics, allowing us redistribution rights if it made sense in the context of our project (for example, a genome annotation project like Ensembl).

Second, and preferably, Science -- or even the peer reviewers -- can uphold Science's own data access policy, and reject the paper.

Incidentally, they might also choose to enforce Science's policy on prior publication, which states "...the main findings of a paper should not have been reported in the mass media. Authors are, however, permitted to present their data at open meetings but should not overtly seek media attention." If I issued a press release upon submission of a manuscript to Science, like Celera did, Science would rightly fire it back to me without review.

* What can I do?

Agitate. Let Science know that you care. They consider this deal to be a trial balloon for future genome papers. Even if we can't change the deal with Celera, we can try to make sure it's a one-time-only deal that's viewed as a Big Mistake. Write a letter to Science and tell them how their actions would impact your research, both in the long term and in the short term.

Also, you can pass on this open letter to other bioinformatics researchers you know.

Dr Sean Eddy
Alvin Goldfarb Professor of Computational Biology
Howard Hughes Medical Institute
Washington University in St. Louis, USA

Dr Ewan Birney
Team Leader, Genomic Annotation
European Bioinformatics Institute, UK

From birney@ebi.ac.uk Sun Dec 10 13:44:31 2000
Date: Sun, 10 Dec 2000 13:44:31 +0000 (GMT)
From: Ewan Birney birney@ebi.ac.uk
Subject: [BioPython] Update on Don Kennedy's address.
The address for Don Kennedy we gave out in our letter,
kennedyd@kennedyd.pobox.stanford.edu, seems to bounce.
kennedyd@stanford.edu seems not to bounce (hopefully because it is
getting delivered).

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------

From jmh.neefs@pandora.be Sun Dec 10 21:25:13 2000
Date: Sun, 10 Dec 2000 22:25:13 +0100
From: Jean-Marc Neefs jmh.neefs@pandora.be
Subject: [BioPython] RE: An open letter to bioinformatcis researchers
Dear Ewan and Sean,

I would like to add my 2 cents. The biggest trouble with hiding the genome is that it keeps information away from scientists and slows down research. On the other hand, playing devil's advocate, this could also spur the public effort to finish sooner. Anyway, Celera only has a small window of opportunity before the public data become available, and we will all have enough laboratory work to analyse and confirm the coming data deluge.

To end on a positive note: keep up the good work on Ensembl. I learn more and more about it each day and find it more and more useful.

I will contact Science.

Kind Regards,

Jean-Marc Neefs
Senior Bioinformatics Scientist

-----Original Message-----
From: Ewan Birney [SMTP:birney@ebi.ac.uk]
Sent: Saturday, December 09, 2000 8:03 PM
To: bioperl-l@bioperl.org; biojava-l@biojava.org; biopython@biopython.org; bioxml-dev@bioxml.org; ensembl-dev@ebi.ac.uk; apollo@ebi.ac.uk
Subject: An open letter to bioinformatcis researchers

Dear fellow bioinformatics developers:

By now you have probably heard that Celera Genomics has submitted their human genome paper to the journal Science. Science and Celera have agreed to special terms for the release of the human genome sequence data. It will be made available through the Celera website, and will not be submitted to the international DNA database consortium (GenBank, EMBL and DDBJ). Science's statement regarding the agreement is at:

http://www.sciencemag.org/feature/data/announcement/genomesequenceplan.shl

All major journals, including Science, have a policy of deposition of sequence data with the "appropriate data bank". The accepted community standard is submission to GenBank/EMBL/DDBJ. The reason for this deposition is to make the results of the work openly available for future research.
This principle was specifically mentioned in the Clinton/Blair statement on human genome sequencing - http://www.usinfo.state.gov/topical/global/biotech/00031401.htm - who strongly upheld the view that "unencumbered access" to genome data was critical. The terms of the Celera/Science agreement will give us access to the genome sequence, but not unencumbered access. Celera is suggesting publishing their data under a MTA (Material Transfer Agreement) which would prevent large scale downloads and incorporation of this data into GenBank/EMBL/DDBJ. In order to download the data, you and your institution will have to sign a contract guaranteeing that you will not "redistribute" the Celera data. Science believes that the deal is an adequate compromise because it provides us the right to download the data and publish our results. We believe Science is thinking in terms of single gene biology, not large scale bioinformatics. It is probably not hard for you to imagine scenarios in bioinformatics in which "publication" and "redistribution" are virtually the same thing; we cannot imagine Celera allowing us to incorporate data into Pfam, for example, nor into Ensembl. We are asking for your support in writing to Science to politely insist that genome sequence papers should be accompanied by unencumbered deposition to GenBank/EMBL/DDBJ. Please note that we have no issue with Celera either keeping this data unpublished for commercial reasons, nor with them combining their data with freely available data from the public genome projects. We would defend their right to do either. Our view is simply that the genome community has established a clear principle that published genome data must be deposited in the international databases, that bioinformatics is fueled by this principle, and that Science therefore threatens to set a precedent that undermines our research. 
We encourage you to express your views on this matter to Donald Kennedy (kennedyd@kennedyd.pobox.stanford.edu), the Editor-in-Chief of Science, and/or to Barbara Jasny (bjasny@aaas.org), the managing editor in charge of genomics papers at Science. Here is a Q/A about some points. * Why does this matter? A classic example of how our field began to have an impact on molecular biology was Russ Doolittle's discovery of a significant sequence similarity between a viral oncogene and a cellular growth factor receptor. Russ could not have found that result if he did not have an aggregate database of previously published sequences. We have come a long way from Russ and his son typing data into the NEWAT protein sequence database by hand. Throughout the 80's the international database community fought hard to insist that DNA sequence data be deposited into the public domain databases. Journals now generally require deposition as a condition of accepting a paper. The forming of these databases and the international agreements on data sharing between the European, American and Japanase databases fostered the rapid development of bioinformatics research. We now all take for granted the fact that large DNA databases are accessible from a single point of contact, and the identifiers are coordinated worldwide. Bioinformatics research relies on open data with minimal legal encumberances submitted to public databases. Without these databases there is no real substrate for bioinformatics research. * What would happen if this precedent was set? There are a number of consequences if Science set a precedent that allowed people to publish DNA data under a variety of MTAs. - One would not be able to form a single DNA database on which to do bioinformatics research, and the derivative databases (Swissprot, PIR, Pfam, PROSITE, etc.) would not be legal. - Bench biologists would have to visit a number of websites and possibly enter into a number of different contracts for access to DNA data. 
Unexpected informative homologies could become prohibitively difficult to find.

- You may need to get a legal review before you can publish the results of an analysis, if your analysis is large-scale and detailed enough that it could be reasonably interpreted as a "redistribution" of the primary sequence data. You could be sued for breach of contract for a Web Supplement page that discloses extensive sequence data supporting your results.

- Scientific openness will be undermined. Efforts to engage the community in cooperative annotation of large genomes, for instance, would be blocked -- we can't usefully annotate a genome we can't freely redistribute.

* Celera paid for it. Can't they set their own access terms?

Absolutely. We have no issue with Celera's commercial data gathering, and their right to set their own access terms to their data. We do feel, though, that scientific publications carry a certain ethical responsibility. The purpose of a paper is to enable the community to efficiently build on your work. There is always a tension between disclosing your work to your competitors (this is not unique to private companies!) and receiving scientific credit for your work via publication. This tension is natural, and maintaining a consistent and acceptable balance is the reason that scientists and journals establish community standards that dictate how data are required to be disclosed. In this case, the clearly accepted community standard is that DNA sequence data are deposited in Genbank/EMBL/DDBJ upon publication.

We certainly do not blame Celera (much) for seeking a special deal that lets them have their cake and eat it too -- they would understandably like scientific credit for their terrific and important work in human sequencing, and they would also like a profitable business model.

We do blame Science for failing to take a strong stand in upholding accepted scientific publication practices.
We cannot accept that it is necessary to sacrifice ethics for expediency.

* Science claims they are honouring their own policy. What gives?

Science now claims that all their policy really requires is that archival data be available via a publicly accessible database. We think this is a conveniently revisionist view of their own policy, which states (in Instructions to Authors):

"archival data sets (such as sequence and structural data) must be deposited with the appropriate data bank and the identifier code should be sent to Science for inclusion in the published manuscript (coordinates must be released at the time of publication)"

Notice the use of the definite article "THE appropriate data bank", the notion of "deposition", and the additional rider that the identifier code should be sent.

The spirit of this statement seems clear to us. Science's statement anticipates that there is an appropriate, single, aggregate community database for each sort of archival data, whether DNA sequence, protein structure coordinates, or something else. Sensibly, they don't name every possible database for every possible archival data set. They expect that recognized community standards exist. In no way does Science's statement seem consistent with the view that an individual lab could start its own "public" DNA sequence database and send a meaningless internal database identifier; to try to read it that way is a post hoc rationalisation.

* What can Science do? This is a done deal.

It's true that this is a done deal. Science and Celera have mutually agreed to the general terms of data release. But there are two ways that we can minimize the damage.

First, the details of the agreement are not set. In particular, there is no definition of allowed "publication" versus prohibited "redistribution".
Science could specify definitions that did not interfere with noncommercial uses of the data in bioinformatics, allowing us redistribution rights if it made sense in the context of our project (for example, a genome annotation project like Ensembl).

Second, and preferably, Science -- or even the peer reviewers -- can uphold Science's own data access policy, and reject the paper.

Incidentally, they might also choose to enforce Science's policy on prior publication, which states "...the main findings of a paper should not have been reported in the mass media. Authors are, however, permitted to present their data at open meetings but should not overtly seek media attention." If I issued a press release upon submission of a manuscript to Science, like Celera did, Science would rightly fire it back to me without review.

* What can I do?

Agitate. Let Science know that you care. They consider this deal to be a trial balloon for future genome papers. Even if we can't change the deal with Celera, we can try to make sure it's a one-time-only deal that's viewed as a Big Mistake. Write a letter to Science and tell them how their actions would impact your research, both in the long term and in the short term. Also, you can pass on this open letter to other bioinformatics researchers you know.

Dr Sean Eddy,
Alvin Goldfarb Professor of Computational Biology,
Howard Hughes Medical Institute, Washington University in St. Louis, USA

Dr Ewan Birney
Team Leader, Genomic Annotation
European Bioinformatics Institute, UK

From auffray@infobiogen.fr Mon Dec 18 12:19:48 2000 Date: Mon, 18 Dec 2000 13:19:48 +0100 From: Charles Auffray auffray@infobiogen.fr Subject: [BioPython] Human genome sequence
Ewan,

Let me express my reaction to your mail.

The Science-Celera deal is not, on many counts, a precedent. In 1995, Nature published in its Genome Directory a paper by Adams et al. under an agreement which already breached many of the commonly accepted rules for publication and sequence data deposition. The TIGR group, led by Craig Venter, had included data released in public databases by other groups without releasing much of their own data, which at that time was only accessible through an MTA (based on their relationship with Human Genome Sciences). My Genexpress team was denied the possibility of publishing our interpretation of our own data in the same issue of Nature. The silence of the scientific community at that time was astounding (notwithstanding the fact that, by some irony, Genome Research published our paper the very same day as the Genome Directory). What is happening now with the publication of the human genome sequence papers seems to indicate that the lessons of such past events have not been learned, and that people have short memories.

The sort of work that will lead to a full description of genomes, transcriptomes and proteomes is the result of the contributions of a large number of individuals over several decades. In an attempt to evaluate how many people should be cited as co-authors of an overview paper describing the state of knowledge on the human transcriptome, I ended up with a figure of 44,444 (including Venter and his co-workers), which is of the same order of magnitude as the estimated number of human genes. I believe it would be appropriate for those seeking to publish milestone papers on the current knowledge of the human genome, whether from the public or the private sector, to acknowledge all those who laid the groundwork for this work by citing them as co-authors. As a first indication, there are 7846 papers registered in PubMed containing "human genome" in their title or abstract.
As many scientists know, and despite all the media coverage, the work is not yet completely finished, and even if it were, it would only be the end of the beginning.

Such an action would have several advantages. First, it would convey to the public the idea that science is a collective as well as an individual endeavour. Second, it would make clear that the sequence of the human genome is common knowledge which can be shared by all to advance human health, in line with the United Nations Declaration on the Human Genome and Human Rights, which was adopted unanimously by the 186 nations represented in 1998 (http://www1.umn.edu/humanrts/instree/Udhrhg.htm).

Part of the process is, as you rightly point out, the development of large-scale analyses using informatics, which require unencumbered access to the primary data in the established international electronic data repositories (EMBL/NCBI/DDBJ). We also need to ensure that useful applications can be developed with appropriate financial investment and reach the end user through the healthcare system. In this respect, some level of balanced and fair competition, both within the academic and industrial sectors and between them, is desirable. This balance and fairness can only be achieved if we all recognize everyone's contributions and provide the incentives for the public and private investments needed.

The fuzziness of the intellectual property status of inventions based on knowledge of the human genome sequence versus the human genome sequence itself does not help. In the eight years since I wrote a letter to Nature on this subject (DNA sequences. Nature. 1992 355:292), it seems to me that we have witnessed some progress in this regard, but a lot more effort is needed from lawmakers and policymakers to clarify the situation. It is their responsibility to enforce regulations disallowing attempts at monopolization, be it through media coverage, as seems currently fashionable. The sooner the better.
There is so much to do ahead of us.

Charles Auffray

>>Date: Mon, 11 Dec 2000 17:55:25 +0100
>>To: crbm@crbm.cnrs-mop.fr
>>From: Vincent Coulon <coulon@jones.igm.cnrs-mop.fr>
>>Subject: Celera/Science agreement: les scelerats!
>>
>>------- Forwarded Message
>>
>>From: Ewan Birney <birney@ebi.ac.uk>
>>To: bioperl-l@bioperl.org, biojava-l@biojava.org, biopython@biopython.org,
>> bioxml-dev@bioxml.org, ensembl-dev@ebi.ac.uk, apollo@ebi.ac.uk
>>Subject: [Bioperl-l] An open letter to bioinformatics researchers
>>Date: Sat, 9 Dec 2000 19:03:10 +0000 (GMT)
>>
>>Dear fellow bioinformatics developers:
>>
>>By now you have probably heard that Celera Genomics has submitted
>>their human genome paper to the journal Science. Science and Celera
>>have agreed to special terms for the release of the human genome
>>sequence data. It will be made available through the Celera website,
>>and will not be submitted to the international DNA database consortium
>>(GenBank, EMBL and DDBJ). Science's statement regarding the agreement
>>is at:
>>http://www.sciencemag.org/feature/data/announcement/genomesequenceplan.shl
>>
>>All major journals, including Science, have a policy of deposition of
>>sequence data with the "appropriate data bank". The accepted community
>>standard is submission to GenBank/EMBL/DDBJ. The reason for this
>>deposition is to make the results of the work openly available for
>>future research. This principle was specifically mentioned in the
>>Clinton/Blair statement on human genome sequencing -
>> http://www.usinfo.state.gov/topical/global/biotech/00031401.htm
>>- who strongly upheld the view that "unencumbered access" to genome
>>data was critical.
>>
>>The terms of the Celera/Science agreement will give us access to the
>>genome sequence, but not unencumbered access.
>>Celera is suggesting
>>publishing their data under an MTA (Material Transfer Agreement) which
>>would prevent large scale downloads and incorporation of this data
>>into GenBank/EMBL/DDBJ. In order to download the data, you and your
>>institution will have to sign a contract guaranteeing that you will
>>not "redistribute" the Celera data.
>>
>>Science believes that the deal is an adequate compromise because it
>>provides us the right to download the data and publish our results.
>>We believe Science is thinking in terms of single gene biology, not
>>large scale bioinformatics. It is probably not hard for you to imagine
>>scenarios in bioinformatics in which "publication" and
>>"redistribution" are virtually the same thing; we cannot imagine
>>Celera allowing us to incorporate data into Pfam, for example,
>>nor into Ensembl.
>>
>>We are asking for your support in writing to Science to politely
>>insist that genome sequence papers should be accompanied by
>>unencumbered deposition to GenBank/EMBL/DDBJ. Please note that we have
>>no issue with Celera either keeping this data unpublished for
>>commercial reasons, or with them combining their data with freely
>>available data from the public genome projects. We would defend their
>>right to do either. Our view is simply that the genome community has
>>established a clear principle that published genome data must be
>>deposited in the international databases, that bioinformatics is
>>fueled by this principle, and that Science therefore threatens to set
>>a precedent that undermines our research.
>>
>>We encourage you to express your views on this matter to Donald
>>Kennedy (kennedyd@kennedyd.pobox.stanford.edu), the Editor-in-Chief of
>>Science, and/or to Barbara Jasny (bjasny@aaas.org), the managing
>>editor in charge of genomics papers at Science.
>>
>>
>>Here is a Q/A about some points.
>>
>>* Why does this matter?
>>
>>A classic example of how our field began to have an impact on
>>molecular biology was Russ Doolittle's discovery of a significant
>>sequence similarity between a viral oncogene and a cellular growth
>>factor receptor. Russ could not have found that result if he did not
>>have an aggregate database of previously published sequences. We have
>>come a long way from Russ and his son typing data into the NEWAT
>>protein sequence database by hand.
>>
>>Throughout the 80's the international database community fought hard
>>to insist that DNA sequence data be deposited into the public domain
>>databases. Journals now generally require deposition as a condition of
>>accepting a paper. The formation of these databases and the
>>international agreements on data sharing between the European,
>>American and Japanese databases fostered the rapid development of
>>bioinformatics research. We now all take for granted the fact that
>>large DNA databases are accessible from a single point of contact, and
>>the identifiers are coordinated worldwide.
>>
>>Bioinformatics research relies on open data with minimal legal
>>encumbrances submitted to public databases. Without these databases
>>there is no real substrate for bioinformatics research.
>>
>>
>>* What would happen if this precedent was set?
>>
>>There are a number of consequences if Science set a precedent that
>>allowed people to publish DNA data under a variety of MTAs.
>>
>>- One would not be able to form a single DNA database on which to
>> do bioinformatics research, and the derivative databases (Swissprot,
>> PIR, Pfam, PROSITE, etc.) would not be legal.
>>
>>- Bench biologists would have to visit a number of websites and
>> possibly enter into a number of different contracts for access to DNA
>> data. Unexpected informative homologies could become prohibitively
>> difficult to find.
>>
>>- You may need to get a legal review before you can publish
>> the results of an analysis, if your analysis is large-scale and
>> detailed enough that it could be reasonably interpreted as a
>> "redistribution" of the primary sequence data. You could
>> be sued for breach of contract for a Web Supplement page
>> that discloses extensive sequence data supporting your results.
>>
>>- Scientific openness will be undermined. Efforts to engage the
>> community in cooperative annotation of large genomes, for instance,
>> would be blocked -- we can't usefully annotate a genome we can't freely
>> redistribute.
>>
>>
>>* Celera paid for it. Can't they set their own access terms?
>>
>>Absolutely. We have no issue with Celera's commercial data gathering,
>>and their right to set their own access terms to their data. We do
>>feel, though, that scientific publications carry a certain ethical
>>responsibility. The purpose of a paper is to enable the community to
>>efficiently build on your work. There is always a tension between
>>disclosing your work to your competitors (this is not unique to
>>private companies!) and receiving scientific credit for your work via
>>publication. This tension is natural, and maintaining a consistent
>>and acceptable balance is the reason that scientists and journals
>>establish community standards that dictate how data are required to be
>>disclosed. In this case, the clearly accepted community standard is
>>that DNA sequence data are deposited in Genbank/EMBL/DDBJ upon
>>publication.
>>
>>We certainly do not blame Celera (much) for seeking a special deal
>>that lets them have their cake and eat it too -- they would
>>understandably like scientific credit for their terrific and important
>>work in human sequencing, and they would also like a profitable
>>business model.
>>
>>We do blame Science for failing to take a strong stand in upholding
>>accepted scientific publication practices.
>>We cannot accept that it is
>>necessary to sacrifice ethics for expediency.
>>
>>* Science claims they are honouring their own policy. What gives?
>>
>>Science now claims that all their policy really requires is that
>>archival data be available via a publicly accessible database. We
>>think this is a conveniently revisionist view of their own policy,
>>which states (in Instructions to Authors):
>>
>>"archival data sets (such as sequence and structural data) must be
>>deposited with the appropriate data bank and the identifier code should be
>>sent to Science for inclusion in the published manuscript (coordinates
>>must be released at the time of publication)"
>>
>>Notice the use of the definite article "THE appropriate data bank",
>>the notion of "deposition", and the additional rider that the
>>identifier code should be sent.
>>
>>The spirit of this statement seems clear to us. Science's statement
>>anticipates that there is an appropriate, single, aggregate community
>>database for each sort of archival data, whether DNA sequence, protein
>>structure coordinates, or something else. Sensibly, they don't name
>>every possible database for every possible archival data set. They
>>expect that recognized community standards exist. In no way does
>>Science's statement seem consistent with the view that an individual
>>lab could start its own "public" DNA sequence database and send a
>>meaningless internal database identifier; to try to read it that way
>>is a post hoc rationalisation.
>>
>>
>>* What can Science do? This is a done deal.
>>
>>It's true that this is a done deal. Science and Celera have mutually
>>agreed to the general terms of data release. But there are two ways
>>that we can minimize the damage.
>>
>>First, the details of the agreement are not set. In particular, there
>>is no definition of allowed "publication" versus prohibited
>>"redistribution".
>>Science could specify definitions that did not
>>interfere with noncommercial uses of the data in bioinformatics,
>>allowing us redistribution rights if it made sense in the context of
>>our project (for example, a genome annotation project like Ensembl).
>>
>>Second, and preferably, Science -- or even the peer reviewers -- can
>>uphold Science's own data access policy, and reject the paper.
>>
>>Incidentally, they might also choose to enforce Science's policy on
>>prior publication, which states "...the main findings of a paper
>>should not have been reported in the mass media. Authors are, however,
>>permitted to present their data at open meetings but should not
>>overtly seek media attention." If I issued a press release upon
>>submission of a manuscript to Science, like Celera did, Science would
>>rightly fire it back to me without review.
>>
>>* What can I do?
>>
>>Agitate. Let Science know that you care. They consider this deal to be
>>a trial balloon for future genome papers. Even if we can't change the
>>deal with Celera, we can try to make sure it's a one-time-only deal
>>that's viewed as a Big Mistake. Write a letter to Science and tell
>>them how their actions would impact your research, both in the long
>>term and in the short term. Also, you can pass on this open letter to
>>other bioinformatics researchers you know.
>>
>>
>>Dr Sean Eddy,
>>Alvin Goldfarb Professor of Computational Biology,
>>Howard Hughes Medical Institute, Washington University in St. Louis, USA
>>
>>Dr Ewan Birney
>>Team Leader, Genomic Annotation
>>European Bioinformatics Institute, UK
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@bioperl.org
>>http://bioperl.org/mailman/listinfo/bioperl-l

Unite de Genetique Moleculaire et Biologie du Developpement
CNRS ERS 1984 - 7-19 rue Guy Moquet
BP 8 - 94801 VILLEJUIF CEDEX - FRANCE
Tel : 33 (0)1 49 58 34 98 - Fax : 33 (0)1 49 58 35 09
E-mail : auffray@infobiogen.fr