From dalke at dalkescientific.com Mon Apr 3 03:20:59 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 3 Apr 2006 01:20:59 -0600 Subject: [DAS2] daylight saving time Message-ID: <366941fb271add552809d50a50ab2027@dalkescientific.com> For the non-US people involved in the next phone conference call, the US just changed to daylight saving time so California is now 7 hours behind GMT instead of 8. I think the UK switched a week earlier than the US which is why people there couldn't make it last week? Andrew dalke at dalkescientific.com From lstein at cshl.edu Mon Apr 3 12:53:17 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 3 Apr 2006 12:53:17 -0400 Subject: [DAS2] daylight saving time In-Reply-To: <366941fb271add552809d50a50ab2027@dalkescientific.com> References: <366941fb271add552809d50a50ab2027@dalkescientific.com> Message-ID: <200604031253.17513.lstein@cshl.edu> Hi Guys, I'm stuck on another conf call right now. I'll be joining in 10 min. Lincoln On Monday 03 April 2006 03:20, Andrew Dalke wrote: > For the non-US people involved in the next phone conference call, > the US just changed to daylight saving time so California is now > 7 hours behind GMT instead of 8. I think the UK switched a week > earlier than the US which is why people there couldn't make it > last week? > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 -- Lincoln Stein lstein at cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From mgibson at bdgp.lbl.gov Mon Apr 3 12:29:55 2006 From: mgibson at bdgp.lbl.gov (mark gibson) Date: Mon, 3 Apr 2006 12:29:55 -0400 Subject: [DAS2] Mark Gibson on Apollo writeback to Chado In-Reply-To: References: Message-ID: Ive attached a powerpoint presentation that is probably easier to glance at than reading through this whole email. The first half of it is about apollo transactions. Mark -------------- next part -------------- A non-text attachment was scrubbed... Name: gmod-sri-13.ppt Type: application/vnd.ms-powerpoint Size: 599552 bytes Desc: not available URL: -------------- next part -------------- On Mar 27, 2006, at 2:42 PM, Nomi Harris wrote: > mark gibson said that he plans to attend next monday's DAS/2 > teleconference. he also gave me permission to forward this message > that > he wrote recently in response to a group that is adapting apollo and > wondered what he thought about direct-to-chado writeback vs. the > use of > chadoxml as an intermediate storage format. FlyBase Harvard > prefers to > use the latter approach because (we gather) they worry about possibly > corrupting the database by having clients write directly to it. if > anyone from harvard is reading this and feels that mark has > misrepresented their approach, please set us straight! > > Nomi > > On 10 March 2006, Mark Gibson wrote: >> Im rather biased as a I wrote the chado jdbc adapter [for Apollo], >> but let me put forth my >> view of chado jdbc vs chado xml. >> >> The chado Jdbc adapter is transactional, the chado xml adapter is >> not. 
What this >> means is jdbc only makes changes in the database that reflect what >> has actually >> been changed in the apollo session, like updating a row in a >> table; with chado >> xml you just get the whole dump. So if a synonym has been added >> jdbc will add a >> row to the synonym table. For xml you will get the whole dump of >> the region you >> were editing (probably a gene) no matter how small the edit. >> >> What I believe Harvard/Flybase then does (with chado xml) is wipe >> out the gene >> from the database and reinsert the gene from the chado xml. The >> problem with >> this approach is if you have data in the db thats not associated >> with apollo >> (for flybase this would be phenotype data) then that will get >> wiped out as well, >> and there has to be some way of reinstating non-apollo data. If >> you dont have >> non-apollo data and dont intend on having it in the future this >> isnt a huge >> issue I suppose. I think Harvard is integrating non-apollo data >> into their chado >> database. >> >> I think what they are going to do is actually figure out all of >> the transactions >> by comparing the chado xml with the chado database, which is what >> apollo already >> does, but I'm not sure as Im not so in touch with them these days >> (as Im not >> working with apollo these days - waiting for new grant to kick in). >> >> Since the paradigm with chado xml is wipe out & reload, then >> apollo has to make >> sure it preserves every bit of the chado xml that came in. Theres >> a bunch of >> stuff thats in chado/chado xml that the apollo datamodel is >> unconcerned with, >> and has no need to be concerned with as its stuff that it doesnt >> visualize. In >> other words apollos data model is solely for apollos task of >> visualizing data, >> not for roundtripping what we call non-apollo data. In writing the >> chado xml >> adapter for FlyBase, Nomi Harris had a heck of a time with these >> issues, and she >> can elaborate on this I suppose. >> >> I'm personally not fond of chado xml because its basically a >> relational database >> dump, so its extremely verbose. It redundantly has information for >> lots of joins >> to data in other tables - like a cvterm entry can take 10 or 20 >> lines of chado >> xml, and a given cvterm may be used a zillion times in a given >> chado xml file >> (as every feature has a cvterm). So these files can get rather large. >> >> The solution for this verbose output is to use what I call macros >> in chado xml. >> Macros are supported by xort. They take the 15 line cvterm entry >> and reduce it >> to a line or 2 making the file size much more reasonable. The >> apollo chado xml >> adapter does not support macros, so you have to use unmacro'd >> chado xml for >> apollo purposes. Nomi Harris had a hard enough time getting the >> chado xml >> adapter working for flybase(and did a great job with a harrowing >> task), that she >> did not have time to take on the macro issue. If you wanted macros >> (and smaller >> file sizes) you would have to add this functionality to the chado >> xml adapter >> (are there java programmers in your group?). >> >> One of the arguments against the jdbc adapter is that its >> dangerous because it >> goes straight into the database so if there are any bugs in the >> data adapter >> then the database could get corrupted - some groups find this a >> bit precarious. >> This is a valid argument. I think theres 2 solutions here. 
One is >> to thoroughly >> test the adapter out against a test database until you are >> confident that bugs >> are hammered out. >> >> Another solution is to not go straight from apollo to the >> database. You can use >> an interim format and actually use apollo to get that interim >> format into the >> database. Of course one choice for interim format is chado xml and >> then you are >> at the the chado xml solution. The other choice for file format is >> GAME xml. You >> can then use apollo to load game into the chado database, and this >> can be done >> at the command line (with batching) so you dont have to bring up >> the gui to do >> it. Also chado xml can be loaded into chado via apollo as well (of >> course xort >> does this as well but not with transactions) >> >> So then the question is if Im not going to go straight into the >> database, why >> would I choose game over chado xml? Or if Im using chado xml >> should I use >> apollo or xort to load into chado. I think if you are using chado >> xml it makes >> sense to use xort as it is the tried & true technology for chado >> xml. The >> advantage of going through apollo is that it also uses the >> transactions from >> apollo (theres a transaction xml file) and thus writes back the >> edits in a >> transactional way as mentioned above rather than in a wipe out & >> reload fashion. >> >> Also Game is a tried & true technology that has been used with >> apollo in >> production at flybase (before chado came along) for many years >> now. One >> criticism of it has been that DTD/XSD/schema has been a moving >> target, nor has >> it been described. That is not as true anymore. Nomi Harris has >> made a xsd for >> it as well as a rng. But I must confess that I have recently added >> the ability >> to have one level annotations in game (previously 1 levels had to >> be hacked as 3 >> levels). Also game is a lot less verbose than un-macro'd chado >> xml, as it more >> or less fits with the apollo datamodel. One advantage of chado xml >> over game xml >> is that it is more flexible in terms of taking on features of >> arbitrary depth. >> >> The chado xml adapter was developed for FlyBase and as far as I >> know has not >> been taken on by any other groups yet. Nomi can elaborate on this, >> but I think >> what this might mean is that there are places where things are >> FlyBase specific. >> If you went with chado xml the adapter would have to be >> generalized. Its a good >> exercise for the adapter to go through, but it will take a bit of >> work. Nomi can >> probably comment on how hard generalizing might be. I could be >> wrong about this >> but I think the current status with the chado xml adapter is that >> Harvard has >> done a bunch of testing on it but they havent put it into >> production yet. >> >> The jdbc adapter is being used by several groups so has been >> forced to be >> generalized. One thing I have found is that chado databases vary >> all too much >> from mod to mod (ontologies change). There is a configuration file >> for the jdbc >> adapter that has settings for the differences that I encountered. >> I initially >> wrote it for cold spring harbors rice database that will be used >> in classrooms. >> Its working for rice in theory, but they havent actually used it >> much in the >> classroom yet. For rice the model is to save to game and use >> apollo command line >> to save game & transactions back to chado. >> >> Cyril Pommier, at the INRA - URGI - Bioinformatique, has taken on >> the jdbc >> adapter for his group. 
I have cc'd him on this email as I think he >> will have a >> lot to say about the jdbc adapter. Cyril has uncovered many bugs >> and has fixed a >> lot of them (thank you cyril) as hes a very savvy java programmer. >> And he has >> also forced the adapter to generalize and brought about the >> evolution of the >> config file to adapt to chado differences. But as Cyril can attest >> (Cyril feel >> free to elaborate) it has been a lot of work to get jdbc working >> for him. There >> were a lot of bugs to fix that we both went after. Hopefully now >> its a bit more >> stable and the next db/mod wont have as many problems. I think >> Cyril is still at >> the test phase and hasn't gone into production (Cyril?) >> >> Berkeley is using the jdbc adapter for an in house project. They >> are using the >> jdbc reader to load up game files (as the straight jdbc reader is >> slow as the >> chado db is rather slow) which are then loaded by a curator. They >> are saving >> game, and then I think chris mungall is xslting game to chado xml >> which is then >> saved with xort - or he is somehow writing game in another way - >> not actually >> sure. The Berkeley group drove the need for 1 level annotations(in >> jdbc,game,& >> apollo datmodel) >> >> Jonathan Crabtree at TIGR wrote the jdbc read adapter, and they >> use it there. I >> believe they are intending to use the write adapter but dont yet >> do so (Jonathan?). >> >> I should mention that reading jdbc straight from chado tends to be >> slow, as I >> find that chado is a slow database, at least for Berkeley. It >> really depends on >> the db vendor and the amount of data. TIGRs reading is actually >> really zippy. >> The workaround for slow chados is to dump game files that read in >> pretty fast. >> >> In all fairness, you should probably email with FlyBase (& Chris >> Mungall) and >> get the pros of using chado xml & xort, which they can give a far >> better answer >> on than I. >> >> Hope this helps, >> Mark > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 From lstein at cshl.edu Thu Apr 6 16:08:30 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 6 Apr 2006 16:08:30 -0400 Subject: [DAS2] Global IDs for worm Message-ID: <200604061608.32914.lstein@cshl.edu> I've created a directory in the das CVS under das2/GlobalSeqIDs/ to hold text files describing sequence IDs for common organisms. Currently I've created one for Worm. My schedule for the others is: Drosophilids Yeast Human Mouse Drosophila is the difficult one because there are many partial sequences. I may just do melanogaster for now. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Apr 10 00:24:24 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 9 Apr 2006 22:24:24 -0600 Subject: [DAS2] was ill Message-ID: <0436e7cb5802c65cbce1a757a2a31b2f@dalkescientific.com> Hi all, The reason you haven't heard from me in the last week is I was quite ill with an upper respiratory virus, which you heard a bit of in last week's phone conference. I was barely able to read a paragraph at a time, much less write anything coherent. It broke yesterday afternoon and I'm able to work now. 
Strangest part was on Friday night when I dreamed about parsing RSS feeds and every time I tried to get element [0] I would wake up coughing. That's some virus! Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Apr 10 13:19:23 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 10 Apr 2006 10:19:23 -0700 Subject: [DAS2] Problem with DAS/2 registry? Message-ID: I've been trying to reach the DAS/2 registry at: http://www.spice-3d.org/dasregistry/das2/sources which used to work, but now I'm getting this error message: Proxy Error The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET?/dasregistry/das2/sources. Reason: Could not connect to remote machine: Connection refused Apache/1.3.33 Server at www.spice-3d.org Port 80 Any idea what the problem is? Thanks, Gregg From dalke at dalkescientific.com Fri Apr 14 04:29:46 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 14 Apr 2006 02:29:46 -0600 Subject: [DAS2] alignments Message-ID: <5dd5ce9d6d6e977e56c7b4e30e622f7c@dalkescientific.com> I need a bit of help here. I'm trying to hand-write an example of a feature based on an alignment. Let's assume these are annotations on fly and it's aligned to human. There's a hit from fly chromosome 4 http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4 range 100:200 to human chromosome 8 http://www.ensembl.org/Homo_sapiens/Chr1 range 200:300 Assume the CIGAR string of the match is 51 identical, 3 insertions, 24 identical, 3 deletions, 25 identical Here's the best I can manage: First question: Where do I put the object to which the alignment aligns? Will it be a segment or a feature? Now, I could have this completely wrong and DAS2 is not meant for genome/genome alignments like this. If that's the case please offer an example of how to write an alignment. Second question: What's the format of the CIGAR string? Lincoln's text pointed to http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html That documentation says: > The format starts with the same 9 fields as sugar output (see above), > and is followed by a series of pairs where > operation is one of match, insert or delete, and the length describes > the number of times this operation is repeated. However, it does not list the operation characters nor if there are spaces between the fields. I assume it is "M 51 I 3 M 24 D 3 25 I", though perhaps without spaces. The GFF3 documentation at http://song.sourceforge.net/gff3.shtml refers to http://cvsweb.sanger.ac.uk/cgi-bin/cvsweb.cgi/exonerate?cvsroot=Ensembl but I can find no relevant documentation there. I then found a comment by Richard Durbin from two years ago, at http://portal.open-bio.org/pipermail/bioperl-l/2003-February/ 011234.html > 3) I'm not convinced by the format for the Align string. This requires > a character per aligned base. There are a variety of run-length type > encodings in common use that are much more compact. e.g. Ensembl uses > a > string such as "60M1D8M3I15M" to mean "60 match, then 1 delete, then 8 > match, then 3 insert, then 15 match". They call this CIGAR, but when I > talked to Guy Slater, who invented CIGAR for exonerate, his version is > subtly different: "M 60 D 1 M 8 I 3 M 15" for the same string (see > http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/CigarFormat.html). > Jim Kent also has something like this. I'd prefer us to standardise on > one of these formats, all of which are very short for ungapped matches. 
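For concreteness, here is a rough (untested) sketch of how the two spellings quoted above reduce to the same list of (operation, length) pairs. The M/I/D letters and their meanings are taken from the quote; everything else, including reading an omitted count as 1, is an assumption for illustration only:

    import re

    def parse_ensembl_cigar(cigar):
        # "60M1D8M3I15M" -> [('M', 60), ('D', 1), ('M', 8), ('I', 3), ('M', 15)]
        # An omitted count is read as 1 (an assumption, not taken from any spec text).
        return [(op, int(count) if count else 1)
                for count, op in re.findall(r"(\d*)([MID])", cigar)]

    def parse_exonerate_cigar(cigar):
        # "M 60 D 1 M 8 I 3 M 15" -> the same list as above
        fields = cigar.split()
        return [(fields[i], int(fields[i + 1])) for i in range(0, len(fields), 2)]

    # Both spellings from the quote describe the same alignment:
    assert parse_ensembl_cigar("60M1D8M3I15M") == parse_exonerate_cigar("M 60 D 1 M 8 I 3 M 15")

Either way the information content is identical; the difference is only which run-length spelling a server chooses to emit.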
Which is the CIGAR string format DAS2 supports? Where is the documentation for it? Andrew dalke at dalkescientific.com From aloraine at gmail.com Fri Apr 14 20:05:17 2006 From: aloraine at gmail.com (Ann Loraine) Date: Fri, 14 Apr 2006 19:05:17 -0500 Subject: [DAS2] question regarding most up-to-date D. melanogaster DAS? Message-ID: <83722dde0604141705t369cd016u30f1ca2ea7622d6c@mail.gmail.com> Hi, I'm helping a colleague with an eQTL study and need to do a region-based query on the most up-to-date fruit fly annotations. Our markers (for influential loci in the study) are mapped to cytological bands. Is it possible to run region-based queries using cytological coordinates? (e.g., 30A - 30B, inclusive) My goal is to find all candidate genes under those peaks. I also have (approximate) mappings of cytological bands onto the physical (genomic coordinates) map of Drosophila, so, if necessary, I could use those to collect the genes mapping to those locations. Which fruit fly DAS server would provide the most up-to-date information? If you have other recommendations for how to proceed, I would be grateful for your help! All the best, Ann -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From dalke at dalkescientific.com Mon Apr 17 02:54:30 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 17 Apr 2006 00:54:30 -0600 Subject: [DAS2] updated spec Message-ID: Spec writing is like working on a dissertation. Here's an example, in the form of a text adventure http://acephalous.typepad.com/acephalous/2006/04/disadventure.html > look laptop There seems to be a dissertation chapter on the laptop. > read chapter It is long-winded and boring. You do not want to read it. > read chapter It is obnoxious. You hate it. > read book Read. There is a book underneath it that concerns a related topic. > read book Read. There is a book underneath it that concerns a related topic. > work on dissertation You spend two hours searching the OED for the usage history of the word devolve. > work on dissertation You spend three hours reading five articles which have nothing to do with the dissertation. > work on dissertation You spend twenty minutes online reading about baseball. ... > work on dissertation You spend five minutes playing online poker. > work on dissertation You pick your nose. > work on dissertation You go to the kitchen and eat cheese. > work on dissertation The Mets are on. It should be a good game. Anyway, I've gone through the das/das2/draft3/spec.txt document and updated everything (well, not writeback. I'm going to need more cheese.) Next is to get feedback, validate my inline examples, and convert the behemoth into HTML, to replace what's on the web site. Finally. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Apr 17 03:31:13 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 17 Apr 2006 01:31:13 -0600 Subject: [DAS2] outstanding questions Message-ID: These are culled from the current draft of the spec. I used "XXX" to denote regions where I had questions. 1) type ontology URI The TYPE elements have an 'ontology' attribute. This is supposed to be a required element, which is the URI of the corresponding ontology term. At present there is no URI system for ontology. We added a special 'accession' attribute which is the GO id, as in so_accession="SO:0000704" This was meant to be a hack for the hackathon. 
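As a purely illustrative aside (a hypothetical helper, not anything from the spec), a client could normalize whatever it finds in so_accession so that the "SO:" prefix is always present, assuming the numeric part is seven digits as in the example above:

    def normalize_so_accession(value):
        # Accept "SO:0000704" or a bare "0000704" and return "SO:0000704".
        # Hypothetical sketch; the seven-digit width is assumed from the example.
        value = value.strip()
        if value.upper().startswith("SO:"):
            value = value[3:]
        if not (value.isdigit() and len(value) == 7):
            raise ValueError("does not look like an SO accession: %r" % value)
        return "SO:" + value

    assert normalize_so_accession("SO:0000704") == "SO:0000704"
    assert normalize_so_accession("0000704") == "SO:0000704"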
My thought is:
- keep the GO accession (as an optional attribute)
- make 'ontology' be an optional attribute, but one of 'ontology' or 'so_accession' is required

Also, should that be "SO:0000704" or simply "0000704"? I think the "SO:" should be present.

2) Feature strand. I want to make sure this is correct
  1 for positive
  -1 for negative
  0 for unknown
  not given for both strands or does not have meaning

3) taxid

The 'taxid' in the SOURCE element does not appear to be useful. It's written

Notice how the taxid exists in the SOURCE element and the COORDINATES element (and how there are different taxids for each COORDINATES)? I think we can drop 'taxid' from the SOURCE element and if it's important someone should have a COORDINATES element.

4) 'writeable'

The versioned source element contains the attribute "writeable", as in

Do we need that 'writeable' attribute? It seems that if there's a writeback capability then the versioned source is writeable.

5) content-type for FASTA records

"text/plain", "text/x-fasta" or "chem/x-fasta" Looking around now I also see "application/x-fasta" and "application/fasta". I'm going to say "should be text/x-fasta but may be text/plain". Objections?

6) response document too large

I've described that a server may return an error if the response document is too large. This means a client may try again, hopefully making a request which returns a smaller document. My question is, how does a client make a smaller request? What if the server decides that sending more than 5 features at a time is too much? When does the client just give up and say the server implementation is crazy?

7) styles

Are we going to go with the current style system or some other approach? The DAS1 styles had support for limited semantic zooming, with options for "high", "medium" and "low" resolution. What do those mean? When should a client choose one over another? What does "height" mean for a glyph? How do the glyph and text interoperate? Eg, is the "height" the height for both, or just for the glyph? Should style information be moved outside of the DAS2 exchange spec?

8) the "count" format

We talked about, and people wanted, a "count" format. This returns the number of features which would be returned in a query. Does it really return the number of features, or does it return the number of complex annotations (eg, if there is a complex annotation with a root and two children, is that a count of "1" or a count of "3"? Given the way we've done things, I'm going with "3".)

9) alignments

How do I write an alignment? Please give an example - I can't figure it out.

10) CIGAR string

What's the format of the CIGAR string? I've found two main variations. They are
  M 40 I 1 M 12 D 4
  40M1I12M4D
The latter appears to be the most common. However, I did see one case where if no count is given "1" is implied, so the latter can also be written 40MI12M4D

10) Do we need a REGION element?

I've written

All feature locations are given in coordinates on a segment. Some features may be locatable on other features. For example, a contig feature may be locatable on a supercontig. This relationship is stored using a REGION element. A FEATURE element has zero or more REGION elements. The 'feature' attribute of the REGION element contains the URI of the parent feature, on which the current feature is located. A REGION record has an optional 'range' attribute. If not given the feature is on the entire parent feature. The range string is the same syntax and meaning as in the LOC record.
XXX I think this is overkill - what are some good examples of use; perhaps when the global coordinates are not well-defined?. Are negative coordiantes important, like "promoter region is 20 bases upstream from some gene"? Does this need a CIGAR string too? XXX For example, suppose feature A is 6 bases long and is on chromosome 5 at position 10000, on exon X at position 300 and on contig K at position 7. The FEATURE record for this feature may be as follows: 11) XID Currently the XID element has a single attribute, 'href'. I wrote A FEATURE has zero or more XID elements linking the feature record to an external database entry. XXX This is not well-thought out. I think it should have: 'uri' -- a URL or LSID 'authority' -- the name of the database (controlled vocabulary) 'type' -- 'primary', 'accession', or possibly others? 'id' -- the actual identifier 'description' -- a paragraph or so describing the link, for humans to see why they might want to look into a link This has to be a well-defined concept. Let's steal from someone else. The use-case here is to link to sequence records in other databases and to link to PubMed or other bibliographic databases. 12) complex features In the spec I wrote Some features are complex and cannot easily be modeled with a single feature record. Quoting from the "Chado Schema Documentation" XXX give hyperlink XXX The class of transplicing events that involve ligating transcripts from different loci into a mature mRNA requires a separate feature to represent each locus transcript and one to represent the fused transcript. The fragments are located on the fused transcript; portions of the fused transcript can also be located on the genome. Is this a relevant example of a complex feature for DAS2? If not, give another example. In general I'm having a hard time coming up with good examples of various forms of complex features. I just don't know the domain well enough. 13) "root" attribute I proposed that features have a new, optional attribute called "root". If a feature is part of a complex annotation then the "root" attribute must be present and it must have the URI of the root feature for the annotation. This makes client processing easier, though it is not needed in the purest of senses. 14) features have a 'STYLE' element The idea was that an individual feature could override the style given in the feature type record. I don't think that's useful and/or we need a real stylesheet instead. I'm going to drop the STYLE element from the FEATURE element unless there is objection. 15) In text searches we've defined ABC -- field exactly matches "ABC" *ABC -- field ends with "ABC" ABC* -- field starts with "ABC" *ABC* -- field contains the substring "ABC" I want to say that using "*" and "?" elsewhere in the query string is implementation dependent. That is, "A*B" might match everything with an A followed by a B or it might match the exact string "A*B" and only that string. I did this because looking around at various tools it looks like it might be hard to change the meaning of "*" and "?" for the text searches. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Apr 17 03:40:07 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 17 Apr 2006 01:40:07 -0600 Subject: [DAS2] proposed April 17 agenda Message-ID: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> Gregg is taking the month off. I volunteered to be in charge of the next teleconference. Here is what I would like to talk about: 1. get additional agenda items 2. 
status reports 3. who maintains the list of reference names for different genomes (starting with the list Licoln developed)? 4. resolve some questions with the spec (see my previous email) 5. get a volunteer to come up with best-practices examples of how to represent various complex annotations 6. writeback planning Andrew dalke at dalkescientific.com From lstein at cshl.edu Mon Apr 17 09:46:23 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 17 Apr 2006 09:46:23 -0400 Subject: [DAS2] alignments In-Reply-To: <5dd5ce9d6d6e977e56c7b4e30e622f7c@dalkescientific.com> References: <5dd5ce9d6d6e977e56c7b4e30e622f7c@dalkescientific.com> Message-ID: <200604170946.24479.lstein@cshl.edu> I didn't realize there were multiple things called CIGAR. I think we should use Ensembl CIGAR format. The target of the alignment should be a segment, and not another feature. Best, Lincoln On Friday 14 April 2006 04:29, Andrew Dalke wrote: > I need a bit of help here. I'm trying to hand-write an example of a > feature based on an alignment. Let's assume these are annotations on > fly and it's aligned to human. There's a hit from > > fly chromosome 4 > http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4 > range 100:200 > > to human chromosome 8 > http://www.ensembl.org/Homo_sapiens/Chr1 > range 200:300 > > Assume the CIGAR string of the match is > 51 identical, 3 insertions, 24 identical, 3 deletions, 25 identical > > Here's the best I can manage: > > > > segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4" > range="100:200" cigar="?????"/> > > > > > First question: > Where do I put the object to which the alignment aligns? Will > it be a segment or a feature? Now, I could have this completely wrong > and DAS2 is not meant for genome/genome alignments like this. If > that's the case please offer an example of how to write an alignment. > > > Second question: > What's the format of the CIGAR string? Lincoln's text pointed to > http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html > > That documentation says: > > The format starts with the same 9 fields as sugar output (see above), > > and is followed by a series of pairs where > > operation is one of match, insert or delete, and the length describes > > the number of times this operation is repeated. > > However, it does not list the operation characters nor if there are > spaces > between the fields. I assume it is "M 51 I 3 M 24 D 3 25 I", though > perhaps > without spaces. > > The GFF3 documentation at http://song.sourceforge.net/gff3.shtml refers > to > http://cvsweb.sanger.ac.uk/cgi-bin/cvsweb.cgi/exonerate?cvsroot=Ensembl > but I can find no relevant documentation there. > > I then found a comment by Richard Durbin from two years ago, at > > http://portal.open-bio.org/pipermail/bioperl-l/2003-February/ > 011234.html > > > 3) I'm not convinced by the format for the Align string. This requires > > a character per aligned base. There are a variety of run-length type > > encodings in common use that are much more compact. e.g. Ensembl uses > > a > > string such as "60M1D8M3I15M" to mean "60 match, then 1 delete, then 8 > > match, then 3 insert, then 15 match". They call this CIGAR, but when I > > talked to Guy Slater, who invented CIGAR for exonerate, his version is > > subtly different: "M 60 D 1 M 8 I 3 M 15" for the same string (see > > http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/CigarFormat.html). > > Jim Kent also has something like this. 
I'd prefer us to standardise on > > one of these formats, all of which are very short for ungapped matches. > > Which is the CIGAR string format DAS2 supports? Where is the > documentation for it? > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Apr 17 12:19:47 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 17 Apr 2006 10:19:47 -0600 Subject: [DAS2] question regarding most up-to-date D. melanogaster DAS? In-Reply-To: <83722dde0604141705t369cd016u30f1ca2ea7622d6c@mail.gmail.com> References: <83722dde0604141705t369cd016u30f1ca2ea7622d6c@mail.gmail.com> Message-ID: <3ecdfacc003d58cc93045bc7a4aefb57@dalkescientific.com> Ann: > Our markers (for influential loci in the study) are mapped to > cytological bands. Is it possible to run region-based queries using > cytological coordinates? (e.g., 30A - 30B, inclusive) My goal is to > find all candidate genes under those peaks. At present there is no way to do that. A server can extend the query syntax to support searches in cytological coordinates and add new feature elements to store those coordinates. I don't know enough about how people use those coordinates to sketch an example. Andrew dalke at dalkescientific.com From aloraine at gmail.com Mon Apr 17 13:47:03 2006 From: aloraine at gmail.com (Ann Loraine) Date: Mon, 17 Apr 2006 12:47:03 -0500 Subject: [DAS2] question regarding most up-to-date D. melanogaster DAS? In-Reply-To: <3ecdfacc003d58cc93045bc7a4aefb57@dalkescientific.com> References: <83722dde0604141705t369cd016u30f1ca2ea7622d6c@mail.gmail.com> <3ecdfacc003d58cc93045bc7a4aefb57@dalkescientific.com> Message-ID: <83722dde0604171047r26a32986gaa4c3b34b6166c16@mail.gmail.com> I'm not sure it would be worth adding more work to the project to allow for these cases. If funding is renewed, then I think it would be worth the effort. But for now, probably not, since it would be a new feature. (At this stage, avoiding feature creep seems advisable :-) I believe I can get a mapping of cytological bands onto genomic coordinates from FlyBase. I don't know how reliable these mappings are, but assuming they are okay, I can use them to query a fly DAS site to get the genes in those coordinates. I'm not sure what is the best DAS site to use for this, however. -Ann On 4/17/06, Andrew Dalke wrote: > Ann: > > Our markers (for influential loci in the study) are mapped to > > cytological bands. Is it possible to run region-based queries using > > cytological coordinates? (e.g., 30A - 30B, inclusive) My goal is to > > find all candidate genes under those peaks. > > At present there is no way to do that. > > A server can extend the query syntax to support searches in > cytological coordinates and add new feature elements to store > those coordinates. I don't know enough about how people use > those coordinates to sketch an example. 
> > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org

From dalke at dalkescientific.com Tue Apr 18 03:36:39 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 18 Apr 2006 01:36:39 -0600 Subject: [DAS2] proposed April 17 agenda In-Reply-To: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> References: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> Message-ID:

Summary of today's conference call.

> 2. status reports

The biggest one is that the new version of IGB is out and the Affy DAS server is available at http://netaffxdas.affymetrix.com/das2/sequence Steve and Ed (as I recall) tracked down a problem with that server which might affect other implementations. The problem is knowing the public/external URL for the DAS service. In theory it can be determined by looking at various CGI headers, but with things like an Apache rewrite and forwards to the actual server it can get complicated. The solution seems to be either to use relative links or to have a configuration option in the server specifying the base name. Lincoln's been working on reference names. Allen's been working on how the writeback server might work. I've been working on the spec, and have not gone further with the validator.

> 3. who maintains the list of reference names for different > genomes (starting with the list Lincoln developed)?

Lincoln proposed, to broad acceptance, that we set up a wiki page with the reference names. The easiest way is to use the OBF wiki, at http://open-bio.org/wiki/Main_Page because that is already set up. I can ask the OBF about the appropriateness of that - I think it's fine.

> 4. resolve some questions with the spec (see my previous email)

Here are the resolutions:

1) type ontology URI

I've emailed Suzi asking about plans by GO, the Gene Ontology Consortium, or whoever is involved, for coming up with standardized, public ontology URLs. Allen's cc'ed on it, and we'll discuss this off the DAS list.

2) Feature strand. I stand corrected. The definitions are
  1 for positive
  -1 for negative
  0 for both strands
  not given for don't know or does not have meaning

3) taxid

There seems to be no reason to keep the 'taxid' in the SOURCE element. We'll only have it in the COORDINATES element.

4) 'writeable'

We'll defer this (leaving it as-is) until we have the writeback defined a bit better.

5) content-type for FASTA records

We'll recommend "text/x-fasta" or "text/plain" as the content-type for FASTA responses. There is no widely accepted community standard.

6) response document too large

There is no automatic way for a client to narrow its request. This must be done by a person, depending on what the search criteria are. Servers should support large requests so that this isn't a problem.

7) styles

We'll shift to using a stylesheet. This will be listed in the versioned source record as

As a rough sketch the document will look like

The STYLE elements add a new "uri" attribute which is the URI of the feature type being styled. In theory this could also include the feature uri (to define the style for a single feature) or an ontology uri (sets the style for all features with that ontology term or its descendants). However, with that comes problems of precedence.
If the feature type and the feature and the ontology each have styles, which one wins? I think feature beats type beats ontology. But I also think we can ignore this because no one has asked for this sort of flexibility. (More flexibility would be support for a query language selecting which features, types, sources, ontologies, feature alias, etc. should get a given style. Not going there. :) 8) the "count" format This should be the number of feature elements returned, and not the number of "annotations" (counting the multiple features of a complex annotation as 1) 9) alignments Lincoln will provide examples. 10) CIGAR string We'll use the EBI style CIGAR strings, and the documentation will be based on the GFF3 description at http://song.sourceforge.net/gff3.shtml 10.5) Do we need a REGION element? No. Deleted from the spec. 11) XID On Ed's recommendation I'm looking at MAGE XML. I am not a good UML reader so it's slow going. My view so far is that what I sketched out is on the right track and we can simplify things compared to MAGE, eg, we don't need full bibliographic records. The other idea is to defer finalizing this until people start providing data with XIDs, so we know what's needed. 12) complex features Lincoln will come up with some examples. 13) "root" attribute There are two changes here: - complex annotations must have a single root feature - all features which are in complex annotations must have a link to the root element There's some worry about the first requirement, in that some complex annotations may not have a "real" root. I argue that having a synthetic one is okay. There were no strong arguments against having a single root. We decided to defer finalizing this until we have some example of complex annotations. 14) features have a 'STYLE' element no, they don't. 15) "*" and "?" in the query string The proposal here is to say that the interpretation of "*" other than at the start and/or end of the query string is implementation defined, as is the use of "?". It used to be that any other use of "*" must be treated as an asterisks, so "***" finds all strings containing a "*". It looks like people are fine with this looseness. > 5. get a volunteer to come up with best-practices examples > of how to represent various complex annotations That's Lincoln. > 6. writeback planning Allen will take the implementation lead on this, funding willing. He's currently working on how to associate an identifier with a new feature. One thought is to progress in stages: - upload completely new features / complex annotations to the server - modify an existing feature, though not the parent/part relationship (eg, change the location) - delete a simple feature - delete a complex annotation - modify an existing complex annotation, or turn a simple feature into a complex annotation - do 'em all at once The work will need to be server driven as the current clients can't handle this before the end of the funding period. The clients will mostly be library code. Andrew dalke at dalkescientific.com From lstein at cshl.edu Mon Apr 24 08:35:21 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 24 Apr 2006 08:35:21 -0400 Subject: [DAS2] Not able to make it today In-Reply-To: References: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> Message-ID: <200604240835.21690.lstein@cshl.edu> Hi All, Due to wedding preparations I will be unable to attend the conference call today. I might or might not be able to make it next week (I'll be in Toronto) but I'll let you know in advance. 
Best, Lincoln -- Lincoln Stein lstein at cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From dalke at dalkescientific.com Mon Apr 24 12:11:31 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 24 Apr 2006 10:11:31 -0600 Subject: [DAS2] April 24 meeting - cancel? In-Reply-To: <200604240835.21690.lstein@cshl.edu> References: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> <200604240835.21690.lstein@cshl.edu> Message-ID: Hi all, I'm trying to come up with an agenda but I've done very little the last week DAS related. I've been working on selling my house. Looks like this will be a short meeting, or should we just cancel? Here's my status. - Sent mail to Suzi asking about URIs for ontologies. Heard nothing from her yet. - Talked with the OBF people about setting up a wiki for the reference names for the genomes/segments. We decided to use the OBF wiki for now and if there are enough pages we'll migrate over to a biodas-specific wiki. I'm about 1/2-way through, learning wiki syntax. I'll email when it's there. - I've migrated the spec 300 doc into CVS. Just checked it in. There's still some formatting issues though. - started working on the stylesheet spec. Should take another 3 hours or so. - haven't been able to log into cgi.biodas.org to restart the validation server. - still need to write an rnc for the writeback for Allen Andrew dalke at dalkescientific.com From allenday at ucla.edu Mon Apr 24 12:29:09 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 24 Apr 2006 09:29:09 -0700 Subject: [DAS2] April 24 meeting - cancel? In-Reply-To: References: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> <200604240835.21690.lstein@cshl.edu> Message-ID: <5c24dcc30604240929l7a882dd9qa15c0a51bd636cb0@mail.gmail.com> Let's cancel it. I have a database set up for writeback, and am able to POST delta XML to the server. I am still at the stage where I am parsing the XML. The DTD would be helpful. See attached figure "writeback.png" for the current implementation track. I am at the "Parse XML" step in implementation. See attached "vsourcecommand.png" for an overview of the previous writeback plans as documented in the HTML docs, and "vsourcelock.png" for an overview of lock plans as documented in the HTML docs. Parts of these may at some point be helpful for folding into the current implementation. I can send or commit to CVS the source documents for any of these diagrams if people would like to edit. -Allen On 4/24/06, Andrew Dalke wrote: > > Hi all, > > I'm trying to come up with an agenda but I've done very little > the last week DAS related. I've been working on selling my house. > Looks like this will be a short meeting, or should we just cancel? > > Here's my status. > > - Sent mail to Suzi asking about URIs for ontologies. Heard > nothing from her yet. > > - Talked with the OBF people about setting up a wiki for the > reference names for the genomes/segments. We decided to use the > OBF wiki for now and if there are enough pages we'll migrate over > to a biodas-specific wiki. I'm about 1/2-way through, learning > wiki syntax. I'll email when it's there. > > - I've migrated the spec 300 doc into CVS. Just checked it > in. There's still some formatting issues though. > > - started working on the stylesheet spec. 
Should take another > 3 hours or so. > > - haven't been able to log into cgi.biodas.org to restart the > validation server. > > - still need to write an rnc for the writeback for Allen > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 >

-------------- next part -------------- A non-text attachment was scrubbed... Name: writeback.png Type: image/png Size: 41093 bytes Desc: not available URL:

-------------- next part -------------- A non-text attachment was scrubbed... Name: vsourcelock.png Type: image/png Size: 91466 bytes Desc: not available URL:

-------------- next part -------------- A non-text attachment was scrubbed... Name: vsourcecommand.png Type: image/png Size: 49552 bytes Desc: not available URL:

From dalke at dalkescientific.com Mon Apr 24 13:39:29 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 24 Apr 2006 11:39:29 -0600 Subject: [DAS2] sequence names on wiki Message-ID: <6e4986bba9736f1c43f239646b8a22d4@dalkescientific.com>

I've imported Lincoln's list of global sequence identifiers onto the open-bio wiki at http://open-bio.org/wiki/DAS:GlobalSeqIDs Andrew dalke at dalkescientific.com

From dalke at dalkescientific.com Thu Apr 27 03:33:29 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 27 Apr 2006 01:33:29 -0600 Subject: [DAS2] writeback spec Message-ID:

I've written up a draft of the writeback spec. It's in CVS.
  das/das2/das2_writeback.html
with the RNC in
  das/das2/writeback.rnc -- for the writeback document
  das/das2/mapping.rnc -- for the mapping from old URLs to new

On the question of how to handle new records, which need new identifiers, I decided to go with the private identifier scheme. The client uses "das-private:0000" where the "0000" is alphanumeric and 1 up to 20 characters long. The server responds with a mapping document which looks like

I decided on this instead of the "preallocate identifier" scheme because this requires less state on the server (it doesn't need to remember which identifiers were already issued) and because it supports versioning servers better.

Is the web site being updated from CVS? I see it hasn't gotten the updates I made on Monday. Andrew dalke at dalkescientific.com

From Steve_Chervitz at affymetrix.com Thu Apr 27 13:34:12 2006 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Thu, 27 Apr 2006 10:34:12 -0700 Subject: [DAS2] writeback spec In-Reply-To: Message-ID:

Andrew, > From: Andrew Dalke > Date: Thu, 27 Apr 2006 01:33:29 -0600 > To: DAS/2 > Subject: [DAS2] writeback spec > > I've written up a draft of the writeback spec. It's in CVS. Great. Thanks. > > Is the web site being updated from CVS? I see it hasn't gotten > the updates I made on Monday.

You mean in some automated fashion? Before we switched to generating the html from templates, I set up a cron that updated the manually edited html file for the read spec on biodas.org. I don't know if there is an automated process that produces the template-based html from CVS on biodas.org -- unless you or Lincoln set something up.

BTW, I can't ssh into portal.open-bio.org, or even ping it. This is (or perhaps was) the machine hosting biodas.org. Do you know the story here?
Steve

From dalke at dalkescientific.com Thu Apr 27 13:55:55 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 27 Apr 2006 11:55:55 -0600 Subject: [DAS2] writeback spec In-Reply-To: References: Message-ID:

Steve: > You mean in some automated fashion? Before we switched to generating > the > html from templates, I set up a cron that updated the manually edited > html > file for the read spec on biodas.org. I don't know if there is an > automated > process that produces the template-based html from CVS on biodas.org -- > unless you or Lincoln set something up.

I didn't set anything up. One thing to note though is that I'm not using the template system for the current specs. The validator I have now is much more powerful than the one then so I'm parsing the spec documents and validating them. "More powerful" includes that I can report the error line as it is in the spec document and not just in the piece of XML to validate. It should be possible to just pull the specs out of CVS.

> BTW, I can't ssh into portal.open-bio.org, or even ping it. This is (or > perhaps was) the machine hosting biodas.org. Do you know the story here?

Chris Dag. sent out an email on 3/23 "Important news for all developers with open-bio.org CVS access

(2) All of our websites have been consolidated on the new server newportal.open-bio.org

Andrew dalke at dalkescientific.com

From Steve_Chervitz at affymetrix.com Thu Apr 27 14:09:09 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 27 Apr 2006 11:09:09 -0700 Subject: [DAS2] writeback spec In-Reply-To: Message-ID:

Andrew: > I didn't set anything up. One thing to note though is that I'm not > using > the template system for the current specs. The validator I have now is > much more powerful than the one then so I'm parsing the spec documents > and validating them. "More powerful" includes that I can report the > error line as it is in the spec document and not just in the piece > of XML to validate. > > It should be possible to just pull the specs out of CVS.

Cool. I can look into updating my cronjob to grab the new specs.

> Steve: >> BTW, I can't ssh into portal.open-bio.org, or even ping it. This is (or >> perhaps was) the machine hosting biodas.org. Do you know the story here? > > Chris Dag. sent out an email on 3/23 "Important news for all developers > with open-bio.org CVS access > > (2) All of our websites have been consolidated on the new server > newportal.open-bio.org

Yep. Just realized that. At the moment, I can't access my account on this new server. Probably my password got reset. I've got a support request in.

Steve

From Steve_Chervitz at affymetrix.com Thu Apr 27 15:16:23 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 27 Apr 2006 12:16:23 -0700 Subject: [DAS2] writeback spec In-Reply-To: Message-ID:

OK, Andrew's writeback spec is now accessible at: http://www.biodas.org/documents/das2/das2_writeback.html Be sure to refresh your browsers to get the latest spec at http://biodas.org/documents/das2/das2_protocol.html

I re-established my cronjob to update all the documents in this das2 directory twice daily (00:01 and 12:01 East coast time). This das2 directory is a new cvs checkout. I moved the previous das2 directory to das2.old, in case it contains anything we might need that isn't in CVS (accessible via http://www.biodas.org/documents/das2.old/ ).

Steve

> From: Andrew Dalke > Date: Thu, 27 Apr 2006 01:33:29 -0600 > To: DAS/2 > Subject: [DAS2] writeback spec > > I've written up a draft of the writeback spec.
It's in CVS. > > das/das2/das2_writeback.html > with the RNC in > das/das2/writeback.rnc -- for the writeback document > das/das2/mapping.rnc -- for the mapping from old URLs to new > > On the question of how to handle new records, which need > new identifiers, I decided to go with the private identifier > scheme. The client uses "das-private:0000" where the "0000" > is alphanumeric and 1 up to 20 characters long. The server > responds with a mapping document which looks like > > > to="http://blah.com/das2/whatever/feature/123" /> > > > I decided on this instead of the "preallocate identifier" > scheme because this requires less state on the server > (it doesn't need to remember which identifiers were already > issued) and because it supports versioning servers better. > > > Is the web site being updated from CVS? I see it hasn't gotten > the updates I made on Monday. > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2

From dalke at dalkescientific.com Fri Apr 28 13:04:30 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 28 Apr 2006 11:04:30 -0600 Subject: [DAS2] splits and joins in writeback, an alternative Message-ID: <1e3fb6fceaa9c77cc25511181c35e45b@dalkescientific.com>

Roy, in private email, pointed out that my writeback spec doesn't include ways to track splits and joins. Here's my response to that topic. I sent it to him last night but resend it here now because I hope to talk about it on Monday.

------

The use model we have is a curator works on a section of the genome for a while (a few hours to perhaps a day). Once done all of the changes are sent back to the server. The writeback document in the current draft looks like ... ... ... ...

The message at this point would be "I did a lot of work in the last few hours." It's not very useful. Thinking of it as code, it's like working for a day on code without checking things into version control, so you end up with commit messages with a dozen items in them and it's hard to see which code changes correspond to which item.

What if the writeback delta looked like ... ... ... ... ... ... ... ... ...

The MESSAGE is set by the person, the REASON is set by the software, perhaps with details using a controlled vocabulary ("split", "merge", "creation", ...) It feels to me like this gives essentially the same information as explicitly listing how A comes from {X0, X1, X...} features. Perhaps not exactly the same detail, but close enough for what people want. On the plus side it can handle complicated changes, like if 3 features (ranges 100-300, 310-600, 620-800) are converted into 2 (ranges 100-500 and 510-800) merged three elements into two ... ...

Andrew dalke at dalkescientific.com

From dalke at dalkescientific.com Sun Apr 30 22:37:29 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 30 Apr 2006 20:37:29 -0600 Subject: [DAS2] May 1 is a UK holiday Message-ID: <9c6cb86e3269238eb234bb4b2c6da293@dalkescientific.com>

Andreas wrote to me in private email saying > here in england 1st of may is a public holiday... The hope was to talk about writeback, but the UK people (and most specifically Roy) won't be able to make it. Does anyone have any feedback on the writeback spec or comments on my solution to splits and joins?
Andrew dalke at dalkescientific.com
I have cc'd him on this email as I think he >> will have a >> lot to say about the jdbc adapter. Cyril has uncovered many bugs >> and has fixed a >> lot of them (thank you cyril) as hes a very savvy java programmer. >> And he has >> also forced the adapter to generalize and brought about the >> evolution of the >> config file to adapt to chado differences. But as Cyril can attest >> (Cyril feel >> free to elaborate) it has been a lot of work to get jdbc working >> for him. There >> were a lot of bugs to fix that we both went after. Hopefully now >> its a bit more >> stable and the next db/mod wont have as many problems. I think >> Cyril is still at >> the test phase and hasn't gone into production (Cyril?) >> >> Berkeley is using the jdbc adapter for an in house project. They >> are using the >> jdbc reader to load up game files (as the straight jdbc reader is >> slow as the >> chado db is rather slow) which are then loaded by a curator. They >> are saving >> game, and then I think chris mungall is xslting game to chado xml >> which is then >> saved with xort - or he is somehow writing game in another way - >> not actually >> sure. The Berkeley group drove the need for 1 level annotations(in >> jdbc,game,& >> apollo datmodel) >> >> Jonathan Crabtree at TIGR wrote the jdbc read adapter, and they >> use it there. I >> believe they are intending to use the write adapter but dont yet >> do so (Jonathan?). >> >> I should mention that reading jdbc straight from chado tends to be >> slow, as I >> find that chado is a slow database, at least for Berkeley. It >> really depends on >> the db vendor and the amount of data. TIGRs reading is actually >> really zippy. >> The workaround for slow chados is to dump game files that read in >> pretty fast. >> >> In all fairness, you should probably email with FlyBase (& Chris >> Mungall) and >> get the pros of using chado xml & xort, which they can give a far >> better answer >> on than I. >> >> Hope this helps, >> Mark > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 From lstein at cshl.edu Thu Apr 6 20:08:30 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Thu, 6 Apr 2006 16:08:30 -0400 Subject: [DAS2] Global IDs for worm Message-ID: <200604061608.32914.lstein@cshl.edu> I've created a directory in the das CVS under das2/GlobalSeqIDs/ to hold text files describing sequence IDs for common organisms. Currently I've created one for Worm. My schedule for the others is: Drosophilids Yeast Human Mouse Drosophila is the difficult one because there are many partial sequences. I may just do melanogaster for now. Lincoln -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Apr 10 04:24:24 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Sun, 9 Apr 2006 22:24:24 -0600 Subject: [DAS2] was ill Message-ID: <0436e7cb5802c65cbce1a757a2a31b2f@dalkescientific.com> Hi all, The reason you haven't heard from me in the last week is I was quite ill with an upper respiratory virus, which you heard a bit of in last week's phone conference. I was barely able to read a paragraph at a time, much less write anything coherent. It broke yesterday afternoon and I'm able to work now. 
Strangest part was on Friday night when I dreamed about parsing RSS feeds and every time I tried to get element [0] I would wake up coughing. That's some virus! Andrew dalke at dalkescientific.com From Gregg_Helt at affymetrix.com Mon Apr 10 17:19:23 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 10 Apr 2006 10:19:23 -0700 Subject: [DAS2] Problem with DAS/2 registry? Message-ID: I've been trying to reach the DAS/2 registry at: http://www.spice-3d.org/dasregistry/das2/sources which used to work, but now I'm getting this error message: Proxy Error The proxy server received an invalid response from an upstream server. The proxy server could not handle the request GET?/dasregistry/das2/sources. Reason: Could not connect to remote machine: Connection refused Apache/1.3.33 Server at www.spice-3d.org Port 80 Any idea what the problem is? Thanks, Gregg From dalke at dalkescientific.com Fri Apr 14 08:29:46 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Fri, 14 Apr 2006 02:29:46 -0600 Subject: [DAS2] alignments Message-ID: <5dd5ce9d6d6e977e56c7b4e30e622f7c@dalkescientific.com> I need a bit of help here. I'm trying to hand-write an example of a feature based on an alignment. Let's assume these are annotations on fly and it's aligned to human. There's a hit from fly chromosome 4 http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4 range 100:200 to human chromosome 8 http://www.ensembl.org/Homo_sapiens/Chr1 range 200:300 Assume the CIGAR string of the match is 51 identical, 3 insertions, 24 identical, 3 deletions, 25 identical Here's the best I can manage: First question: Where do I put the object to which the alignment aligns? Will it be a segment or a feature? Now, I could have this completely wrong and DAS2 is not meant for genome/genome alignments like this. If that's the case please offer an example of how to write an alignment. Second question: What's the format of the CIGAR string? Lincoln's text pointed to http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html That documentation says: > The format starts with the same 9 fields as sugar output (see above), > and is followed by a series of pairs where > operation is one of match, insert or delete, and the length describes > the number of times this operation is repeated. However, it does not list the operation characters nor if there are spaces between the fields. I assume it is "M 51 I 3 M 24 D 3 25 I", though perhaps without spaces. The GFF3 documentation at http://song.sourceforge.net/gff3.shtml refers to http://cvsweb.sanger.ac.uk/cgi-bin/cvsweb.cgi/exonerate?cvsroot=Ensembl but I can find no relevant documentation there. I then found a comment by Richard Durbin from two years ago, at http://portal.open-bio.org/pipermail/bioperl-l/2003-February/ 011234.html > 3) I'm not convinced by the format for the Align string. This requires > a character per aligned base. There are a variety of run-length type > encodings in common use that are much more compact. e.g. Ensembl uses > a > string such as "60M1D8M3I15M" to mean "60 match, then 1 delete, then 8 > match, then 3 insert, then 15 match". They call this CIGAR, but when I > talked to Guy Slater, who invented CIGAR for exonerate, his version is > subtly different: "M 60 D 1 M 8 I 3 M 15" for the same string (see > http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/CigarFormat.html). > Jim Kent also has something like this. I'd prefer us to standardise on > one of these formats, all of which are very short for ungapped matches. 
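(To make the difference concrete for the alignment described above -- 51 identical, 3 insertions, 24 identical, 3 deletions, 25 identical -- the two styles would read roughly as follows; this is just an illustration derived from those operation counts, not text taken from either specification:

    exonerate-style:  M 51 I 3 M 24 D 3 M 25
    Ensembl-style:    51M3I24M3D25M
)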
Which is the CIGAR string format DAS2 supports? Where is the documentation for it? Andrew dalke at dalkescientific.com From aloraine at gmail.com Sat Apr 15 00:05:17 2006 From: aloraine at gmail.com (Ann Loraine) Date: Fri, 14 Apr 2006 19:05:17 -0500 Subject: [DAS2] question regarding most up-to-date D. melanogaster DAS? Message-ID: <83722dde0604141705t369cd016u30f1ca2ea7622d6c@mail.gmail.com> Hi, I'm helping a colleague with an eQTL study and need to do a region-based query on the most up-to-date fruit fly annotations. Our markers (for influential loci in the study) are mapped to cytological bands. Is it possible to run region-based queries using cytological coordinates? (e.g., 30A - 30B, inclusive) My goal is to find all candidate genes under those peaks. I also have (approximate) mappings of cytological bands onto the physical (genomic coordinates) map of Drosophila, so, if necessary, I could use those to collect the genes mapping to those locations. Which fruit fly DAS server would provide the most up-to-date information? If you have other recommendations for how to proceed, I would be grateful for your help! All the best, Ann -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From dalke at dalkescientific.com Mon Apr 17 06:54:30 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 17 Apr 2006 00:54:30 -0600 Subject: [DAS2] updated spec Message-ID: Spec writing is like working on a dissertation. Here's an example, in the form of a text adventure http://acephalous.typepad.com/acephalous/2006/04/disadventure.html > look laptop There seems to be a dissertation chapter on the laptop. > read chapter It is long-winded and boring. You do not want to read it. > read chapter It is obnoxious. You hate it. > read book Read. There is a book underneath it that concerns a related topic. > read book Read. There is a book underneath it that concerns a related topic. > work on dissertation You spend two hours searching the OED for the usage history of the word devolve. > work on dissertation You spend three hours reading five articles which have nothing to do with the dissertation. > work on dissertation You spend twenty minutes online reading about baseball. ... > work on dissertation You spend five minutes playing online poker. > work on dissertation You pick your nose. > work on dissertation You go to the kitchen and eat cheese. > work on dissertation The Mets are on. It should be a good game. Anyway, I've gone through the das/das2/draft3/spec.txt document and updated everything (well, not writeback. I'm going to need more cheese.) Next is to get feedback, validate my inline examples, and convert the behemoth into HTML, to replace what's on the web site. Finally. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Apr 17 07:31:13 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 17 Apr 2006 01:31:13 -0600 Subject: [DAS2] outstanding questions Message-ID: These are culled from the current draft of the spec. I used "XXX" to denote regions where I had questions. 1) type ontology URI The TYPE elements have an 'ontology' attribute. This is supposed to be a required element, which is the URI of the corresponding ontology term. At present there is no URI system for ontology. We added a special 'accession' attribute which is the GO id, as in so_accession="SO:0000704" This was meant to be a hack for the hackathon. 
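(For concreteness, an assumed layout -- not taken from the spec -- showing where such attributes would sit on a TYPE record; the ontology URI below is a made-up placeholder precisely because no real URI scheme for ontology terms existed yet:

    <TYPE uri="http://blah.com/das2/whatever/type/gene"
          title="gene"
          so_accession="SO:0000704"
          ontology="http://blah.com/ontology/SO/0000704" />
)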
My thought is: - keep the GO accession (as an optional attribute) - make 'ontology' be an optional attribute, but one of 'ontology' or 'so_accession' is required Also, should that be "SO:0000704" or simply "0000704" ? I think the "SO:" should be present. 2) Feature strand. I want to make sure this is correct 1 for positive -1 for negative 0 for unknown not given for both strands or does not have meaning 3) taxid The 'taxid' in the SOURCE element does not appear to be useful. It's written Notice how the taxid exists in the SOURCE element and the COORDINATES element (and how there are difference taxids for each COORDINATES)? I think we can drop 'taxid' from the SOURCE element and if it's important someone should have a COORDINATES element. 4) 'writeable' The versioned source element contains the attribute "writeable", as in Do we need that 'writeable' attribute? It seems that if there's a writeback capability then then versioned source is writeable. 5) content-type for FASTA records "text/plain", "text/x-fasta" or "chem/x-fasta" Looking around now I also see "application/x-fasta" and "application/fasta". I'm going to say "should be text/x-fasta but may be text/plain". Objections? 6) response document too large I've described that a server may return an error if the response document is too large. This means a client may try again, hopefully making a request which returns a smaller document. My question is, how does a client make a smaller request? What if the server decides that sending more than 5 features at a time is too much? When does the client just give up and say the server implementation is crazy? 7) styles Are we going to go with the current style system or some other approach? The DAS1 styles had support for limited semantic zooming, with options for "high", "medium" and "low" resolution. What do those mean? When should a client choose one over another? What does "height" mean for a glyph? How do the glyph and text interoperate? Eg, is the "height" the height for both, or just for the glyph? Should style information be moved outside of the DAS2 exchange spec? 8) the "count" format We talked about, and people wanted, a "count" format. This returns the number of features which would be returned in a query. Does it really return the number of features, or does it return the number of complex annotations (eg, if there is a complex annotation with a root and two children, is that a count of "1" or a count of "3"? Given the way we've done things, I'm going with "3".) 9) alignments How do I write an alignment? Please give an example - I can't figure it out. 10) CIGAR string What's the format of the CIGAR string? I've found two main variations. They are M 40 I 1 M 12 D 4 40M1I12M4D The latter appears to be the most common. However, I did see one case where if no count is given "1" is implied, so the latter can also be written 40MI12M4D 10) Do we need a REGION element? I've written All feature locations are given in coordinates on a segment. Some features may be locatable on other features. For example, a contig feature may be locatable on a supercontig. This relationship is stored using a REGION element. A FEATURE element has zero or more REGION elements. The 'feature' attribute of the REGION element contains the URI of the parent feature, on which the current feature is located. A REGION record has an optional 'range' attribute. If not given the feature is on the entire parent feature. The range string is the same syntax and meaning as in the LOC record. 
XXX I think this is overkill - what are some good examples of use; perhaps when the global coordinates are not well-defined?. Are negative coordiantes important, like "promoter region is 20 bases upstream from some gene"? Does this need a CIGAR string too? XXX For example, suppose feature A is 6 bases long and is on chromosome 5 at position 10000, on exon X at position 300 and on contig K at position 7. The FEATURE record for this feature may be as follows: 11) XID Currently the XID element has a single attribute, 'href'. I wrote A FEATURE has zero or more XID elements linking the feature record to an external database entry. XXX This is not well-thought out. I think it should have: 'uri' -- a URL or LSID 'authority' -- the name of the database (controlled vocabulary) 'type' -- 'primary', 'accession', or possibly others? 'id' -- the actual identifier 'description' -- a paragraph or so describing the link, for humans to see why they might want to look into a link This has to be a well-defined concept. Let's steal from someone else. The use-case here is to link to sequence records in other databases and to link to PubMed or other bibliographic databases. 12) complex features In the spec I wrote Some features are complex and cannot easily be modeled with a single feature record. Quoting from the "Chado Schema Documentation" XXX give hyperlink XXX The class of transplicing events that involve ligating transcripts from different loci into a mature mRNA requires a separate feature to represent each locus transcript and one to represent the fused transcript. The fragments are located on the fused transcript; portions of the fused transcript can also be located on the genome. Is this a relevant example of a complex feature for DAS2? If not, give another example. In general I'm having a hard time coming up with good examples of various forms of complex features. I just don't know the domain well enough. 13) "root" attribute I proposed that features have a new, optional attribute called "root". If a feature is part of a complex annotation then the "root" attribute must be present and it must have the URI of the root feature for the annotation. This makes client processing easier, though it is not needed in the purest of senses. 14) features have a 'STYLE' element The idea was that an individual feature could override the style given in the feature type record. I don't think that's useful and/or we need a real stylesheet instead. I'm going to drop the STYLE element from the FEATURE element unless there is objection. 15) In text searches we've defined ABC -- field exactly matches "ABC" *ABC -- field ends with "ABC" ABC* -- field starts with "ABC" *ABC* -- field contains the substring "ABC" I want to say that using "*" and "?" elsewhere in the query string is implementation dependent. That is, "A*B" might match everything with an A followed by a B or it might match the exact string "A*B" and only that string. I did this because looking around at various tools it looks like it might be hard to change the meaning of "*" and "?" for the text searches. Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Mon Apr 17 07:40:07 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 17 Apr 2006 01:40:07 -0600 Subject: [DAS2] proposed April 17 agenda Message-ID: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> Gregg is taking the month off. I volunteered to be in charge of the next teleconference. Here is what I would like to talk about: 1. get additional agenda items 2. 
status reports 3. who maintains the list of reference names for different genomes (starting with the list Licoln developed)? 4. resolve some questions with the spec (see my previous email) 5. get a volunteer to come up with best-practices examples of how to represent various complex annotations 6. writeback planning Andrew dalke at dalkescientific.com From lstein at cshl.edu Mon Apr 17 13:46:23 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 17 Apr 2006 09:46:23 -0400 Subject: [DAS2] alignments In-Reply-To: <5dd5ce9d6d6e977e56c7b4e30e622f7c@dalkescientific.com> References: <5dd5ce9d6d6e977e56c7b4e30e622f7c@dalkescientific.com> Message-ID: <200604170946.24479.lstein@cshl.edu> I didn't realize there were multiple things called CIGAR. I think we should use Ensembl CIGAR format. The target of the alignment should be a segment, and not another feature. Best, Lincoln On Friday 14 April 2006 04:29, Andrew Dalke wrote: > I need a bit of help here. I'm trying to hand-write an example of a > feature based on an alignment. Let's assume these are annotations on > fly and it's aligned to human. There's a hit from > > fly chromosome 4 > http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4 > range 100:200 > > to human chromosome 8 > http://www.ensembl.org/Homo_sapiens/Chr1 > range 200:300 > > Assume the CIGAR string of the match is > 51 identical, 3 insertions, 24 identical, 3 deletions, 25 identical > > Here's the best I can manage: > > > > segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4" > range="100:200" cigar="?????"/> > > > > > First question: > Where do I put the object to which the alignment aligns? Will > it be a segment or a feature? Now, I could have this completely wrong > and DAS2 is not meant for genome/genome alignments like this. If > that's the case please offer an example of how to write an alignment. > > > Second question: > What's the format of the CIGAR string? Lincoln's text pointed to > http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html > > That documentation says: > > The format starts with the same 9 fields as sugar output (see above), > > and is followed by a series of pairs where > > operation is one of match, insert or delete, and the length describes > > the number of times this operation is repeated. > > However, it does not list the operation characters nor if there are > spaces > between the fields. I assume it is "M 51 I 3 M 24 D 3 25 I", though > perhaps > without spaces. > > The GFF3 documentation at http://song.sourceforge.net/gff3.shtml refers > to > http://cvsweb.sanger.ac.uk/cgi-bin/cvsweb.cgi/exonerate?cvsroot=Ensembl > but I can find no relevant documentation there. > > I then found a comment by Richard Durbin from two years ago, at > > http://portal.open-bio.org/pipermail/bioperl-l/2003-February/ > 011234.html > > > 3) I'm not convinced by the format for the Align string. This requires > > a character per aligned base. There are a variety of run-length type > > encodings in common use that are much more compact. e.g. Ensembl uses > > a > > string such as "60M1D8M3I15M" to mean "60 match, then 1 delete, then 8 > > match, then 3 insert, then 15 match". They call this CIGAR, but when I > > talked to Guy Slater, who invented CIGAR for exonerate, his version is > > subtly different: "M 60 D 1 M 8 I 3 M 15" for the same string (see > > http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/CigarFormat.html). > > Jim Kent also has something like this. 
I'd prefer us to standardise on > > one of these formats, all of which are very short for ungapped matches. > > Which is the CIGAR string format DAS2 supports? Where is the > documentation for it? > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dalke at dalkescientific.com Mon Apr 17 16:19:47 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 17 Apr 2006 10:19:47 -0600 Subject: [DAS2] question regarding most up-to-date D. melanogaster DAS? In-Reply-To: <83722dde0604141705t369cd016u30f1ca2ea7622d6c@mail.gmail.com> References: <83722dde0604141705t369cd016u30f1ca2ea7622d6c@mail.gmail.com> Message-ID: <3ecdfacc003d58cc93045bc7a4aefb57@dalkescientific.com> Ann: > Our markers (for influential loci in the study) are mapped to > cytological bands. Is it possible to run region-based queries using > cytological coordinates? (e.g., 30A - 30B, inclusive) My goal is to > find all candidate genes under those peaks. At present there is no way to do that. A server can extend the query syntax to support searches in cytological coordinates and add new feature elements to store those coordinates. I don't know enough about how people use those coordinates to sketch an example. Andrew dalke at dalkescientific.com From aloraine at gmail.com Mon Apr 17 17:47:03 2006 From: aloraine at gmail.com (Ann Loraine) Date: Mon, 17 Apr 2006 12:47:03 -0500 Subject: [DAS2] question regarding most up-to-date D. melanogaster DAS? In-Reply-To: <3ecdfacc003d58cc93045bc7a4aefb57@dalkescientific.com> References: <83722dde0604141705t369cd016u30f1ca2ea7622d6c@mail.gmail.com> <3ecdfacc003d58cc93045bc7a4aefb57@dalkescientific.com> Message-ID: <83722dde0604171047r26a32986gaa4c3b34b6166c16@mail.gmail.com> I'm not sure it would be worth adding more work to the project to allow for these cases. If funding is renewed, then I think it would be worth the effort. But for now, probably not, since it would be a new feature. (At this stage, avoiding feature creep seems advisable :-) I believe I can get a mapping of cytological bands onto genomic coordinates from FlyBase. I don't know how reliable these mappings are, but assuming they are okay, I can use them to query a fly DAS site to get the genes in those coordinates. I'm not sure what is the best DAS site to use for this, however. -Ann On 4/17/06, Andrew Dalke wrote: > Ann: > > Our markers (for influential loci in the study) are mapped to > > cytological bands. Is it possible to run region-based queries using > > cytological coordinates? (e.g., 30A - 30B, inclusive) My goal is to > > find all candidate genes under those peaks. > > At present there is no way to do that. > > A server can extend the query syntax to support searches in > cytological coordinates and add new feature elements to store > those coordinates. I don't know enough about how people use > those coordinates to sketch an example. 
> > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From dalke at dalkescientific.com Tue Apr 18 07:36:39 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Tue, 18 Apr 2006 01:36:39 -0600 Subject: [DAS2] proposed April 17 agenda In-Reply-To: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> References: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> Message-ID: Summary of today's conference call. > 2. status reports The biggest one is that the new version of IGB is out and the Affy DAS server is available at http://netaffxdas.affymetrix.com/das2/sequence Steve and Ed (as I recall) tracked down a problem with that server which might affect other implementations. The problem is knowing the public/external URL for the DAS service. In theory it can be determined by looking at various CGI headers, but with things like an Apache rewrite and forwards to the actual server it can get complicated. The solution seems to be either use relative links or have a configuration option in the server specifying the base name. Lincoln's been working on reference names. Allen's been working on how the writeback server might work. I've been working on the spec, and have not gone further with the validator. > 3. who maintains the list of reference names for different > genomes (starting with the list Licoln developed)? Lincoln proposed, to broad acceptance, that we set up a wiki page with the reference names. The easiest way is to use the OBF wiki, at http://open-bio.org/wiki/Main_Page because that is already set up. I can ask the OBF about the appropriateness of that - I think it's fine. > 4. resolve some questions with the spec (see my previous email) Here are the resolutions: 1) type ontology URI I've emailed Suzi asking about plans for GO, the Gene Ontology Consortium, whoever in coming up with standardized, public ontology URLs. Allen's cc'ed on it, and we'll discuss this off the DAS list. 2) Feature strand. I stand corrected. The definitions are 1 for positive -1 for negative 0 both strands not don't know or does not have meaning 3) taxid There seems to be no reason to keep the 'taxid' in the SOURCE element. We'll only have it in the COORDINATES element. 4) 'writeable' We'll defer this (leaving it as-is) until we have the writeback defined a bit better. 5) content-type for FASTA records We'll recommend "text/x-fasta" or "text/plain" as the content-type for FASTA responses. There is no widely accepted community standard. 6) response document too large There is no automatic way for a client to narrow its request. This must be done by a person, depending on what the search criteria are. Servers should support large requests so that this isn't a problem. 7) styles We'll shift to using a stylesheet. This will be listed in the versioned source record as As a rough sketch the document will look like The STYLE elements add a new "uri" attribute which is the URI of the feature type being styled. In theory this could also include the feature uri (to define the style for a single feature) or an ontology uri (sets the style for all features with that ontology term or its descendants). However, with that comes problems of precedence. 
If the feature type and the feature and the ontology each have styles, which one wins? I think feature beats type beats ontology. But I also think we can ignore this because no one has asked for this sort of flexibility. (More flexibility would be support for a query language selecting which features, types, sources, ontologies, feature alias, etc. should get a given style. Not going there. :) 8) the "count" format This should be the number of feature elements returned, and not the number of "annotations" (counting the multiple features of a complex annotation as 1) 9) alignments Lincoln will provide examples. 10) CIGAR string We'll use the EBI style CIGAR strings, and the documentation will be based on the GFF3 description at http://song.sourceforge.net/gff3.shtml 10.5) Do we need a REGION element? No. Deleted from the spec. 11) XID On Ed's recommendation I'm looking at MAGE XML. I am not a good UML reader so it's slow going. My view so far is that what I sketched out is on the right track and we can simplify things compared to MAGE, eg, we don't need full bibliographic records. The other idea is to defer finalizing this until people start providing data with XIDs, so we know what's needed. 12) complex features Lincoln will come up with some examples. 13) "root" attribute There are two changes here: - complex annotations must have a single root feature - all features which are in complex annotations must have a link to the root element There's some worry about the first requirement, in that some complex annotations may not have a "real" root. I argue that having a synthetic one is okay. There were no strong arguments against having a single root. We decided to defer finalizing this until we have some example of complex annotations. 14) features have a 'STYLE' element no, they don't. 15) "*" and "?" in the query string The proposal here is to say that the interpretation of "*" other than at the start and/or end of the query string is implementation defined, as is the use of "?". It used to be that any other use of "*" must be treated as an asterisks, so "***" finds all strings containing a "*". It looks like people are fine with this looseness. > 5. get a volunteer to come up with best-practices examples > of how to represent various complex annotations That's Lincoln. > 6. writeback planning Allen will take the implementation lead on this, funding willing. He's currently working on how to associate an identifier with a new feature. One thought is to progress in stages: - upload completely new features / complex annotations to the server - modify an existing feature, though not the parent/part relationship (eg, change the location) - delete a simple feature - delete a complex annotation - modify an existing complex annotation, or turn a simple feature into a complex annotation - do 'em all at once The work will need to be server driven as the current clients can't handle this before the end of the funding period. The clients will mostly be library code. Andrew dalke at dalkescientific.com From lstein at cshl.edu Mon Apr 24 12:35:21 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 24 Apr 2006 08:35:21 -0400 Subject: [DAS2] Not able to make it today In-Reply-To: References: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> Message-ID: <200604240835.21690.lstein@cshl.edu> Hi All, Due to wedding preparations I will be unable to attend the conference call today. I might or might not be able to make it next week (I'll be in Toronto) but I'll let you know in advance. 
Best, Lincoln -- Lincoln Stein lstein at cshl.edu Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: not available URL: From dalke at dalkescientific.com Mon Apr 24 16:11:31 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 24 Apr 2006 10:11:31 -0600 Subject: [DAS2] April 24 meeting - cancel? In-Reply-To: <200604240835.21690.lstein@cshl.edu> References: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> <200604240835.21690.lstein@cshl.edu> Message-ID: Hi all, I'm trying to come up with an agenda but I've done very little the last week DAS related. I've been working on selling my house. Looks like this will be a short meeting, or should we just cancel? Here's my status. - Sent mail to Suzi asking about URIs for ontologies. Heard nothing from her yet. - Talked with the OBF people about setting up a wiki for the reference names for the genomes/segments. We decided to use the OBF wiki for now and if there are enough pages we'll migrate over to a biodas-specific wiki. I'm about 1/2-way through, learning wiki syntax. I'll email when it's there. - I've migrated the spec 300 doc into CVS. Just checked it in. There's still some formatting issues though. - started working on the stylesheet spec. Should take another 3 hours or so. - haven't been able to log into cgi.biodas.org to restart the validation server. - still need to write an rnc for the writeback for Allen Andrew dalke at dalkescientific.com From allenday at ucla.edu Mon Apr 24 16:29:09 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 24 Apr 2006 09:29:09 -0700 Subject: [DAS2] April 24 meeting - cancel? In-Reply-To: References: <4fb9a13f4a18a6e1275256affbb97a51@dalkescientific.com> <200604240835.21690.lstein@cshl.edu> Message-ID: <5c24dcc30604240929l7a882dd9qa15c0a51bd636cb0@mail.gmail.com> Let's cancel it. I have a database set up for writeback, and am able to POST delta XML to the server. I am still at the stage where I am parsing the XML. The DTD would be helpful. See attached figure "writeback.png" for the current implementation track. I am at the "Parse XML" step in implementation. See attached "vsourcecommand.png" for an overview of the previous writeback plans as documented in the HTML docs, and "vsourcelock.png" for an overview of lock plans as documented in the HTML docs. Parts of these may at some point be helpful for folding into the current implementation. I can send or commit to CVS the source documents for any of these diagrams if people would like to edit. -Allen On 4/24/06, Andrew Dalke wrote: > > Hi all, > > I'm trying to come up with an agenda but I've done very little > the last week DAS related. I've been working on selling my house. > Looks like this will be a short meeting, or should we just cancel? > > Here's my status. > > - Sent mail to Suzi asking about URIs for ontologies. Heard > nothing from her yet. > > - Talked with the OBF people about setting up a wiki for the > reference names for the genomes/segments. We decided to use the > OBF wiki for now and if there are enough pages we'll migrate over > to a biodas-specific wiki. I'm about 1/2-way through, learning > wiki syntax. I'll email when it's there. > > - I've migrated the spec 300 doc into CVS. Just checked it > in. There's still some formatting issues though. > > - started working on the stylesheet spec. 
Should take another > 3 hours or so. > > - haven't been able to log into cgi.biodas.org to restart the > validation server. > > - still need to write an rnc for the writeback for Allen > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -------------- next part -------------- A non-text attachment was scrubbed... Name: writeback.png Type: image/png Size: 41093 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vsourcelock.png Type: image/png Size: 91466 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: vsourcecommand.png Type: image/png Size: 49552 bytes Desc: not available URL: From dalke at dalkescientific.com Mon Apr 24 17:39:29 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Mon, 24 Apr 2006 11:39:29 -0600 Subject: [DAS2] sequence names on wiki Message-ID: <6e4986bba9736f1c43f239646b8a22d4@dalkescientific.com> I've imported Lincoln's list of global sequence identifiers onto the open-bio wiki at http://open-bio.org/wiki/DAS:GlobalSeqIDs Andrew dalke at dalkescientific.com From dalke at dalkescientific.com Thu Apr 27 07:33:29 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 27 Apr 2006 01:33:29 -0600 Subject: [DAS2] writeback spec Message-ID: I've written up a draft of the writeback spec. It's in CVS. das/das2/das2_writeback.html with the RNC in das/das2/writeback.rnc -- for the writeback document das/das2/mapping.rnc -- for the mapping from old URLs to new On the question of how to handle new records, which need new identifiers, I decided to go with the private identifier scheme. The client uses "das-private:0000" where the "0000" is alphanumeric and 1 up to 20 characters long. The server responds with a mapping document which looks like I decided on this instead of the "preallocate identifier" scheme because this requires less state on the server (it doesn't need to remember which identifiers were already issued) and because it supports versioning servers better. Is the web site being updated from CVS? I see it hasn't gotten the updates I made on Monday. Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Apr 27 17:34:12 2006 From: Steve_Chervitz at affymetrix.com (Chervitz, Steve) Date: Thu, 27 Apr 2006 10:34:12 -0700 Subject: [DAS2] writeback spec In-Reply-To: Message-ID: Andrew, > From: Andrew Dalke > Date: Thu, 27 Apr 2006 01:33:29 -0600 > To: DAS/2 > Subject: [DAS2] writeback spec > > I've written up a draft of the writeback spec. It's in CVS. Great. Thanks. > > Is the web site being updated from CVS? I see it hasn't gotten > the updates I made on Monday. You mean in some automated fashion? Before we switched to generating the html from templates, I set up a cron that updated the manually edited html file for the read spec on biodas.org. I don't know if there is an automated process that produces the template-based html from CVS on biodas.org -- unless you or Lincoln set something up. BTW, I can't ssh into portal.open-bio.org, or even ping it. This is (or perhaps was) the machine hosting biodas.org. Do you the story here? 
Steve From dalke at dalkescientific.com Thu Apr 27 17:55:55 2006 From: dalke at dalkescientific.com (Andrew Dalke) Date: Thu, 27 Apr 2006 11:55:55 -0600 Subject: [DAS2] writeback spec In-Reply-To: References: Message-ID: Steve: > You mean in some automated fashion? Before we switched to generating > the > html from templates, I set up a cron that updated the manually edited > html > file for the read spec on biodas.org. I don't know if there is an > automated > process that produces the template-based html from CVS on biodas.org -- > unless you or Lincoln set something up. I didn't set anything up. One thing to note though is that I'm not using the template system for the current specs. The validator I have now is much more powerful than the one then so I'm parsing the spec documents and validating them. "More powerful" includes that I can report the error line as it is in the spec document and not just in the piece of XML to validate. It should be possible to just pull the specs out of CVS. > BTW, I can't ssh into portal.open-bio.org, or even ping it. This is (or > perhaps was) the machine hosting biodas.org. Do you the story here? Chris Dag. sent out an email on 3/23 "Important news for all developers ith open-bio.org CVS access (2) All of our websites have been consolidated on the new server newportal.open-bio.org Andrew dalke at dalkescientific.com From Steve_Chervitz at affymetrix.com Thu Apr 27 18:09:09 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 27 Apr 2006 11:09:09 -0700 Subject: [DAS2] writeback spec In-Reply-To: Message-ID: Andrew: > I didn't set anything up. One thing to note though is that I'm not > using > the template system for the current specs. The validator I have now is > much more powerful than the one then so I'm parsing the spec documents > and validating them. "More powerful" includes that I can report the > error line as it is in the spec document and not just in the piece > of XML to validate. > > It should be possible to just pull the specs out of CVS. Cool. I can look into updating my cronjob to grab the new specs. > Steve: >> BTW, I can't ssh into portal.open-bio.org, or even ping it. This is (or >> perhaps was) the machine hosting biodas.org. Do you the story here? > > Chris Dag. sent out an email on 3/23 "Important news for all developers > ith open-bio.org CVS access > > (2) All of our websites have been consolidated on the new server > newportal.open-bio.org Yep. Just realized that. At the moment, I can't access my account on this new server. Probably my password got reset. I've got a support request in. Steve From Steve_Chervitz at affymetrix.com Thu Apr 27 19:16:23 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Thu, 27 Apr 2006 12:16:23 -0700 Subject: [DAS2] writeback spec In-Reply-To: Message-ID: OK, Andrew's writeback spec is now accessible at: http://www.biodas.org/documents/das2/das2_writeback.html Be sure to refresh your browsers to get the latest spec at http://biodas.org/documents/das2/das2_protocol.html I re-established my cronjob to update all the documents in this das2 directory twice daily (00:01 and 12:01 East coast time). This das2 directory is a new cvs checkout. I moved the previous das2 directory to das2.old, in case it contains anything we might need that isn't in CVS (accessible via http://www.biodas.org/documents/das2.old/ ). Steve > From: Andrew Dalke > Date: Thu, 27 Apr 2006 01:33:29 -0600 > To: DAS/2 > Subject: [DAS2] writeback spec > > I've written up a draft of the writeback spec. 
It's in CVS. > > das/das2/das2_writeback.html > with the RNC in > das/das2/writeback.rnc -- for the writeback document > das/das2/mapping.rnc -- for the mapping from old URLs to new > > On the question of how to handle new records, which need > new identifiers, I decided to go with the private identifier > scheme. The client uses "das-private:0000" where the "0000" > is alphanumeric and 1 up to 20 characters long. The server > responds with a mapping document which looks like > > > to="http://blah.com/das2/whatever/feature/123" /> > > > I decided on this instead of the "preallocate identifier" > scheme because this requires less state on the server > (it doesn't need to remember which identifiers were already > issued) and because it supports versioning servers better. > > > Is the web site being updated from CVS? I see it hasn't gotten > the updates I made on Monday. > > > Andrew > dalke at dalkescientific.com > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2
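(As a rough illustration of the mapping response described in the quoted message above: only the to="..." attribute appears in the message itself, so the MAPPINGS and MAPPING element names and the "from" attribute are assumed here, not copied from mapping.rnc:

    <MAPPINGS>
      <MAPPING from="das-private:0000"
               to="http://blah.com/das2/whatever/feature/123" />
    </MAPPINGS>

Each new feature submitted with a "das-private:" identifier would get one such entry giving the permanent URI assigned by the server.)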