From lstein at cshl.edu Mon Jun 5 10:31:50 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 5 Jun 2006 10:31:50 -0400
Subject: [DAS2] Example alignments
In-Reply-To: <7002f99ddafd9542e8b4cb88e1712f9e@dalkescientific.com>
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
 <200605191150.19535.lstein@cshl.edu>
 <7002f99ddafd9542e8b4cb88e1712f9e@dalkescientific.com>
Message-ID: <200606051031.50592.lstein@cshl.edu>

Hi Andrew,

I'm truly sorry at how long it has taken me to get these examples to
you. I hope that the example alignments in the enclosure make sense to
you.

Unfortunately I found that I had to add a new "target" attribute to
<LOC> in order to make the cigar string semantics unambiguous. Otherwise
you wouldn't be able to tell how to interpret the gaps.

Lincoln

-- 
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
-------------- next part --------------
CASE #1. A SIMPLE PAIRWISE ALIGNMENT.

A simple alignment is one in which the alignment is represented as a
single feature with no subfeatures. This is the preferred representation
to be used when the entire alignment shares the same set of properties.

This is an alignment between Chr4 (the reference) and EST23 (the
target). Both aligned sequences are in the forward (+) direction.
We represent this as a single alignment:

  Chr4  100 CAAGACCTAAA-CTGGAATTCCAATCGCAACTCCTGGACC-TATCTATA 147
            |||||||X||| ||||| |||||||       ||||X||| ||||||||
  EST23   1 CAAGACCAAAATCTGGA-TTCCAAT-------CCTGCACCCTATCTATA 41

This has a CIGAR gap string of M11 I1 M5 D1 M7 D7 M8 I1 M8:

  M11  match 11 bp
  I1   insert 1 gap into the reference sequence
  M5   match 5 bp
  D1   insert 1 gap into the target sequence
  M7   match 7 bp
  D7   insert 7 gaps into the target
  M8   match 8 bp
  I1   insert 1 gap into the reference
  M8   match 8 bp

Content-Type: application/x-das-features+xml

NOTE: I've had to introduce a new attribute named "target" in order to
distinguish the reference sequence from the target sequence. This is
necessary for the CIGAR string concepts to work. Perhaps it would be
better to have a "role" attribute whose values are one of "ref" and
"target"?

CASE #2. A COMPLEX PAIRWISE ALIGNMENT.

The complex pairwise alignment is used when the alignment is the
composite of two different alignments, each of which has its own set of
properties. An example of this is BLAST, in which each "BLAST hit" is
composed of multiple aligned segments called "HSPs". We extend the
previous example by adding another aligned segment to the alignment.

BLAST hit: align Chr4:100:300 with EST23:1:58

HSP 1:

  Chr4  100 CAAGACCTAAA-CTGGAATTCCAATCGCAACTCCTGGACC-TATCTATA 147
            |||||||X||| ||||| |||||||       ||||X||| ||||||||
  EST23   1 CAAGACCAAAATCTGGA-TTCCAAT-------CCTGCACCCTATCTATA 41

  BLAST score = 80
  CIGAR gap string M11 I1 M5 D1 M7 D7 M8 I1 M8

HSP 2:

  Chr4  211 TCAAACTGATAATGGGGT 228
            ||||||||||| ||||||
  EST23  42 TCAAACTGATA-TGGGGT 58

  BLAST score = 85
  CIGAR gap string M11 D1 M6

We represent this as an "expressed_sequence_match" feature relating Chr4
100:300 to EST23 1:58. The feature contains two subparts, one
corresponding to HSP1 and the other corresponding to HSP2.
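
The gap strings in the two cases above can be checked mechanically. What
follows is a minimal Python sketch, not part of the DAS/2 spec: it
assumes the letter-then-count token form used in these examples (M11,
I1, D7), reads I as a gap in the reference and D as a gap in the target
as described in Case #1, and the function names parse_cigar and
consumed_lengths are made up for illustration.

```python
import re


def parse_cigar(gap_string):
    """Split a gap string such as 'M11 I1 M5' into (op, count) pairs."""
    ops = []
    for token in gap_string.split():
        m = re.fullmatch(r"([MID])(\d+)", token)
        if m is None:
            raise ValueError("unrecognized CIGAR token: %r" % token)
        ops.append((m.group(1), int(m.group(2))))
    return ops


def consumed_lengths(ops):
    """Return (reference_bases, target_bases, alignment_columns).

    M consumes one base on both sequences.
    I is a gap in the reference, so only the target advances.
    D is a gap in the target, so only the reference advances.
    """
    ref = sum(n for op, n in ops if op in "MD")
    tgt = sum(n for op, n in ops if op in "MI")
    cols = sum(n for _, n in ops)
    return ref, tgt, cols


hsp1 = parse_cigar("M11 I1 M5 D1 M7 D7 M8 I1 M8")
hsp2 = parse_cigar("M11 D1 M6")
print(consumed_lengths(hsp1))  # (47, 41, 49)
print(consumed_lengths(hsp2))  # (18, 17, 18)
```

For HSP 2 this gives 18 reference bases and 17 target bases, which
agrees with Chr4 211-228 and EST23 42-58. For HSP 1 the 41 target bases
agree with EST23 1-41; the string consumes 47 reference bases, one fewer
than the 48 positions in Chr4 100-147, so the end coordinate printed in
the example may be off by one.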

From Steve_Chervitz at affymetrix.com Mon Jun 5 20:54:21 2006
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Mon, 5 Jun 2006 17:54:21 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 5 Jun 2006
Message-ID:

Notes from the weekly DAS/2 teleconference, 5 Jun 2006

$Id: das2-teleconf-2006-06-05.txt,v 1.2 2006/06/06 00:52:14 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke
  UCLA: Allen Day, Brian O'connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

Topic: status reports
---------------------

gh: waiting to hear back from peter good re: grant. He thinks we have a
decent chance of additional funding (bridge funding), would fund till
new grant kicked in in June 2007 with suzi as a PI (revised grant).
Total funding would still be less than the amount originally requested
for this grant. Definitely will have funding through september this
year. our grant folks beefed up the dalke consulting and cshl accounts.
Will let people know re: funding past september when I find out.
Impl wise, not much done in last 2 weeks. about to start testing
writeback from the client side. write new features back to das/2 server
(the easiest thing to test).
New release of IGB is out now with a testing curation feature. Go into
preferences to turn it on. (Ed worked on this)

ls: sent in example of das2 features request that returns alignments.
discovered that i needed to add a new attribute to the LOC tag. have to
indicate that alignments use the cigar gap string. whether you gap the
ref or target sequence and indicate which one's which. there's a target
attr in LOC that indicates which one is the target (a little
asymmetrical).

gh: you can get both target and query?

ls: yes. the cigar string using d and i, you have to indicate which one
is which.
another thing: das/2 project for caBIG is pulling das2 into the core,
has a kickoff meeting this wednesday. I will be on that meeting. we'll
reiterate goals, timeline with adopters (Wistar institute)

gh: it's been a while since we talked about that. is the intent to have
das2 servers that can sit on top of caBIG?

ls: no, das2 clients via caBIG. we won't need it for a couple of
months, hoping we'll be able to use the biopackages das2 server to
serve out the data. Is this reasonable?

aday: yes.

ad: nothing new to report. settling in Sweden. plan to incorporate
Lincoln's things into the spec. server writeback work.

bo: working on hyrax client that retrieves microarray data from a das
server. functional now and is now in sourceforge.
http://sourceforge.net/projects/nelsonlab. uses allen's formatted
output rather than netCDF. can browse ontology annotation examples. can
download. focusses on individual researcher needs in Nelson lab. plan
to do it as a generic plugin, data import tool.

gh: for ontology stuff, any progress with suzi and chris re: how das
ontology stuff will work with center for biomedical ontologies?

aday: no. will touch base with her. we're continuing to operate as
previously. basically just a formatting issue.

[A] allen will contact suzi re: hooking up das ontology work with NCBO

bo: the document format (XML) right?

gh: i think yes.
to me the goal is to have NCBO adopt it

aday: even if they don't we can still link to them

gh: it will take encouragement from you setting that up.

aday: you can load the data brian's talking about, egr format. doesn't
have location

gh: igb should figure it out

aday: 25,000 microarrays are available at egr. ids of probe set
prefixed with the platform. we have a bed formatter, so you can request
in bed too.

bo: need to add a pulldown for bed. netCDF is broken now, will fix it.
egr is working

aday: genotyping array support in igb?

gh: chromosome copy number output in igb now. gtype outputs into cnat,
which outputs a graph in sgr format, read by igb. also have files with
locations of snps. should be on quickload servers. near bottom entries
for 10, 100, 500k arrays. nice way to visualize when zoomed way out.

aday: if you load a bed file with ids, then an egr without locations.
i.e., can bed files be used as identifiers for egr files?

ed: yes

gh: takes up more memory, but is useful.

aday: working with genotyping arrays lately. will produce more files
for it in the next few weeks. basically doing lots of microarray data
processing now.

gh: das2 writeback server?

aday: xml processing code is there, not rigged up to a webserver yet.
can partially translate into insert statements.

gh: can it send back mapping of temp ids to final?

aday: in progress

gh: i can start testing creation of features now.

aday: can put it as a standalone cgi script, can point it to any url.

gh: the beauty of rest.

[A] allen will put writeback server on public url

ed: new version of igb last week (4.38). automatic reloading via jws
not working for some clients.

bo: can delete your cache from jws console.

ed: shortcut from desktop sometimes causes problems with updates.
starting to look at better loading info about colors from different
types of data files. segues into stylesheets from das. and other
igb-related things.

sc: installed new version of affy das2 server on the dmz.
Has gregg's temporary fix for xml:base, but currently doesn't rely on
it since there's no url rewriting happening. need to test it out and do
same thing on production server. Also wrote script to make deploying
servers easier (e.g., posting new jars, re-starting server via single
make command).

[A] steve will test gregg's xml:base fix on dev server

Topic: BOSC submission for a talk
---------------------------------

ad: planning to go, waiting to determine expenses

aday: will go if main conf talk is accepted. otherwise not.

gh: sounds like it's up to you (dalke)

ad: this is what biodas is, tools, how things fit together, how rest is
cool. few submissions now (ISMB and BOSC). only 4 now. usually 12 by
now.

ad: bod for bosc is discussing what to do

gh: do you need help from any of us for bosc submission?

ad: no. will send you copies to review it.

gh: I gave a talk last year on das. will send it to you as a reference.

sc: part of talk can be progress since then. cause of the low turnout?

ad: people waiting to see if they are accepted before registering.

ls: for me it's a cost issue. 90% of people who practice bioinfo are in
northern hemisphere. was low in brisbane, will be low in china (rumors
of 2008 ismb in china, can't confirm).

Topic: Code sprint #3
---------------------

gh: how do people feel about having another code sprint? possibly
before or after CSB in august at Stanford. the last two sprints were
very good.

ls: I'm at csb in aug, but right after i'll be on a retreat to work on
a sequencing grant. right before will be on honeymoon.

gh: maybe we need to push it farther out.

ad: will be in europe until 15 july. not in us until february.

bo: definitely at stanford?

gh: no. august seemed like a good time/location. might make more sense
to have a euro-led one.

sc: august is a big vacation time for europeans

ad: july is for swedes.

ad: there's a late breaking poster session for ismb

gh: das poster?

ad: need to decide on cost today if I'm going.

Topic: writeback
----------------

gh: how far behind is website vs our current thinking. that's what I'm
using for my impl.

ad: doesn't have idea of microdeltas. other stuff is the same.

ls: does it still have the mapping idea which I thought went away
(local to global)? during last codesprint.

gh: it did?

ad: returns back the complete feature with additional attribute. so
instead of a mapping, server returns back all features which changed,
along with attribute: old id ---> new id

gh: whether you delete things that aren't posted in feature when you
submit a new post.

ad: what you post is a complete replacement of what was there.

gh: that verbiage needs to be added. doesn't say anything about it.

[A] andrew will add text to writeback spec re: new feat being a
complete replacement

ad: other change: complex features all need a link back to the root
feature. when parsing you can build the parent-part relationship.
otherwise, you do a lot more work to figure out who's in the same
group.

gh: seems like a hack.

ls: this is not in the current writeback doc?

ad: correct. additional attribute for complex features. affects reads
too (not just writeback)

ls: bidirectional pointers is still there correct? parent -> child,
child -> feature.

ad: that's still there (unlike gff: unidirectional) if you know the
root, it saves you from having to traverse links.

gh: doesn't add that much. may create disagreement, errors between the
parent-child hierarchy. I don't think the root thing is necessary.

ls: pointer to parent and the root: like a closure across it. don't see
a compelling need, makes it harder to impl.

gh: if it's optional, will create other difficulties.

ad: makes it easy to find out where the root is.

ls: just go up until you find no parent. cycles would be a bug. the
issue would be if during reading from remote server, gives you children
first, middle layer, then root layer, will require some merging of
features. depends on data structures.
in perl with gbrowse, every feat or part of feat is held as a node in a
graph. it never merges, just updates pointers. after parse finishes,
finds everything without parent and recursively traverses them.

gh: if you want to attach annotations as features while parsing rather
than waiting till parse is done. reference counting. don't think root
thing would help then. still need to figure out do I have all children.

ad: when you get a failure you can throw away just the failures rather
than everything. can count parents and parts as they're coming in.

gh: every feature with no parent is a root.

ad: yes. assuming it comes early.

ls: in general case, you cannot go on and process a feature until you
reached the end of the parse. because you could have multiple layers.
you can say you have found any pair of layers, not everything in
between. the root ptr doesn't help either. could still be in a
situation where you think you processed everything that belongs to a...

ad: something comes along later "i'm still a part of that group"

gh: every time you get a feature, can add it to the feature tree, can
tell when you're done with group by checking pointers.

ad: ok. not as useful as I thought.

[A] andrew won't add root feat attribute to complex features
[so the latter is actually an 'inaction' item ;-]

From edgrif at sanger.ac.uk Wed Jun 7 11:35:56 2006
From: edgrif at sanger.ac.uk (Ed Griffiths)
Date: Wed, 7 Jun 2006 16:35:56 +0100 (BST)
Subject: [DAS2] Example alignments
In-Reply-To: <200606051031.50592.lstein@cshl.edu>
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
 <200605191150.19535.lstein@cshl.edu>
 <7002f99ddafd9542e8b4cb88e1712f9e@dalkescientific.com>
 <200606051031.50592.lstein@cshl.edu>
Message-ID:

Lincoln,

> I'm truly sorry at how long it has taken me to get these examples to you. I
> hope that the example alignments in the enclosure make sense to you.
>
> Unfortunately I found that I had to add a new "target" attribute to <LOC> in
> order to make the cigar string semantics unambiguous. Otherwise you wouldn't
> be able to tell how to interpret the gaps.

I think your idea of having a common "role" is a good one but I wondered
if we could use the term "query" for the sequence that is to be aligned
(i.e. the EST in your example) and "subject" for the reference sequence ?

I also wondered why the hsp hits could not be nested within the overall
alignment tags ?...probably that is opening a whole can of worms
though....

Ed
-- 
 -------------------------------------------------------------------------
| Ed Griffiths, Acedb development, Informatics Group,                     |
| The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus     |
| Hinxton, Cambridge CB10 1HH                                             |
|                                                                         |
| email: edgrif at sanger.ac.uk Tel: +44-1223-496844 Fax: +44-1223-494919 |
 -------------------------------------------------------------------------

From lstein at cshl.edu Wed Jun 7 12:44:02 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Jun 2006 12:44:02 -0400
Subject: [DAS2] Example alignments
In-Reply-To:
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
 <200606051031.50592.lstein@cshl.edu>
Message-ID: <200606071244.03598.lstein@cshl.edu>

Query and subject are rather BLAST-specific and don't apply to other
techniques, such as whole genome alignments. How about using "reference"
for the reference sequence and "non-reference" for the target?

Lincoln

On Wednesday 07 June 2006 11:35, Ed Griffiths wrote:
> Lincoln,
>
> > I'm truly sorry at how long it has taken me to get these examples to you.
> > I hope that the example alignments in the enclosure make sense to you.
> >
> > Unfortunately I found that I had to add a new "target" attribute to <LOC>
> > in order to make the cigar string semantics unambiguous. Otherwise you
> > wouldn't be able to tell how to interpret the gaps.
>
> I think your idea of having a common "role" is a good one but I wondered if
> we could use the term "query" for the sequence that is to be aligned (i.e.
> the EST in your example) and "subject" for the reference sequence ?
>
> I also wondered why the hsp hits could not be nested within the overall
> alignment tags ?...probably that is opening a whole can of worms though....
>
> Ed

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Steve_Chervitz at affymetrix.com Wed Jun 7 19:17:50 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Wed, 07 Jun 2006 16:17:50 -0700
Subject: [DAS2] most up-to-date mouse das? (mm7)
In-Reply-To: <83722dde0605172120t5853b30al3f931bd6d73092df@mail.gmail.com>
Message-ID:

Ann,

Did you find a solution to your problem of mapping Entrez gene ids into
genomic coords? Some suggestions:

1) You can issue DAS/2 queries using gene names or accessions to
retrieve coordinate info, for example:

http://netaffxdas.affymetrix.com/das2/M_musculus_Aug_2005/features?name=ACTA1

Or using refseq accession:

http://netaffxdas.affymetrix.com/das2/M_musculus_Aug_2005/features?name=NM_009606

2) You can't query the Affymetrix DAS/2 server using an Entrez gene id
like 11459 (you could in principle, but it's not aware of these ids at
present). So you'll need to map from Entrez gene ids into accessions
using data from NCBI, such as ftp://ftp.ncbi.nih.gov/gene/gene2refseq.gz

3) There will likely be multiple mRNA sequences associated with a gene
id, so you may want to look up the genomic coordinates for each mRNA and
take the union of those to get a single location for each gene.

Steve

> From: Ann Loraine
> Date: Wed, 17 May 2006 23:20:47 -0500
> To: Steve Chervitz
> Cc: DAS/2
> Subject: Re: most up-to-date mouse das? (mm7)
>
> Hi Steve,
>
> Thank you very much for the info!
>
> Now I have another question...
>
> I'd like to look up the genomic coordinates of a list of mouse genes
> using their numeric Entrez Gene ids.
>
> If it's not too much bother, do you think you'd be able to give me
> some tips on how to do this using DAS?
>
> btw, the DAS services have been hugely helpful to me in the last week.
> We have already found some interesting results with minimal coding.
> And the coding was actually fun because there was NO SCREEN-SCRAPING.
> Pure bliss.
>
> -Ann
>
> On 5/16/06, Steve Chervitz wrote:
>> Hi Ann,
>>
>> The list address has changed. It's now this: das2 at lists.open-bio.org
>>
>> As for your question, check out the DAS registry server at the Sanger:
>>
>> http://das.sanger.ac.uk/registry/
>>
>> I don't think the registry provides an indication of how current the
>> annotations on each registered server are for a given data source, such
>> as Entrez Gene. It would be a good piece of data to see, though.
>>
>> As for the Affymetrix DAS/2 server, the mm7 annotations were last
>> updated on April 19 2006:
>>
>> http://netaffxdas.affymetrix.com/das2/sources
>>
>> The available annotations come from the UCSC server, and derive from the
>> knownGene, all_mrna, genscan, and refFlat files (called 'refseq' on the
>> das server). Looks like the knownGene data was last updated by UCSC on
>> 15 Dec 2005:
>> http://hgdownload.cse.ucsc.edu/goldenPath/mm7/database/
>>
>> Technical note: The xml:base attribute in the das2xml features document
>> returned by the Affy DAS/2 server is currently incorrect. It should be
>>
>> xml:base="http://netaffxdas.affymetrix.com/das2/M_musculus_Aug_2005/features"
>>
>> instead of
>>
>> xml:base="http://127.0.0.1:9021/das2/M_musculus_Aug_2005/features"
>>
>> This will be fixed in the near future.
>>
>> Steve
>>
>>> From: Ann Loraine
>>> Date: Tue, 16 May 2006 03:52:29 -0500
>>> To: Steve Chervitz
>>> Subject: Fwd: most up-to-date mouse das? (mm7)
>>>
>>> Hi Steve,
>>>
>>> Would you post this to the DAS/2 list for me?
>>>
>>> Sorry to bother you, but for some reason my message didn't appear on
>>> the list.
>>>
>>> -Ann
>>>
>>> ---------- Forwarded message ----------
>>> From: Ann Loraine
>>> Date: May 15, 2006 3:35 PM
>>> Subject: most up-to-date mouse das? (mm7)
>>> To: Andrew Dalke , DAS/2
>>>
>>> Hi!
>>>
>>> I'm working on a QTL study and need to get all the genes mapping to
>>> various regions under peaks.
>>>
>>> I have the genomic coordinates for the regions so it should be very
>>> simple for me to get all accessions (feature ids) underneath those
>>> regions using DAS.
>>>
>>> My question is: what is the most up-to-date server for mm7?
>>>
>>> Here, of course, is UCSC:
>>>
>>> http://genome.cse.ucsc.edu/cgi-bin/das/mm7/features?segment=chr1:3000000,4000000;type=knownGene
>>>
>>> Ultimately, I'd like to get Entrez Gene ids for the genes under the
>>> peaks so that I can start sifting through the candidates using GO.
>>>
>>> Any tips would be gratefully accepted!
>>>
>>> All the best,
>>>
>>> Ann
>>>
>>> --
>>> Ann Loraine
>>> Assistant Professor
>>> Section on Statistical Genetics
>>> University of Alabama at Birmingham
>>> http://www.ssg.uab.edu
>>> http://www.transvar.org
>>>
>>> --
>>> Ann Loraine
>>> Assistant Professor
>>> Section on Statistical Genetics
>>> University of Alabama at Birmingham
>>> http://www.ssg.uab.edu
>>> http://www.transvar.org
>>
>
> --
> Ann Loraine
> Assistant Professor
> Section on Statistical Genetics
> University of Alabama at Birmingham
> http://www.ssg.uab.edu
> http://www.transvar.org

From aloraine at gmail.com Wed Jun 7 21:43:14 2006
From: aloraine at gmail.com (Ann Loraine)
Date: Wed, 7 Jun 2006 20:43:14 -0500
Subject: [DAS2] most up-to-date mouse das? (mm7)
In-Reply-To:
References: <83722dde0605172120t5853b30al3f931bd6d73092df@mail.gmail.com>
Message-ID: <83722dde0606071843x5f1215e0u1681ccd99a7aace4@mail.gmail.com>

Thanks Steve!

We ended up looking up the genomic positions in a bit of a grueling way.

We had a list of gene names from a paper that we thought could influence
our eQTLs and then used those to look up (by hand) the corresponding
Entrez Gene ids.

We used gene2refseq.gz to get RefSeq ids mapping onto the gene ids.

Then we used 'bed' files downloaded from UC Santa Cruz to get the
genomic coordinates of the RefSeq ids (alignments) and then checked them
against our list of genomic regions (peaks).

Clearly we could have used DAS to get the positions, which would have
saved coding!

Live and learn :-)

-Ann

-- 
Ann Loraine
Assistant Professor
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org

From edgrif at sanger.ac.uk Thu Jun 8 04:15:17 2006
From: edgrif at sanger.ac.uk (Ed Griffiths)
Date: Thu, 8 Jun 2006 09:15:17 +0100 (BST)
Subject: [DAS2] Example alignments
In-Reply-To: <200606071244.03598.lstein@cshl.edu>
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
 <200606051031.50592.lstein@cshl.edu>
 <200606071244.03598.lstein@cshl.edu>
Message-ID:

Lincoln,

> Query and subject are rather BLAST-specific and don't apply to other
> techniques, such as whole genome alignments. How about using "reference" for
> the reference sequence and "non-reference" for the target?

That seems fine to me, I think the word "target" is ambiguous as I have
commonly heard people refer to both the "query" and the "subject"
sequences as the "target" !
(but not at the same time of course ;-)

Ed
-- 
 -------------------------------------------------------------------------
| Ed Griffiths, Acedb development, Informatics Group,                     |
| The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus     |
| Hinxton, Cambridge CB10 1HH                                             |
|                                                                         |
| email: edgrif at sanger.ac.uk Tel: +44-1223-496844 Fax: +44-1223-494919 |
 -------------------------------------------------------------------------

From lstein at cshl.edu Mon Jun 12 09:46:52 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 12 Jun 2006 09:46:52 -0400
Subject: [DAS2] Can't make conf call today
In-Reply-To:
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
 <200606071244.03598.lstein@cshl.edu>
Message-ID: <200606120946.53448.lstein@cshl.edu>

Hi,

I've got a conflict with a grant planning meeting today, so I won't be
on the conference call. Next week I'll be in Melbourne for a genetics
meeting and I'll miss the call as well. Sorry about that.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Steve_Chervitz at affymetrix.com Mon Jun 19 15:50:16 2006
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Mon, 19 Jun 2006 12:50:16 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
Message-ID:

Notes from the weekly DAS/2 teleconference, 19 Jun 2006

$Id: das2-teleconf-2006-06-19.txt,v 1.1 2006/06/19 19:48:57 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  UCLA: Allen Day

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006.
Instructions on how to access this repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

General announcements
---------------------

gh: We have received additional funding from NIH extending our support
through May 2007. This will provide us the support we need until the
new grant would kick in (the grant renewal we're planning to submit Oct
2006). Many thanks to Peter Good who championed our cause at NIH.

gh: considering moving das meeting to every two weeks, to get more
participation. we used to have alternating weeks -- one week focus on
spec, other week focus on implementations.

[A] Gregg will broach possible biweekly das/2 meeting schedule on list.

gh: Andrew is sick, so he won't be joining today.

[Note: Last week only Steve, Gregg, and Ed E were on the call, so there
was no major DAS/2 discussion, hence no notes were posted.]

Topic: Status reports
---------------------

gh: das2 writeback related work in IGB. can write back das2xml. can
make curations. options to save as bed or das2xml file. can make a
curation track, save as das2xml. there's an id resolution issue.
roundtripping works.
Next step: make sure IGB can get back a das2 document that has same xml
with id mappings to different id. make sure I can swap those. should
then be able to writeback to a database.

ee: improved sliced view in igb, shows where deleted exons have been
deleted. improved threading. slicing happens in a separate
interruptible thread. gff3 reading issue on the IGB forum, our parser
isn't gff3-ready.

gh: deleted exons thing is cool.
the gff parser is not fully gff3-compliant. [A] Ed E. will fix gff3 parsing in IGB. ee/gh: implemented a speed up for drawing, min/max. once per pixel. sc: last development was on writing scripts to automate the updating of the affy das/2 servers (dmz), so you can update the jars and re-start the server. Other das-related stuff: Contributed to email discussion thread on the W3C HCLS semantic web mailing list regarding "LSIDs in the wild", provoked by Mark Wilkinson. Looks like about half a dozen or so places that are using LSIDs in some capacity, but not a lot of resolution services out there yet. Getting different data providers to use the LSID syntax alone would be a big win. Asked Lincoln and Brian Gilman about LSIDs at hapmap and caBIG (respectively). No response yet. Also responded to Ann's question on the das/2 list about using DAS to look up genomic coords for a set of Entrez Gene ids. It would be nice to have a way to determine the types of identifiers handled by a given DAS server, so this sort of query could be handled automatically. If a DAS server could provide a list of LSID authorities and namespaces for the types of identifiers it can resolve, that could be used to provide such a look up facility. This type of information could be provided to the das/2 registry server at registration time. gh: yes, but not sure how to best deal with this information. possibly via regular expressions on feature lookup, or xid. sc: Did other work related to Netaffx update preparation and domain mapping project for exon array sequences, doing as collaboration with Melissa Cline. Using Gregg's AnnotMapper. gh: will you provide data as RDF? sc: it's still in flux, but possibly. gh: we were also going to talk about optimizing the data format for the exon array as used on the affy das server, to deal with the growing memory requirements. We can discuss this week. [A] Steve set up mtg with Gregg re: exon array data format for affy das server. 
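
As a rough illustration of sc's suggestion above (a DAS server advertising the LSID authorities and namespaces it can resolve, so a client can pick a server for a given identifier), here is a minimal Python sketch. Nothing like this was specified on the call: the registry contents, the server names, and the functions parse_lsid and servers_for are all hypothetical. It assumes the usual LSID layout urn:lsid:<authority>:<namespace>:<object>[:<revision>], with the "urn:lsid" prefix and the authority compared case-insensitively.

```python
def parse_lsid(lsid):
    """Split an LSID string into its parts.

    Assumed layout: urn:lsid:<authority>:<namespace>:<object>[:<revision>]
    """
    parts = lsid.split(":")
    if len(parts) not in (5, 6):
        raise ValueError("not an LSID: %r" % lsid)
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError("not an LSID: %r" % lsid)
    return {
        "authority": parts[2].lower(),  # authority is treated as case-insensitive
        "namespace": parts[3],
        "object": parts[4],
        "revision": parts[5] if len(parts) == 6 else None,
    }


# Hypothetical registry content: server name -> (authority, namespace) pairs
# that the server says it can resolve. All names here are made up.
ADVERTISED = {
    "server-a": {("example.org", "entrezgene")},
    "server-b": {("example.org", "probeset"), ("example.net", "transcript")},
}


def servers_for(lsid, advertised=ADVERTISED):
    """Return the names of servers that advertise this LSID's authority/namespace."""
    parsed = parse_lsid(lsid)
    key = (parsed["authority"], parsed["namespace"])
    return sorted(name for name, pairs in advertised.items() if key in pairs)
```

A client holding a list of identifiers could call servers_for on each one and send the feature lookup only to the servers returned. gh's alternative of matching identifiers by regular expression would replace the exact (authority, namespace) comparison with a pattern match.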
aday: working on updates to the biopackages das server.

gh: is it ready to handle writeback requests?

aday: will be by friday. can you handle different data sources? it's in a separate db.
gh: as long as it's listed in sources query.
aday: it will be.

From aloraine at gmail.com Tue Jun 20 10:23:20 2006
From: aloraine at gmail.com (Ann Loraine)
Date: Tue, 20 Jun 2006 09:23:20 -0500
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To:
References:
Message-ID: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com>

Sorry I couldn't attend. My life has been crazy-busy lately with teaching & trying to keep the research on track.

A question: Do you have any suggestions for a Web service approach for microarray expression results?

We have a biggish (1700+ array hybs) database of expression data from Affymetrix ATH1 arrays. For middleware & other reasons, we are thinking of ways to provide simple CGI access to expression values in the database.

The issues we are dealing with are:

1. delivering mappings of probe sets onto other ids (e.g., AGI gene ids) using different authorities: TAIR, us, Affymetrix, University of Michigan, and so on.

2. filtering out probe sets using various criteria, e.g., promiscuous probe sets that match multiple genes, probe sets that "behave badly" in all known experiments, and so on. Each filtering procedure can be given a name.

3. providing expression values generated from 'cel' files using either RMA or MAS5, w/ PMA calls on both

Currently we do something very simple for the latter, e.g.,

http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at

Values come back in tab-delimited format, not XML.
The reason we are not using XML is that we want to be able to read the data directly into interactive statistical programming environments like R:

> url <- 'http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at'
> dat <- read.delim(url,sep='\t',header=T)
> model <- lm(dat[,3]~dat[,2])
> summary(model)
> plot(dat[,2],dat[,3])
> abline(model)
> cor(dat[,2],dat[,3])
> hist(dat[,2])
> qqnorm(dat[,2])

and so on...

R can probably handle XML somehow, but some people are confused by XML. To start, I want to avoid pushing people too far beyond their comfort zone.

If you have any tips, please let me know!

Right now we only have Arabidopsis data, but we are expanding to include GEO data that meet our various quality-control criteria. (You'd be shocked...maybe?...at how much bad data is in GEO!)

-Ann

On 6/19/06, Chervitz, Steve wrote:
> Notes from the weekly DAS/2 teleconference, 19 Jun 2006
>
> $Id: das2-teleconf-2006-06-19.txt,v 1.1 2006/06/19 19:48:57 sac Exp $
>
> Note taker: Steve Chervitz
>
> Attendees:
> Affy: Steve Chervitz, Ed Erwin, Gregg Helt
> UCLA: Allen Day
>
> Action items are flagged with '[A]'.
>
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/2006. Instructions on how to access this
> repository are at http://biodas.org
>
> DISCLAIMER:
> The note taker aims for completeness and accuracy, but these goals are
> not always achievable, given the desire to get the notes out with a
> rapid turnaround. So don't consider these notes as complete minutes
> from the meeting, but rather abbreviated, summarized versions of what
> was discussed. There may be errors of commission and omission.
> Participants are welcome to post comments and/or corrections to these
> as they see fit.
>
> General announcements
> ---------------------
>
> gh: We have received additional funding from NIH extending our support
> through May 2007.
>
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2
>

--
Ann Loraine
Assistant Professor
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org

From boconnor at ucla.edu Tue Jun 20 14:17:12 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Tue, 20 Jun 2006 11:17:12 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com>
References: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com>
Message-ID: <44983BA8.2090500@ucla.edu>

Hi Ann,

So there's a spec/implementation by Allen for a DAS/2 "Assay" server that would be a good jumping off point for what you want. The Nelson lab at UCLA is currently using it to serve up thousands of microarray results across many different platforms. To get an idea of what's there look at the spec doc here:

http://www.biodas.org/documents/das2/das2_assay.html

There are some example URLs in the spec that should work (the server was down when I tried just a minute ago but should be available soon). You can retrieve expression data using a URL similar to what you were using before:

http://das.biopackages.net/das/assay/human/17/result/SN:1007162?format=mgr;protocol=rma

That returns a tab-delimited file containing the RMA normalized results for this sample.

The assay das server is already included in the DAS/2 rpm. The only tricky part is loading expression data into a chado instance. Allen could provide you with better guidance there than I can. Alternatively, if you have your own backend storage for the expression data you may want to write a new adapter for the DAS/2 server rather than exporting your data to another DB.

--Brian

Ann Loraine wrote:

>Sorry I couldn't attend. My life has been crazy-busy lately with
>teaching & trying to keep the research on track.
From allenday at ucla.edu Wed Jun 21 04:27:10 2006 From: allenday at ucla.edu (Allen Day) Date: Wed, 21 Jun 2006 01:27:10 -0700 Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006 In-Reply-To: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com> References: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com> Message-ID: <5c24dcc30606210127l7a9687fao53b40aab5db0833c@mail.gmail.com> > > 1. delivering mappings of probe sets onto other ids (e.g., AGI gene > ids) using different authorities: TAIR, us, Affymetrix, University of > Michigan, and so on. We're doing this with the NetAffx schema that has been loaded to Postgres/Chado and full-text indexed. I think we have Affy probeset -> TAIR ID mappings, but not the others. 2.
filtering out probe sets using various criteria, e.g., promiscuous
> probe sets that match multiple genes, probe sets that "behave badly"
> in all known experiments, and so on. Each filtering procedure can be
> given a name.

Yes, that is something I am looking at right now. Actually, as you get more and more arrays the probeset behavior becomes very clear, with many transcripts showing discrete on/off states, e.g. a bunch of genes highly expressed in human tongue:

taste receptor, type 2, member 1
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?221324_at

gastrin-releasing peptide receptor
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?207929_at

olfactory receptor, family 10
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?221346_at

natural cytotoxicity triggering receptor
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?217088_s_at

There are even clear trimodals, like thyroid receptor alpha:
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?1316_at

> 3. providing expression values generated from 'cel' files using either
> RMA or MAS5, w/ PMA calls on both

Yes, you can do this in R with XML, but it's a pain. Better for expression data to use TSV as you are doing. We have an R lib in development for doing large batch retrieval of hundreds of arrays. Getting annotation into R turns out to be easier with XML, as it is just easier to represent in the more flexible format.
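
To make the TSV point concrete for people working outside R, here is a minimal sketch in Python (chosen only because the get_expr.py endpoint in this thread is a Python CGI). It parses tab-delimited expression values and computes the same correlation as cor(dat[,2],dat[,3]) in Ann's R session. The column layout (a sample label followed by the two probe set columns), the sample values, and the function names are assumptions for illustration; the thread does not show the actual response format.

```python
import csv
import io
import math


def parse_expression_tsv(text):
    """Parse tab-delimited text with a header row into (header, rows).

    Assumed layout (not confirmed by the thread): column 1 is a sample
    label, columns 2 and 3 hold expression values for the two probe sets.
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    rows = [(r[0], float(r[1]), float(r[2])) for r in reader if r]
    return header, rows


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values standing in for a server response.
SAMPLE = (
    "sample\t262002_at\t250641_at\n"
    "hyb1\t1.0\t2.0\n"
    "hyb2\t2.0\t4.0\n"
    "hyb3\t3.0\t6.5\n"
)

header, rows = parse_expression_tsv(SAMPLE)
xs = [row[1] for row in rows]
ys = [row[2] for row in rows]
r_value = pearson(xs, ys)
```

In real use the text would come from an HTTP request to the server rather than from a string constant. No XML parser is needed at any step, which is the convenience being argued for here.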
-Allen

From allenday at ucla.edu Wed Jun 21 04:08:34 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 21 Jun 2006 01:08:34 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To: <44983BA8.2090500@ucla.edu>
References: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com> <44983BA8.2090500@ucla.edu>
Message-ID: <5c24dcc30606210108u3a7e9226v119c157be713e3f3@mail.gmail.com>

Hi Ann,

I think Brian meant to form a URL like this:

http://das.biopackages.net/das/assay/celsius/1/result/SN:1007162?format=egr;protocol=rma

As mentioned, we have an Affy data warehouse project going on over here. Currently it contains more than 36000 CEL files in raw and various normal flavors. 1251 of these are the ATH1-121501 platform. We typically import 300-500 arrays/week. All of GEO is already present (about 14000 CEL files), as well as several other sites' data (ArrayExpress, Broad Institute, ...).

We are currently advertising a normalization service whereby users can /anonymously/ drop off raw CEL data, and get back normalized results within a few hours, dependent on our compute cluster usage. Typically we can flip an array in about 30 minutes. We store the CEL and normalized data permanently for retrieval later, and for our own meta-analyses.

At the other extreme, if you're interested in doing regular bulk import, we're also happy to set up a weekly mirror where we sync the data to our site and then process it.

If you're interested in either of these, or a setup somewhere in between let me know.

-Allen

On 6/20/06, Brian O'Connor wrote:
>
> Hi Ann,
>
> So there's a spec/implementation by Allen for a DAS/2 "Assay" server
> that would be a good jumping off point for what you want. The Nelson
> lab at UCLA is currently using it to serve up thousands of microarray
> results across many different platforms.
From allenday at ucla.edu Sat Jun 24 05:24:19 2006 From: allenday at ucla.edu (Allen Day) Date: Sat, 24 Jun 2006 02:24:19 -0700 Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006 In-Reply-To: <5c24dcc30606240219x674f4b4p824bb0c979db5185@mail.gmail.com> References: <5c24dcc30606240219x674f4b4p824bb0c979db5185@mail.gmail.com> Message-ID: <5c24dcc30606240224s5a3836acyebab930c28d518ac@mail.gmail.com> You can see the features that are posted here: http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature It is fully compatible with the usual yeast source: http://das.biopackages.net/das/genome/yeast/S228C/feature All the usual feature filters apply. The response at this URL is not cached to keep the content fresh, at the expense of ever-slower load times as written features accumulate. -Allen On 6/24/06, Allen Day wrote: > > I have a temporary CGI set up to accept WRITEBACK documents: > > http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable1.pl > > I have attached a das2xml document that POSTs cleanly for me using the > lwp-request that is part of libwww-perl. Please modify this document, post, > and let me know if anything breaks. > > This implementation accepts only new records.
It supports neither updates > nor deletes. Furthermore, it only accepts new feature records. It does not > support new type records, new region records, or any other type of record. > > Feature records may have 0 or more locations, 0 or more parents, 0 or more > children, and 0 or more properties. All parts/parents must be present in > the document (no refering to existing features by URI), or it will throw a > HTTP 500 error. > > Next I will implement the update and delete support. This should be > fairly straightforward, and may be doable over the weekend. > > -Allen > > > On 6/19/06, Chervitz, Steve wrote: > > > Notes from the weekly DAS/2 teleconference, 19 Jun 2006 > > > > $Id: das2-teleconf-2006-06-19.txt,v 1.1 2006/06/19 19:48:57 sac Exp $ > > > > Note taker: Steve Chervitz > > > > Attendees: > > Affy: Steve Chervitz, Ed Erwin, Gregg Helt > > UCLA: Allen Day > > > > Action items are flagged with '[A]'. > > > > These notes are checked into the biodas.org CVS repository at > > das/das2/notes/2006. Instructions on how to access this > > repository are at http://biodas.org > > > > DISCLAIMER: > > The note taker aims for completeness and accuracy, but these goals are > > not always achievable, given the desire to get the notes out with a > > rapid turnaround. So don't consider these notes as complete minutes > > from the meeting, but rather abbreviated, summarized versions of what > > was discussed. There may be errors of commission and omission. > > Participants are welcome to post comments and/or corrections to these > > as they see fit. > > > > General announcements > > --------------------- > > > > gh: We have received additional funding from NIH extending our support > > through May 2007. This will provide us the support we need until the > > new grant would kick in (the grant renewal we're planning to submit > > Oct 2006). Many thanks to Peter Good who championed our cause at NIH. 
> > > > gh: considering moving das meeting to every two weeks, to get more > > participation. we used to have alternating weeks -- one week focus on > > spec, other week focus on implementations. > > > > [A] Gregg will broach possible biweekly das/2 meeting schedule on list. > > > > gh: Andrew is sick, so he won't be joining today. > > > > [Note: Last week only Steve, Gregg, and Ed E were on the call, so there > > was no major DAS/2 discussion, hence no notes were posted.] > > > > Topic: Status reports > > --------------------- > > > > gh: das2 writeback related work in IGB. can write back das2xml. can > > make curations. options to save as bed or das2xml file. can make a > > curation track, save as das2xml. there's an id resolution > > issue. roundtripping works. > > > > Next step: make sure IGB can get back a das2 document that has same > > xml with id mappings to different id. make sure I can swap > > those. should then be able to writeback to a database. > > > > ee: improved sliced view in igb, shows where deleted exons have been > > deleted. improved threading. slicing happens in a separate > > interruptable thread. gff3 reading issue on the IGB forum, our parser > > isn't gff3-ready. > > > > gh: deleted exons thing is cool. the gff parser is not fully > > gff3-compliant. > > > > [A] Ed E. will fix gff3 parsing in IGB. > > > > ee/gh: implemented a speed up for drawing, min/max. once per pixel. > > > > sc: last development was on writing scripts to automate the updating > > of the affy das/2 servers (dmz), so you can update the jars and > > re-start the server. > > > > Other das-related stuff: Contributed to email discussion thread on the > > W3C HCLS semantic web mailing list regarding "LSIDs in the wild", > > provoked by Mark Wilkinson. Looks like about half a dozen or so places > > that are using LSIDs in some capacity, but not a lot of resolution > > services out there yet. 
Getting different data providers to use the > > LSID syntax alone would be a big win. Asked Lincoln and Brian Gilman > > about LSIDs at hapmap and caBIG (respectively). No response yet. > > > > Also responded to Ann's question on the das/2 list about using DAS to > > look up genomic coords for a set of Entrez Gene ids. It would be nice > > to have a way to determine the types of identifiers handled by a given > > DAS server, so this sort of query could be handled automatically. If a > > DAS server could provide a list of LSID authorities and namespaces for > > the types of identifiers it can resolve, that could be used to provide > > such a look up facility. This type of information could be provided to > > the das/2 registry server at registration time. > > > > gh: yes, but not sure how to best deal with this information. possibly > > via regular expressions on feature lookup, or xid. > > > > sc: Did other work related to Netaffx update preparation and domain > > mapping project for exon array sequences, doing as collaboration with > > Melissa Cline. Using Gregg's AnnotMapper. > > > > gh: will you provide data as RDF? > > sc: it's still in flux, but possibly. > > > > gh: we were also going to talk about optimizing the data format for the > > exon array as used on the affy das server, to deal with the growing > > memory requirements. We can discuss this week. > > > > [A] Steve set up mtg with Gregg re: exon array data format for affy das > > server. > > > > aday: working on updates to the biopackages das server. > > > > gh: is it ready to handle writeback requests? > > > > aday: will be by friday. can you handle different data sources? it's > > in a separate db. > > gh: as long as it's listed in sources query. > > aday: it will be. 
> >
> >
> >
> >
> >
> >
> > _______________________________________________
> > DAS2 mailing list
> > DAS2 at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/das2
> >
> >
>

From allenday at ucla.edu Sat Jun 24 05:19:46 2006
From: allenday at ucla.edu (Allen Day)
Date: Sat, 24 Jun 2006 02:19:46 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To:
References:
Message-ID: <5c24dcc30606240219x674f4b4p824bb0c979db5185@mail.gmail.com>

I have a temporary CGI set up to accept WRITEBACK documents:

http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable1.pl

I have attached a das2xml document that POSTs cleanly for me using the
lwp-request that is part of libwww-perl. Please modify this document, post,
and let me know if anything breaks.

This implementation accepts only new records. It supports neither updates
nor deletes. Furthermore, it only accepts new feature records. It does not
support new type records, new region records, or any other type of record.

Feature records may have 0 or more locations, 0 or more parents, 0 or more
children, and 0 or more properties. All parts/parents must be present in
the document (no referring to existing features by URI), or it will throw an
HTTP 500 error.

Next I will implement the update and delete support. This should be
fairly straightforward, and may be doable over the weekend.

-Allen

On 6/19/06, Chervitz, Steve wrote:
>
> Notes from the weekly DAS/2 teleconference, 19 Jun 2006
>
> $Id: das2-teleconf-2006-06-19.txt,v 1.1 2006/06/19 19:48:57 sac Exp $
>
> Note taker: Steve Chervitz
>
> Attendees:
> Affy: Steve Chervitz, Ed Erwin, Gregg Helt
> UCLA: Allen Day
>
> Action items are flagged with '[A]'.
>
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/2006.
Instructions on how to access this > repository are at http://biodas.org > > DISCLAIMER: > The note taker aims for completeness and accuracy, but these goals are > not always achievable, given the desire to get the notes out with a > rapid turnaround. So don't consider these notes as complete minutes > from the meeting, but rather abbreviated, summarized versions of what > was discussed. There may be errors of commission and omission. > Participants are welcome to post comments and/or corrections to these > as they see fit. > > General announcements > --------------------- > > gh: We have received additional funding from NIH extending our support > through May 2007. This will provide us the support we need until the > new grant would kick in (the grant renewal we're planning to submit > Oct 2006). Many thanks to Peter Good who championed our cause at NIH. > > gh: considering moving das meeting to every two weeks, to get more > participation. we used to have alternating weeks -- one week focus on > spec, other week focus on implementations. > > [A] Gregg will broach possible biweekly das/2 meeting schedule on list. > > gh: Andrew is sick, so he won't be joining today. > > [Note: Last week only Steve, Gregg, and Ed E were on the call, so there > was no major DAS/2 discussion, hence no notes were posted.] > > Topic: Status reports > --------------------- > > gh: das2 writeback related work in IGB. can write back das2xml. can > make curations. options to save as bed or das2xml file. can make a > curation track, save as das2xml. there's an id resolution > issue. roundtripping works. > > Next step: make sure IGB can get back a das2 document that has same > xml with id mappings to different id. make sure I can swap > those. should then be able to writeback to a database. > > ee: improved sliced view in igb, shows where deleted exons have been > deleted. improved threading. slicing happens in a separate > interruptable thread. 
gff3 reading issue on the IGB forum, our parser > isn't gff3-ready. > > gh: deleted exons thing is cool. the gff parser is not fully > gff3-compliant. > > [A] Ed E. will fix gff3 parsing in IGB. > > ee/gh: implemented a speed up for drawing, min/max. once per pixel. > > sc: last development was on writing scripts to automate the updating > of the affy das/2 servers (dmz), so you can update the jars and > re-start the server. > > Other das-related stuff: Contributed to email discussion thread on the > W3C HCLS semantic web mailing list regarding "LSIDs in the wild", > provoked by Mark Wilkinson. Looks like about half a dozen or so places > that are using LSIDs in some capacity, but not a lot of resolution > services out there yet. Getting different data providers to use the > LSID syntax alone would be a big win. Asked Lincoln and Brian Gilman > about LSIDs at hapmap and caBIG (respectively). No response yet. > > Also responded to Ann's question on the das/2 list about using DAS to > look up genomic coords for a set of Entrez Gene ids. It would be nice > to have a way to determine the types of identifiers handled by a given > DAS server, so this sort of query could be handled automatically. If a > DAS server could provide a list of LSID authorities and namespaces for > the types of identifiers it can resolve, that could be used to provide > such a look up facility. This type of information could be provided to > the das/2 registry server at registration time. > > gh: yes, but not sure how to best deal with this information. possibly > via regular expressions on feature lookup, or xid. > > sc: Did other work related to Netaffx update preparation and domain > mapping project for exon array sequences, doing as collaboration with > Melissa Cline. Using Gregg's AnnotMapper. > > gh: will you provide data as RDF? > sc: it's still in flux, but possibly. 
> > gh: we were also going to talk about optimizing the data format for the > exon array as used on the affy das server, to deal with the growing > memory requirements. We can discuss this week. > > [A] Steve set up mtg with Gregg re: exon array data format for affy das > server. > > aday: working on updates to the biopackages das server. > > gh: is it ready to handle writeback requests? > > aday: will be by friday. can you handle different data sources? it's > in a separate db. > gh: as long as it's listed in sources query. > aday: it will be. > > > > > > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -------------- next part -------------- A non-text attachment was scrubbed... Name: new.xml Type: text/xml Size: 1032 bytes Desc: not available URL: From lstein at cshl.edu Mon Jun 26 10:56:31 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 26 Jun 2006 10:56:31 -0400 Subject: [DAS2] Can't make conf call today In-Reply-To: <200606120946.53448.lstein@cshl.edu> References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com> <200606120946.53448.lstein@cshl.edu> Message-ID: <200606261056.32160.lstein@cshl.edu> Hi Folks, Sorry to do this three weeks in a row, but I have to teach at noon today so I'll miss the conf call. Lincoln On Monday 12 June 2006 09:46, Lincoln Stein wrote: > Hi, > > I've got a conflict with a grant planning meeting today, so I won't be on > the conference call. Next week I'll be in Melbourne for a genetics meeting > and I'll miss the call as well. > > Sorry about that. > > Lincoln -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From aloraine at gmail.com Mon Jun 26 11:59:12 2006 From: aloraine at gmail.com (Ann Loraine) Date: Mon, 26 Jun 2006 10:59:12 -0500 Subject: [DAS2] Can't make conf call today In-Reply-To: <200606261056.32160.lstein@cshl.edu> References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com> <200606120946.53448.lstein@cshl.edu> <200606261056.32160.lstein@cshl.edu> Message-ID: <83722dde0606260859m6da790f4peeec9ff917f948fb@mail.gmail.com> I will miss it too. My apologies. I look forward to reading Steve's summary. Best, Ann On 6/26/06, Lincoln Stein wrote: > Hi Folks, > > Sorry to do this three weeks in a row, but I have to teach at noon today so > I'll miss the conf call. > > Lincoln > > On Monday 12 June 2006 09:46, Lincoln Stein wrote: > > Hi, > > > > I've got a conflict with a grant planning meeting today, so I won't be on > > the conference call. Next week I'll be in Melbourne for a genetics meeting > > and I'll miss the call as well. > > > > Sorry about that. > > > > Lincoln > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From Steve_Chervitz at affymetrix.com Mon Jun 26 13:58:08 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 26 Jun 2006 10:58:08 -0700 Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 26 Jun 2006 Message-ID: Notes from the weekly DAS/2 teleconference, 26 Jun 2006 $Id: das2-teleconf-2006-06-26.txt,v 1.2 2006/06/26 17:56:11 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed Erwin, Gregg Helt Dalke Scientific: Andrew Dalke Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Status reports: --------------- gh: grant update: requested 300k, may get 210-220k. will cover us at our current burn rate. based on 3mos funding left over from prev funding, so new request only needs to cover 8mos (through may 2007). still waiting to hear back. Allocated 30k for andrew's work. Originally 25k for each of two years. 
ad: heard back from suzi from ontology urls. not on das list. will summarize. spoke to others at NCBO about setting up a service. will ask Karen Eilbeck (genetics.utah.edu). She only has access to song.sourceforge.net domain. not good long term because it's dependent on sf. there is a general domain for the group, but not one she has access to. what do we want on the pages. xml, html? gh: meeting suzi this week. can ask more about ontology stuff. gh: we also discussed moving this das/2 meeting to biweekly. ad: makes sense. spec vs impl. sc: should the spec vs impl discussion alternate biweekly? gh: no, cover everything each meeting. [A] das/2 meeting will now be biweekly. [A] next meeting is July 10. No meeting on 3 July (US holiday). sc: is spec still frozen? ad: no, just haven't worked on it. if you want to make changes go ahead. [A] steve will fix broken in-page links on the read spec html. sc: discussed with Gregg last week about migrating affy das/1 server data to das/2. also experiencing growing pains due to more arrays to support (the affy das server is in-memory). So we strategized over a more efficient data model for the exon array data, which eats up a lot of memory (100-200 MB per array per genome version). In thinking about it more, seems too ambitious to get the more efficient data model *and* do the das/2 migration for the July Netaffx update. Another issue is that we now provide the bgn files as a separate download from Netaffx, so if we provide bp2 format, users will have to upgrade their IGB as well (this wouldn't be a concern for folks launching IGB via java web start, which should be most users). gh: bp2 format isn't too hard to do. just adds an array id field, since exon array identifiers are not integers ("1:2345678"). good plan is to first move to more efficient data model on das/1 to solve memory issues, then focus on migration to das/2. 
Other status of note:
----------------------

Allen Day has announced the availability of his writeback server:
http://lists.open-bio.org/pipermail/das2/2006-June/000744.html

From Gregg_Helt at affymetrix.com Mon Jun 26 15:23:37 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 26 Jun 2006 12:23:37 -0700
Subject: [DAS2] DAS/2 writeback
Message-ID:

Allen, thanks for getting the start of a writeback server up and
running!

I'm hoping to try writing back annotations later today. However, I'm
having problems looking at the annotations in the writeback server via
IGB. It looks to me like the main issue is that
http://das.biopackages.net/das/genome/yeast/S228C-writeback/segment
returns human chromosome ids in the uri attribute of the SEGMENT
element, instead of yeast ids. When IGB uses this to compose a query
like

http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?overlaps=chr1/10:15000;type=SO:centromere

it gets back an empty feature list. But if I manually edit this to
replace "chr1" with "chrI",

http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?overlaps=chrI/10:15000;type=SO:centromere

I get back a list of features that satisfy the query filters.

Any ideas?

Thanks,
Gregg

> -----Original Message-----
> From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open-bio.org] On Behalf Of Allen Day
> Sent: Saturday, June 24, 2006 2:24 AM
> To: Chervitz, Steve
> Cc: DAS/2
> Subject: Re: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
>
> You can see the features that are posted here:
>
> http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature
>
> It is fully compatible with the usual yeast source:
>
> http://das.biopackages.net/das/genome/yeast/S228C/feature
>
> All the usual feature filters apply. The response at this URL is not cached
> to keep the content fresh, at the expense of ever-slower load times as
> written features accumulate.
> > -Allen > > > > On 6/24/06, Allen Day wrote: > > > > I have a temporary CGI set up to accept WRITEBACK documents: > > > > http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml- > parser/stable1.pl bin/das2xml-parser/stable1.pl> > > > > I have attached a das2xml document that POSTs cleanly for me using the > > lwp-request that is part of libwww-perl. Please modify this document, > post, > > and let me know if anything breaks. > > > > This implementation accepts only new records. It supports neither > updates > > nor deletes. Furthermore, it only accepts new feature records. It does > not > > support new type records, new region records, or any other type of > record. > > > > Feature records may have 0 or more locations, 0 or more parents, 0 or > more > > children, and 0 or more properties. All parts/parents must be present > in > > the document (no refering to existing features by URI), or it will throw > a > > HTTP 500 error. > > > > Next I will implement the update and delete support. This should be > > fairly straightforward, and may be doable over the weekend. > > > > -Allen > > From allenday at ucla.edu Mon Jun 26 17:11:18 2006 From: allenday at ucla.edu (Allen Day) Date: Mon, 26 Jun 2006 14:11:18 -0700 Subject: [DAS2] DAS/2 writeback In-Reply-To: References: Message-ID: <5c24dcc30606261411m49032b0fkd65fcb3022a0826d@mail.gmail.com> This datasource contains both yeast and human segments. I set it up this way so features can be written for either human or yeast, then viewed at one of: http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature http://das.biopackages.net/das/genome/human/17-writeback/feature I thought this would be more useful so you can test viewing writeback features alongside both "real" human and yeast features. So you can just ignore the irrelevant segments, or if you'd prefer I can delete one set of segments or the other. 
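
[Editorial sketch, not part of the original message.] The feature queries
discussed in this thread combine an "overlaps" region filter and a "type"
filter, joined with ";". A minimal Python sketch of composing such a URL
is shown here. The function name and its parameters are hypothetical; only
the URL layout is taken from the queries quoted in the thread, and the
segment name must match what the server's segment query actually reports
(for example "chrI" rather than "chr1" for yeast).

```python
from urllib.parse import quote


def feature_query_url(source_url, segment, start, end, feature_type):
    """Compose a feature query URL with 'overlaps' and 'type' filters.

    Hypothetical helper: the layout follows the example URLs in this
    thread, not a verified reading of the DAS/2 specification.
    """
    region = f"{segment}/{start}:{end}"
    # Keep '/' and ':' unescaped so the result matches the thread's URLs.
    filters = (
        f"overlaps={quote(region, safe='/:')}"
        f";type={quote(feature_type, safe=':')}"
    )
    return f"{source_url}/feature?{filters}"


base = "http://das.biopackages.net/das/genome/yeast/S228C-writeback"
print(feature_query_url(base, "chrI", 10, 15000, "SO:centromere"))
```

Taking the segment name as an argument, rather than hard-coding it, makes
the chr1/chrI mismatch a matter of which id the client read from the
segment response.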
-Allen On 6/26/06, Helt,Gregg wrote: > > Allen, thanks for getting the start of a writeback server up and > running! > > I'm hoping to try writing back annotations later today. However, I'm > having problems looking at the annotations in the writeback server via > IGB. It looks to me like the main issue is that > http://das.biopackages.net/das/genome/yeast/S228C-writeback/segment > returns human chromosome ids in the uri attribute of the SEGMENT > element, instead of yeast ids. When IGB uses this to compose a query > like > > http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?over > laps=chr1/10:15000;type=SO:centromere > > it gets back an empty feature list. But if I manually edit this to > replace "chr1" with "chrI", > > http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?over > laps=chrI/10:15000;type=SO:centromere > > I get back a list of feature that satisfies the query filters. > > Any ideas? > > Thanks, > Gregg > > > -----Original Message----- > > From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open- > > bio.org] On Behalf Of Allen Day > > Sent: Saturday, June 24, 2006 2:24 AM > > To: Chervitz, Steve > > Cc: DAS/2 > > Subject: Re: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun > > 2006 > > > > You can see the features that are posted here: > > > > http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature > > > > It is fully compatible with the usual yeast source: > > > > http://das.biopackages.net/das/genome/yeast/S228C/feature > > > > All the usual feature filters apply. The response at this URL is not > > cached > > to keep the content fresh, at the expense of ever-slower load times as > > written features accumulate. 
> > > > -Allen > > > > > > > > On 6/24/06, Allen Day wrote: > > > > > > I have a temporary CGI set up to accept WRITEBACK documents: > > > > > > http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml- > > parser/stable1.pl > bin/das2xml-parser/stable1.pl> > > > > > > I have attached a das2xml document that POSTs cleanly for me using > the > > > lwp-request that is part of libwww-perl. Please modify this > document, > > post, > > > and let me know if anything breaks. > > > > > > This implementation accepts only new records. It supports neither > > updates > > > nor deletes. Furthermore, it only accepts new feature records. It > does > > not > > > support new type records, new region records, or any other type of > > record. > > > > > > Feature records may have 0 or more locations, 0 or more parents, 0 > or > > more > > > children, and 0 or more properties. All parts/parents must be > present > > in > > > the document (no refering to existing features by URI), or it will > throw > > a > > > HTTP 500 error. > > > > > > Next I will implement the update and delete support. This should be > > > fairly straightforward, and may be doable over the weekend. > > > > > > -Allen > > > > From Gregg_Helt at affymetrix.com Mon Jun 26 17:20:48 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 26 Jun 2006 14:20:48 -0700 Subject: [DAS2] DAS/2 writeback Message-ID: But there's no entry in the segment query for chrI, which is the chromosome you used in the the example XML you posted. So I can't find those annotations via IGB. gregg -----Original Message----- From: allenday at gmail.com [mailto:allenday at gmail.com] On Behalf Of Allen Day Sent: Monday, June 26, 2006 2:11 PM To: Helt,Gregg Cc: DAS/2 Subject: Re: DAS/2 writeback This datasource contains both yeast and human segments. 
I set it up this way so features can be written for either human or yeast, then viewed at one of: http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature http://das.biopackages.net/das/genome/human/17-writeback/feature I thought this would be more useful so you can test viewing writeback features alongside both "real" human and yeast features. So you can just ignore the irrelevant segments, or if you'd prefer I can delete one set of segments or the other. -Allen On 6/26/06, Helt,Gregg wrote: Allen, thanks for getting the start of a writeback server up and running! I'm hoping to try writing back annotations later today. However, I'm having problems looking at the annotations in the writeback server via IGB. It looks to me like the main issue is that http://das.biopackages.net/das/genome/yeast/S228C-writeback/segment returns human chromosome ids in the uri attribute of the SEGMENT element, instead of yeast ids. When IGB uses this to compose a query like http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?over laps=chr1/10:15000;type=SO:centromere it gets back an empty feature list. But if I manually edit this to replace "chr1" with "chrI", http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?over laps=chrI/10:15000;type=SO:centromere I get back a list of feature that satisfies the query filters. Any ideas? Thanks, Gregg > -----Original Message----- > From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open - > bio.org] On Behalf Of Allen Day > Sent: Saturday, June 24, 2006 2:24 AM > To: Chervitz, Steve > Cc: DAS/2 > Subject: Re: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun > 2006 > > You can see the features that are posted here: > > http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature > > It is fully compatible with the usual yeast source: > > http://das.biopackages.net/das/genome/yeast/S228C/feature > > All the usual feature filters apply. 
The response at this URL is not > cached > to keep the content fresh, at the expense of ever-slower load times as > written features accumulate. > > -Allen > > > > On 6/24/06, Allen Day wrote: > > > > I have a temporary CGI set up to accept WRITEBACK documents: > > > > http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml- > parser/stable1.pl< http://genomics.ctrl.ucla.edu/%7Eallenday/cgi- > bin/das2xml-parser/stable1.pl> > > > > I have attached a das2xml document that POSTs cleanly for me using the > > lwp-request that is part of libwww-perl. Please modify this document, > post, > > and let me know if anything breaks. > > > > This implementation accepts only new records. It supports neither > updates > > nor deletes. Furthermore, it only accepts new feature records. It does > not > > support new type records, new region records, or any other type of > record. > > > > Feature records may have 0 or more locations, 0 or more parents, 0 or > more > > children, and 0 or more properties. All parts/parents must be present > in > > the document (no refering to existing features by URI), or it will throw > a > > HTTP 500 error. > > > > Next I will implement the update and delete support. This should be > > fairly straightforward, and may be doable over the weekend. > > > > -Allen > > From Gregg_Helt at affymetrix.com Mon Jun 26 18:43:13 2006 From: Gregg_Helt at affymetrix.com (Helt,Gregg) Date: Mon, 26 Jun 2006 15:43:13 -0700 Subject: [DAS2] DAS/2 writeback Message-ID: Thanks, that helped, the problem was on my end. Looks like I was POSTing but not making sure the data buffer was flushed out to the server before I tried to read the server response. I fixed that, now I get a mapping document back. I'm not sure how much effort to put into parsing the mapping doc though - the next update of the spec is supposed to change so that rather than a new mapping document type, the server responds with the full feature XML of the created/updated features. 
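
[Editorial sketch, not part of the original message.] The writeback
exchange in this thread amounts to an HTTP POST whose body is the XML
document itself. A minimal Python sketch of building such a request is
shown here, under stated assumptions: the helper name is hypothetical, the
payload is a placeholder rather than a real das2xml document, and the
Content-Type value is a guess. The request is built but not sent, since
the CGI was described as temporary.

```python
import urllib.request

# URL of the temporary writeback CGI, as given in this thread.
WRITEBACK_URL = (
    "http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable1.pl"
)


def build_writeback_request(xml_bytes, url=WRITEBACK_URL):
    """Build (but do not send) a POST whose body is the XML document."""
    return urllib.request.Request(
        url,
        data=xml_bytes,  # the raw document is the request body
        headers={"Content-Type": "text/xml"},  # assumption, not from the spec
        method="POST",
    )


# Placeholder payload; a real request would carry a das2xml feature document.
req = build_writeback_request(b"<placeholder/>")
print(req.get_method(), len(req.data))

# To actually send it (requires a live server):
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```

With urllib.request.urlopen the complete body is written before the
response is read, so the client does not have to flush the output buffer
by hand.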

More progress - if I stick to the human genome (chr21), after the
writeback I'm able to retrieve the features via DAS/2 and visualize in
IGB.

thanks again,
Gregg

-----Original Message-----
From: allenday at gmail.com [mailto:allenday at gmail.com] On Behalf Of Allen Day
Sent: Monday, June 26, 2006 3:04 PM
To: Helt,Gregg
Subject: Re: DAS/2 writeback

The content-type shouldn't matter. I think I was submitting with
application/x-form-encoded, or something like that.

The CGI just reads whatever you sent it directly from STDIN though. From
the error log it looks like you are not POSTing the document. There is no
web page there, you need to issue a POST request, XML doc as the body,
directly to that CGI.

-Allen

On 6/26/06, Helt,Gregg wrote:

That's my guess.

Also, I'm attempting to post variations of your XML example to
http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable1.pl,
but so far I've only gotten "500 Internal Server Error" responses back.
Can you tell what's happening? Do I need to set the content-type for the
POST or some other header?

gregg

-----Original Message-----
From: allenday at gmail.com [mailto:allenday at gmail.com] On Behalf Of Allen Day
Sent: Monday, June 26, 2006 2:40 PM
To: Helt,Gregg
Subject: Re: DAS/2 writeback

Ok, i will check into it. Only showing human segments, is it?

-Allen

On 6/26/06, Helt,Gregg wrote:

But there's no entry in the segment query for chrI, which is the
chromosome you used in the example XML you posted. So I can't find
those annotations via IGB.

gregg

-----Original Message-----
From: allenday at gmail.com [mailto:allenday at gmail.com] On Behalf Of Allen Day
Sent: Monday, June 26, 2006 2:11 PM
To: Helt,Gregg
Cc: DAS/2
Subject: Re: DAS/2 writeback

This datasource contains both yeast and human segments.
I set it up this way so features can be written for either human or yeast, then viewed at one of: http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature http://das.biopackages.net/das/genome/human/17-writeback/feature I thought this would be more useful so you can test viewing writeback features alongside both "real" human and yeast features. So you can just ignore the irrelevant segments, or if you'd prefer I can delete one set of segments or the other. -Allen On 6/26/06, Helt,Gregg wrote: Allen, thanks for getting the start of a writeback server up and running! I'm hoping to try writing back annotations later today. However, I'm having problems looking at the annotations in the writeback server via IGB. It looks to me like the main issue is that http://das.biopackages.net/das/genome/yeast/S228C-writeback/segment returns human chromosome ids in the uri attribute of the SEGMENT element, instead of yeast ids. When IGB uses this to compose a query like http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?over laps=chr1/10:15000;type=SO:centromere it gets back an empty feature list. But if I manually edit this to replace "chr1" with "chrI", http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?over laps=chrI/10:15000;type=SO:centromere I get back a list of feature that satisfies the query filters. Any ideas? Thanks, Gregg > -----Original Message----- > From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open - > bio.org] On Behalf Of Allen Day > Sent: Saturday, June 24, 2006 2:24 AM > To: Chervitz, Steve > Cc: DAS/2 > Subject: Re: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun > 2006 > > You can see the features that are posted here: > > http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature > > It is fully compatible with the usual yeast source: > > http://das.biopackages.net/das/genome/yeast/S228C/feature > > All the usual feature filters apply. 
> The response at this URL is not cached
> to keep the content fresh, at the expense of ever-slower load times as
> written features accumulate.
>
> -Allen
>
> On 6/24/06, Allen Day wrote:
> >
> > I have a temporary CGI set up to accept WRITEBACK documents:
> >
> > http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable1.pl
> >
> > I have attached a das2xml document that POSTs cleanly for me using the
> > lwp-request that is part of libwww-perl. Please modify this document, post,
> > and let me know if anything breaks.
> >
> > This implementation accepts only new records. It supports neither updates
> > nor deletes. Furthermore, it only accepts new feature records. It does not
> > support new type records, new region records, or any other type of record.
> >
> > Feature records may have 0 or more locations, 0 or more parents, 0 or more
> > children, and 0 or more properties. All parts/parents must be present in
> > the document (no referring to existing features by URI), or it will throw a
> > HTTP 500 error.
> >
> > Next I will implement the update and delete support. This should be
> > fairly straightforward, and may be doable over the weekend.
> >
> > -Allen

From allenday at ucla.edu Mon Jun 26 18:51:28 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 26 Jun 2006 15:51:28 -0700
Subject: [DAS2] DAS/2 writeback
In-Reply-To:
References:
Message-ID: <5c24dcc30606261551t7ec17d9dh9adce375f0fef470@mail.gmail.com>

Great. Why the full das2xml feature doc instead of the mapping doc as response?

-Allen

On 6/26/06, Helt,Gregg wrote:
>
> Thanks, that helped, the problem was on my end. Looks like I was POSTing
> but not making sure the data buffer was flushed out to the server before I
> tried to read the server response. I fixed that, now I get a mapping
> document back. I'm not sure how much effort to put into parsing the
> mapping doc though - the next update of the spec is supposed to change so
> that rather than a new mapping document type, the server responds with the
> full feature XML of the created/updated features.
>
> More progress - if I stick to the human genome (chr21), after the
> writeback I'm able to retrieve the features via DAS/2 and visualize in
> IGB.
>
> thanks again,
> Gregg

From Gregg_Helt at affymetrix.com Mon Jun 26 23:13:05 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 26 Jun 2006 20:13:05 -0700
Subject: [DAS2] DAS/2 writeback
Message-ID:

Things are getting stranger. I'm trying to writeback annotations on chr21, and they seem to succeed, returning me an id mapping document. But once I've sent the annotations to the server, then try to retrieve them, I can't always see them from the human source. But I can see them from the yeast source. This is easiest to see with a simple query to get all features. A query to the yeast writeback source to get all features:

http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature

currently returns 30 features (top-level and children), including the ones I've added on chr21.

However a query to the human writeback source for all features:

http://das.biopackages.net/das/genome/human/writeback/feature

currently returns only 9 features (top-level and children), all on chrI.

Furthermore, if I restrict my human query with a region filter:

http://das.biopackages.net/das/genome/human/writeback/feature?overlaps=chr21/0:46944323

I _do_ get back the 5 top-level "centromere" annotations I've added to chr21, and their children. But if I then add a type filter:

http://das.biopackages.net/das/genome/human/writeback/feature?overlaps=chr21/0:46944323;type=SO:centromere

I only get back 1 top-level "centromere" feature and its children.

I'm not sure what it all means, but I'm hoping the results above may help diagnose the problem.

Thanks,
Gregg

> -----Original Message-----
> From: das2-bounces at lists.open-bio.org [mailto:das2-bounces at lists.open-bio.org] On Behalf Of Helt,Gregg
> Sent: Monday, June 26, 2006 3:43 PM
> To: allenday at ucla.edu
> Cc: DAS/2
> Subject: Re: [DAS2] DAS/2 writeback
>
> Thanks, that helped, the problem was on my end.
> Looks like I was
> POSTing but not making sure the data buffer was flushed out to the
> server before I tried to read the server response. I fixed that, now I
> get a mapping document back. I'm not sure how much effort to put into
> parsing the mapping doc though - the next update of the spec is supposed
> to change so that rather than a new mapping document type, the server
> responds with the full feature XML of the created/updated features.
>
> More progress - if I stick to the human genome (chr21), after the
> writeback I'm able to retrieve the features via DAS/2 and visualize in
> IGB.
>
> thanks again,
> Gregg

From allenday at ucla.edu Tue Jun 27 02:54:52 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 26 Jun 2006 23:54:52 -0700
Subject: [DAS2] DAS/2 writeback
In-Reply-To:
References:
Message-ID: <5c24dcc30606262354l48580fd3u24ff07a229169d8d@mail.gmail.com>

Hi Gregg,

Sounds like it was a bad idea for me to make a chimeric data source -- I don't want to debug bugs related to this, as it is really a misapplication of the vsource in the first place.

Which would you prefer to have -- human or yeast? I will zap the segments and features for the one you don't want, and remove the vsource from das.biopackages.net

-Allen

From Gregg_Helt at affymetrix.com Tue Jun 27 11:02:50 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Tue, 27 Jun 2006 08:02:50 -0700
Subject: [DAS2] DAS/2 writeback
Message-ID:

I'm pretty familiar with a few annotated regions in human, which should help for testing, so I'd vote for human.

thanks!
gregg

From lstein at cshl.edu Mon Jun 5 14:31:50 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 5 Jun 2006 10:31:50 -0400
Subject: [DAS2] Example alignments
In-Reply-To: <7002f99ddafd9542e8b4cb88e1712f9e@dalkescientific.com>
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com> <200605191150.19535.lstein@cshl.edu> <7002f99ddafd9542e8b4cb88e1712f9e@dalkescientific.com>
Message-ID: <200606051031.50592.lstein@cshl.edu>

Hi Andrew,

I'm truly sorry at how long it has taken me to get these examples to you. I hope that the example alignments in the enclosure make sense to you.

Unfortunately I found that I had to add a new "target" attribute to <LOC> in order to make the cigar string semantics unambiguous.
Otherwise you wouldn't be able to tell how to interpret the gaps.

Lincoln

--
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
-------------- next part --------------
CASE #1. A SIMPLE PAIRWISE ALIGNMENT.

A simple alignment is one in which the alignment is represented as a single feature with no subfeatures. This is the preferred representation to be used when the entire alignment shares the same set of properties.

This is an alignment between Chr4 (the reference) and EST23 (the target). Both aligned sequences are in the forward (+) direction. We represent this as a single alignment

Chr4  100 CAAGACCTAAA-CTGGAATTCCAATCGCAACTCCTGGACC-TATCTATA 147
          |||||||X||| ||||| |||||||       ||||X||| ||||||||
EST23   1 CAAGACCAAAATCTGGA-TTCCAAT-------CCTGCACCCTATCTATA 41

This has a CIGAR gap string of M11 I1 M5 D1 M7 D7 M8 I1 M8:

  M11  match 11 bp
  I1   insert 1 gap into the reference sequence
  M5   match 5 bp
  D1   insert 1 gap into the target sequence
  M7   match 7 bp
  D7   insert 7 gaps into the target
  M8   match 8 bp
  I1   insert 1 gap into the reference
  M8   match 8 bp

Content-Type: application/x-das-features+xml

NOTE: I've had to introduce a new attribute named "target" in order to distinguish the reference sequence from the target sequence. This is necessary for the CIGAR string concepts to work. Perhaps it would be better to have a "role" attribute whose values are one of "ref" and "target"?

CASE #2. A COMPLEX PAIRWISE ALIGNMENT.

The complex pairwise alignment is used when the alignment is the composite of two different alignments, each of which has its own set of properties. An example of this is BLAST, in which each "BLAST hit" is composed of multiple aligned segments called "HSPs". We extend the previous example by adding another aligned segment to the alignment.
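The op-by-op reading of the Case #1 gap string can be sketched in code. This is only an illustration of the semantics described above (M consumes both sequences, I is a gap in the reference, D is a gap in the target); the function names parse_gap_string and consumed_lengths are made up for this sketch and are not part of DAS/2 or any library.

```python
import re


def parse_gap_string(gap):
    """Split a gap string such as 'M11 I1 M5' into (op, length) pairs."""
    ops = []
    for token in gap.split():
        m = re.fullmatch(r"([MID])(\d+)", token)
        if m is None:
            raise ValueError(f"unrecognized gap token: {token!r}")
        ops.append((m.group(1), int(m.group(2))))
    return ops


def consumed_lengths(gap):
    """Return (reference_bases, target_bases) covered by the alignment.

    M consumes both sequences; I (a gap in the reference) consumes only
    the target; D (a gap in the target) consumes only the reference.
    """
    ref = tgt = 0
    for op, n in parse_gap_string(gap):
        if op in ("M", "D"):
            ref += n
        if op in ("M", "I"):
            tgt += n
    return ref, tgt


print(consumed_lengths("M11 I1 M5 D1 M7 D7 M8 I1 M8"))  # (47, 41)
```

For the Case #1 string this gives 47 reference bases and 41 target bases; the target count agrees with the EST23 range 1 to 41. Swapping the meaning of I and D would swap the two counts, which is why the reference and target roles have to be stated explicitly.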
BLAST hit: align Chr4:100:300 with EST23:1:58

HSP 1:

Chr4  100 CAAGACCTAAA-CTGGAATTCCAATCGCAACTCCTGGACC-TATCTATA 147
          |||||||X||| ||||| |||||||       ||||X||| ||||||||
EST23   1 CAAGACCAAAATCTGGA-TTCCAAT-------CCTGCACCCTATCTATA 41

BLAST score = 80
CIGAR gap string M11 I1 M5 D1 M7 D7 M8 I1 M8

HSP 2:

Chr4  211 TCAAACTGATAATGGGGT 228
          ||||||||||| ||||||
EST23  42 TCAAACTGATA-TGGGGT 58

BLAST score = 85
CIGAR gap string M11 D1 M6

We represent this as an "expressed_sequence_match" feature relating Chr4 100:300 to EST23 1:58. The feature contains two subparts, one corresponding to HSP1 and the other corresponding to HSP2.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL:

From Steve_Chervitz at affymetrix.com Tue Jun 6 00:54:21 2006
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Mon, 5 Jun 2006 17:54:21 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 5 Jun 2006
Message-ID:

Notes from the weekly DAS/2 teleconference, 5 Jun 2006

$Id: das2-teleconf-2006-06-05.txt,v 1.2 2006/06/06 00:52:14 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  CSHL: Lincoln Stein
  Dalke Scientific: Andrew Dalke
  UCLA: Allen Day, Brian O'Connor

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org

DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit.
Topic: status reports
---------------------

gh: waiting to hear back from peter good re: grant. He thinks we have a decent chance of additional funding (bridge funding), would fund till new grant kicked in in June 2007 with suzi as a PI (revised grant). Total funding would still be less than the amount originally requested for this grant. Definitely will have funding through september this year. our grant folks beefed up the dalke consulting and cshl accounts. Will let people know re: funding past september when I find out.
Impl wise, not much done in last 2 weeks. about to start testing writeback from the client side. write new features back to das/2 server (the easiest thing to test). New release of IGB is out now with a testing curation feature. Go into preferences to turn it on. (Ed worked on this)

ls: sent in example of das2 features request that returns alignments. discovered that i needed to add a new attribute to the LOC tag. have to indicate that alignments use the cigar gap string. whether you gap the ref or target sequence and indicate which one's which. there's a target attr in LOC that indicates which one is the target (a little asymmetrical).
gh: you can get both target and query?
ls: yes. the cigar string using d and i, you have to indicate which one is which.
another thing: das/2 project for caBIG is pulling das2 into the core, has a kickoff meeting this wednesday. I will be on that meeting. we'll reiterate goals, timeline with adopters (Wistar institute)
gh: it's been a while since we talked about that. is the intent to have das2 servers that can sit on top of caBIG?
ls: no, das2 clients via caBIG. we won't need it for a couple of months, hoping we'll be able to use the biopackages das2 server to serve out the data. Is this reasonable?
aday: yes.

ad: nothing new to report. settling in Sweden. plan to incorporate Lincoln's things into the spec. server writeback work.

bo: working on hyrax client that retrieves microarray data from a das server. functional now and is now in sourceforge. http://sourceforge.net/projects/nelsonlab. uses allen's formatted output rather than netCDF. can browse ontology annotation examples. can download. focusses on individual researcher needs in Nelson lab. plan to do it as a generic plugin, data import tool.
gh: for ontology stuff, any progress with suzi and chris re: how das ontology stuff will work with center for biomedical ontologies?
aday: no. will touch base with her. we're continuing to operate as previously. basically just a formatting issue.

[A] allen will contact suzi re: hooking up das ontology work with NCBO

bo: the document format (XML) right?
gh: i think yes. to me the goal is to have NCBO adopt it
aday: even if they don't we can still link to them
gh: it will take encouragement from you setting that up.
aday: you can load the data brian's talking about, egr format. doesn't have location
gh: igb should figure it out
aday: 25,000 microarrays are available at egr. ids of probe set prefixed with the platform. we have a bed formatter, so you can request in bed too.
bo: need to add a pulldown for bed. netCDF is broken now, will fix it. egr is working
aday: genotyping array support in igb?
gh: chromosome copy number output in igb now. gtype outputs into cnat, which outputs a graph in sgr format, read by igb. also have files with locations of snps. should be on quickload servers. near bottom entries for 10, 100, 500k arrays. nice way to visualize when zoomed way out.
aday: if you load a bed file with ids, then an egr without locations. i.e., can bed files be used as identifiers for egr files?
ed: yes
gh: takes up more memory, but is useful.
aday: working with genotyping arrays lately. will produce more files for it in the next few weeks. basically doing lots of microarray data processing now.
gh: das2 writeback server?
aday: xml processing code is there, not rigged up to a webserver yet. can partially translate into insert statements.
gh: can it send back mapping of temp ids to final?
aday: in progress
gh: i can start testing creation of features now.
aday: can put it as a standalone cgi script, can point it to any url.
gh: the beauty of rest.

[A] allen will put writeback server on public url

ed: new version of igb last week (4.38). automatic reloading via jws not working for some clients.
bo: can delete your cache from jws console.
ed: shortcut from desktop sometimes causes problems with updates. starting to look at better loading info about colors from different types of data files. segues into stylesheets from das. and other igb-related things.

sc: installed new version of affy das2 server on the dmz. Has gregg's temporary fix for xml:base, but currently doesn't rely on it since there's no url rewriting happening. need to test it out and do same thing on production server. Also wrote script to make deploying servers easier (e.g., posting new jars, re-starting server via single make command).

[A] steve will test gregg's xml:base fix on dev server

Topic: BOSC submission for a talk
---------------------------------

ad: planning to go, waiting to determine expenses
aday: will go if main conf talk is accepted. otherwise not.
gh: sounds like it's up to you (dalke)
ad: this is what biodas is, tools, how things fit together, how rest is cool. few submissions now (ISMB and BOSC). only 4 now. usually 12 by now.
ad: bod for bosc is discussing what to do
gh: do you need help from any of us for bosc submission?
ad: no. will send you copies to review it.
gh: I gave a talk last year on das. will send it to you as a reference.
sc: part of talk can be the progress since then. cause of the low turnout?
ad: people waiting to see if they are accepted before registering.
ls: for me it's a cost issue. 90% of people who practice bioinfo are in northern hemisphere. was low in brisbane, will be low in china (rumors of 2008 ismb in china, can't confirm).
Topic: Code sprint #3
---------------------

gh: how do people feel about having another code sprint? possibly before or after CSB in august at Stanford. the last two sprints were very good.
ls: I'm at csb in aug, but right after i'll be on a retreat to work on a sequencing grant. right before will be on honeymoon.
gh: maybe we need to push it farther out.
ad: will be in europe until 15 july. not in us until february.
bo: definitely at stanford?
gh: no. august seemed like a good time/location. might make more sense to have a euro-led one.
sc: august is a big vacation time for europeans
ad: july is for swedes.
ad: there's a late breaking poster session for ismb
gh: das poster?
ad: need to decide on cost today if I'm going.

Topic: writeback
----------------

gh: how far behind is website vs our current thinking. that's what I'm using for my impl.
ad: doesn't have idea of microdeltas. other stuff is the same.
ls: does it still have the mapping idea which I thought went away (local to global)? during last codesprint.
gh: it did?
ad: returns back the complete feature with additional attribute. so instead of a mapping, server returns back all features which changed, along with attribute: old id ---> new id
gh: whether you delete things that aren't posted in feature when you submit a new post.
ad: what you post is a complete replacement of what was there.
gh: that verbiage needs to be added. doesn't say anything about it.

[A] andrew will add text to writeback spec re: new feat being a complete replacement

ad: other change: complex features all need a link back to the root feature. when parsing you can build the parent-part relationship. otherwise, you do a lot more work to figure out who's in the same group.
gh: seems like a hack.
ls: this is not in the current writeback doc?
ad: correct. additional attribute for complex features. affects reads too (not just writeback)
ls: bidirectional pointers is still there correct? parent -> child, child -> feature.
ad: that's still there (unlike gff: unidirectional) if you know the root, it saves you from having to traverse links.
gh: doesn't add that much. may create disagreement, errors between the parent-child hierarchy. I don't think the root thing is necessary.
ls: pointer to parent and the root: like a closure across it. don't see a compelling need, makes it harder to impl.
gh: if it's optional, will create other difficulties.
ad: makes it easy to find out where the root is.
ls: just go up until you find no parent. cycles would be a bug. the issue would be if during reading from remote server, gives you children first, middle layer, then root layer, will require some merging of features. depends on data structures. in perl with gbrowse, it's holding every feat or part of feat is a node in a graph. it never merges, just updates pointers. after parse finishes, finds everything without parent and recursively traverses them.
gh: if you want to attach annotations as features while parsing rather than waiting till parse is done. reference counting. don't think root thing would help then. still need to figure out do I have all children.
ad: when you get a failure you can throw away just the failures rather than everything. can count parents and parts as they're coming in.
gh: every feature with no parent is a root.
ad: yes. assuming it comes early.
ls: in general case, you cannot go on and process a feature until you reached the end of the parse. because you could have multiple layers. you can say you have found any pair of layers, not everything in between. the root ptr doesn't help either. could still be in a situation where you think you processed everything that belongs to a...
ad: something comes along later "i'm still a part of that group"
gh: every time you get a feature, can add it to the feature tree, can tell when you're done with group by checking pointers.
ad: ok. not as useful as I thought.
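A minimal sketch of the approach ls describes above: wait until the parse has finished, treat every feature with no parent as a root, and traverse down from each root to collect its group. The group_by_root function and the dictionary layout (feature id mapped to a list of parent ids) are assumptions made for this illustration; they are not taken from gbrowse, IGB, or the DAS/2 spec.

```python
from collections import defaultdict


def group_by_root(parents):
    """Group feature ids under their parentless root features.

    `parents` maps each feature id to the list of its parent ids
    (an empty list for a top-level feature). The order in which the
    features arrived does not matter. Returns {root_id: sorted ids}.
    """
    # Invert the child -> parent links into parent -> children links.
    children = defaultdict(list)
    for fid, pids in parents.items():
        for pid in pids:
            if pid not in parents:
                raise KeyError(f"{fid!r} refers to unknown parent {pid!r}")
            children[pid].append(fid)

    groups = {}
    for root in [fid for fid, pids in parents.items() if not pids]:
        seen, stack = set(), [root]
        while stack:
            fid = stack.pop()
            if fid in seen:
                continue
            seen.add(fid)
            stack.extend(children[fid])
        groups[root] = sorted(seen)

    # Anything not reachable from a root must sit on a parent cycle.
    reached = {fid for ids in groups.values() for fid in ids}
    unreached = set(parents) - reached
    if unreached:
        raise ValueError(f"cycle in parent links: {sorted(unreached)}")
    return groups


# Children arrive before their root, as in the remote-server case above.
example = {"hsp2": ["hit1"], "hsp1": ["hit1"], "hit1": [], "cen1": []}
print(group_by_root(example))
# {'hit1': ['hit1', 'hsp1', 'hsp2'], 'cen1': ['cen1']}
```

Because the grouping runs only after all features are in hand, no separate root attribute is needed; the cost is that nothing can be processed until the end of the parse, which is the trade-off discussed above.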
[A] andrew won't add root feat attribute to complex features
[so the latter is actually an 'inaction' item ;-]

From edgrif at sanger.ac.uk Wed Jun 7 15:35:56 2006
From: edgrif at sanger.ac.uk (Ed Griffiths)
Date: Wed, 7 Jun 2006 16:35:56 +0100 (BST)
Subject: [DAS2] Example alignments
In-Reply-To: <200606051031.50592.lstein@cshl.edu>
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
    <200605191150.19535.lstein@cshl.edu>
    <7002f99ddafd9542e8b4cb88e1712f9e@dalkescientific.com>
    <200606051031.50592.lstein@cshl.edu>
Message-ID:

Lincoln,

> I'm truly sorry at how long it has taken me to get these examples to you. I
> hope that the example alignments in the enclosure makes sense to you.
>
> Unfortunately I found that I had to add a new "target" attribute to in
> order to make the cigar string semantics unambiguous. Otherwise you wouldn't
> be able to tell how to interpret the gaps.

I think your idea of having a common "role" is a good one but I wondered if
we could use the term "query" for the sequence that is to be aligned (i.e.
the EST in your example) and "subject" for the reference sequence ?

I also wondered why the hsp hits could not be nested within the overall
alignment tags ?...probably that is opening a whole can of worms though....

Ed
--
 ------------------------------------------------------------------------
| Ed Griffiths, Acedb development, Informatics Group,                    |
|  The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus   |
|  Hinxton, Cambridge CB10 1HH                                           |
|                                                                        |
| email: edgrif at sanger.ac.uk Tel: +44-1223-496844 Fax: +44-1223-494919 |
 ------------------------------------------------------------------------

From lstein at cshl.edu Wed Jun 7 16:44:02 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 7 Jun 2006 12:44:02 -0400
Subject: [DAS2] Example alignments
In-Reply-To:
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
    <200606051031.50592.lstein@cshl.edu>
Message-ID: <200606071244.03598.lstein@cshl.edu>

Query and subject are rather BLAST-specific and don't apply to other
techniques, such as whole genome alignments. How about using "reference" for
the reference sequence and "non-reference" for the target?

Lincoln

On Wednesday 07 June 2006 11:35, Ed Griffiths wrote:
> Lincoln,
>
> > I'm truly sorry at how long it has taken me to get these examples to you.
> > I hope that the example alignments in the enclosure makes sense to you.
> >
> > Unfortunately I found that I had to add a new "target" attribute to
> > in order to make the cigar string semantics unambiguous. Otherwise you
> > wouldn't be able to tell how to interpret the gaps.
>
> I think your idea of having a common "role" is a good one but I wondered if
> we could use the term "query" for the sequence that is to be aligned (i.e.
> the EST in your example) and "subject" for the reference sequence ?
>
> I also wondered why the hsp hits could not be nested within the overall
> alignment tags ?...probably that is opening a whole can of worms though....
>
> Ed

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Steve_Chervitz at affymetrix.com Wed Jun 7 23:17:50 2006
From: Steve_Chervitz at affymetrix.com (Steve Chervitz)
Date: Wed, 07 Jun 2006 16:17:50 -0700
Subject: [DAS2] most up-to-date mouse das? (mm7)
In-Reply-To: <83722dde0605172120t5853b30al3f931bd6d73092df@mail.gmail.com>
Message-ID:

Ann,

Did you find a solution to your problem of mapping Entrez gene ids into
genomic coords? Some suggestions:

1) You can issue DAS/2 queries using gene names or accessions to retrieve
coordinate info, for example:

http://netaffxdas.affymetrix.com/das2/M_musculus_Aug_2005/features?name=ACTA1

Or using refseq accession:

http://netaffxdas.affymetrix.com/das2/M_musculus_Aug_2005/features?name=NM_009606

2) You can't query the Affymetrix DAS/2 server using an Entrez gene id like
11459 (you could in principle, but it's not aware of these ids at present).
So you'll need to map from Entrez gene ids into accessions using data from
NCBI, such as ftp://ftp.ncbi.nih.gov/gene/gene2refseq.gz

3) There will likely be multiple mRNA sequences associated with a gene id,
so you may want to look up the genomic coordinates for each mRNA and take
the union of those to get a single location for each gene.

Steve

> From: Ann Loraine
> Date: Wed, 17 May 2006 23:20:47 -0500
> To: Steve Chervitz
> Cc: DAS/2
> Subject: Re: most up-to-date mouse das? (mm7)
>
> Hi Steve,
>
> Thank you very much for the info!
>
> Now I have another question...
>
> I'd like to look up the genomic coordinates of a list of mouse genes
> using their numeric Entrez Gene ids.
>
> If it's not too much bother, do you think you'd be able to give me
> some tips on how to do this using DAS?
>
> btw, the DAS services have been hugely helpful to me in the last week.
> We have already found some interesting results with minimal coding.
> And the coding was actually fun because there was NO SCREEN-SCRAPING.
> Pure bliss.
>
> -Ann
>
> On 5/16/06, Steve Chervitz wrote:
>> Hi Ann,
>>
>> The list address has changed. It's now this: das2 at lists.open-bio.org
>>
>> As for your question, check out the DAS registry server at the Sanger:
>>
>> http://das.sanger.ac.uk/registry/
>>
>> I don't think the registry provides an indication of how current the
>> annotations on each registered server for a given data source, such as
>> Entrez Gene. It would be a good piece of data to see, though.
>>
>> As for the Affymetrix DAS/2 server, the mm7 annotations were last updated on
>> April 19 2006:
>>
>> http://netaffxdas.affymetrix.com/das2/sources
>>
>> The available annotations come from the UCSC server, and derive from the
>> knownGene, all_mrna, genscan, and refFlat files (called 'refseq' on the das
>> server). Looks like the knownGene data was last updated by UCSC on 15 Dec
>> 2005:
>> http://hgdownload.cse.ucsc.edu/goldenPath/mm7/database/
>>
>> Technical note: The xml:base attribute in the das2xml features document
>> returned by the Affy DAS/2 server is currently incorrect. It should be
>>
>> xml:base="http://netaffxdas.affymetrix.com/das2/M_musculus_Aug_2005/features"
>>
>> instead of
>>
>> xml:base="http://127.0.0.1:9021/das2/M_musculus_Aug_2005/features"
>>
>> This will be fixed in the near future.
>>
>> Steve
>>
>>> From: Ann Loraine
>>> Date: Tue, 16 May 2006 03:52:29 -0500
>>> To: Steve Chervitz
>>> Subject: Fwd: most up-to-date mouse das? (mm7)
>>>
>>> Hi Steve,
>>>
>>> Would you post this to the DAS/2 list for me?
>>>
>>> Sorry to bother you, but for some reason my message didn't appear on the
>>> list.
>>>
>>> -Ann
>>>
>>> ---------- Forwarded message ----------
>>> From: Ann Loraine
>>> Date: May 15, 2006 3:35 PM
>>> Subject: most up-to-date mouse das? (mm7)
>>> To: Andrew Dalke , DAS/2
>>>
>>>
>>>
>>> Hi!
>>>
>>> I'm working on a QTL study and need to get all the genes mapping to
>>> various regions under peaks.
>>>
>>> I have the genomic coordinates for the regions so it should be very
>>> simple for me to get all accessions (feature ids) underneath those
>>> regions using DAS.
>>>
>>> My question is: what is the most up-to-date server for mm7?
>>>
>>> Here, of course, is UCSC:
>>>
>>> http://genome.cse.ucsc.edu/cgi-bin/das/mm7/features?segment=chr1:3000000,4000000;type=knownGene
>>>
>>> Ultimately, I'd like to get Entrez Gene ids for the genes under the
>>> peaks so that I can start sifting through the candidates using GO.
>>>
>>> Any tips would be gratefully accepted!
>>>
>>> All the best,
>>>
>>> Ann
>>>
>>> --
>>> Ann Loraine
>>> Assistant Professor
>>> Section on Statistical Genetics
>>> University of Alabama at Birmingham
>>> http://www.ssg.uab.edu
>>> http://www.transvar.org
>>>
>>>
>>> --
>>> Ann Loraine
>>> Assistant Professor
>>> Section on Statistical Genetics
>>> University of Alabama at Birmingham
>>> http://www.ssg.uab.edu
>>> http://www.transvar.org
>>
>>
>
>
> --
> Ann Loraine
> Assistant Professor
> Section on Statistical Genetics
> University of Alabama at Birmingham
> http://www.ssg.uab.edu
> http://www.transvar.org

From aloraine at gmail.com Thu Jun 8 01:43:14 2006
From: aloraine at gmail.com (Ann Loraine)
Date: Wed, 7 Jun 2006 20:43:14 -0500
Subject: [DAS2] most up-to-date mouse das? (mm7)
In-Reply-To:
References: <83722dde0605172120t5853b30al3f931bd6d73092df@mail.gmail.com>
Message-ID: <83722dde0606071843x5f1215e0u1681ccd99a7aace4@mail.gmail.com>

Thanks Steve!

We ended up looking up the genomic positions in a bit of a grueling way.

We had a list of gene names from a paper that we thought could
influence our eQTLs and then used those to look up (by hand) the
corresponding Entrez Gene ids.

We used gene2refseq.gz to get RefSeq ids mapping onto the gene ids.
Then we used 'bed' files downloaded from UC Santa Cruz to get the
genomic coordinates of the RefSeq ids (alignments) and then checked
them against our list of genomic regions (peaks).

Clearly we could have used DAS to get the positions, which would have
saved coding!

Live and learn :-)

-Ann

--
Ann Loraine
Assistant Professor
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org

From edgrif at sanger.ac.uk Thu Jun 8 08:15:17 2006
From: edgrif at sanger.ac.uk (Ed Griffiths)
Date: Thu, 8 Jun 2006 09:15:17 +0100 (BST)
Subject: [DAS2] Example alignments
In-Reply-To: <200606071244.03598.lstein@cshl.edu>
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
    <200606051031.50592.lstein@cshl.edu>
    <200606071244.03598.lstein@cshl.edu>
Message-ID:

Lincoln,

> Query and subject are rather BLAST-specific and don't apply to other
> techniques, such as
> whole genome alignments. How about using "reference" for
> the reference sequence and "non-reference" for the target?

That seems fine to me, I think the word "target" is ambiguous as I have
commonly heard people refer to both the "query" and the "subject" sequences
as the "target" ! (but not at the same time of course ;-)

Ed
--
 ------------------------------------------------------------------------
| Ed Griffiths, Acedb development, Informatics Group,                    |
|  The Morgan Building, Sanger Institute, Wellcome Trust Genome Campus   |
|  Hinxton, Cambridge CB10 1HH                                           |
|                                                                        |
| email: edgrif at sanger.ac.uk Tel: +44-1223-496844 Fax: +44-1223-494919 |
 ------------------------------------------------------------------------

From lstein at cshl.edu Mon Jun 12 13:46:52 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 12 Jun 2006 09:46:52 -0400
Subject: [DAS2] Can't make conf call today
In-Reply-To:
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com>
    <200606071244.03598.lstein@cshl.edu>
Message-ID: <200606120946.53448.lstein@cshl.edu>

Hi,

I've got a conflict with a grant planning meeting today, so I won't be on the
conference call. Next week I'll be in Melbourne for a genetics meeting and
I'll miss the call as well. Sorry about that.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From Steve_Chervitz at affymetrix.com Mon Jun 19 19:50:16 2006
From: Steve_Chervitz at affymetrix.com (Chervitz, Steve)
Date: Mon, 19 Jun 2006 12:50:16 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
Message-ID:

Notes from the weekly DAS/2 teleconference, 19 Jun 2006

$Id: das2-teleconf-2006-06-19.txt,v 1.1 2006/06/19 19:48:57 sac Exp $

Note taker: Steve Chervitz

Attendees:
  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
  UCLA: Allen Day

Action items are flagged with '[A]'.

These notes are checked into the biodas.org CVS repository at
das/das2/notes/2006. Instructions on how to access this
repository are at http://biodas.org

DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable, given the desire to get the notes out with a
rapid turnaround. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit.

General announcements
---------------------

gh: We have received additional funding from NIH extending our support
through May 2007. This will provide us the support we need until the
new grant would kick in (the grant renewal we're planning to submit
Oct 2006). Many thanks to Peter Good who championed our cause at NIH.

gh: considering moving das meeting to every two weeks, to get more
participation. we used to have alternating weeks -- one week focus on
spec, other week focus on implementations.

[A] Gregg will broach possible biweekly das/2 meeting schedule on list.

gh: Andrew is sick, so he won't be joining today.

[Note: Last week only Steve, Gregg, and Ed E were on the call, so there
was no major DAS/2 discussion, hence no notes were posted.]

Topic: Status reports
---------------------

gh: das2 writeback related work in IGB. can write back das2xml. can
make curations. options to save as bed or das2xml file. can make a
curation track, save as das2xml. there's an id resolution
issue. roundtripping works.

Next step: make sure IGB can get back a das2 document that has same
xml with id mappings to different id. make sure I can swap
those. should then be able to writeback to a database.

ee: improved sliced view in igb, shows where deleted exons have been
deleted. improved threading. slicing happens in a separate
interruptable thread. gff3 reading issue on the IGB forum, our parser
isn't gff3-ready.

gh: deleted exons thing is cool. the gff parser is not fully
gff3-compliant.

[A] Ed E. will fix gff3 parsing in IGB.

ee/gh: implemented a speed up for drawing, min/max. once per pixel.

sc: last development was on writing scripts to automate the updating
of the affy das/2 servers (dmz), so you can update the jars and
re-start the server.

Other das-related stuff: Contributed to email discussion thread on the
W3C HCLS semantic web mailing list regarding "LSIDs in the wild",
provoked by Mark Wilkinson. Looks like about half a dozen or so places
that are using LSIDs in some capacity, but not a lot of resolution
services out there yet. Getting different data providers to use the
LSID syntax alone would be a big win. Asked Lincoln and Brian Gilman
about LSIDs at hapmap and caBIG (respectively). No response yet.

Also responded to Ann's question on the das/2 list about using DAS to
look up genomic coords for a set of Entrez Gene ids. It would be nice
to have a way to determine the types of identifiers handled by a given
DAS server, so this sort of query could be handled automatically. If a
DAS server could provide a list of LSID authorities and namespaces for
the types of identifiers it can resolve, that could be used to provide
such a look up facility. This type of information could be provided to
the das/2 registry server at registration time.

gh: yes, but not sure how to best deal with this information. possibly
via regular expressions on feature lookup, or xid.

sc: Did other work related to Netaffx update preparation and domain
mapping project for exon array sequences, doing as collaboration with
Melissa Cline. Using Gregg's AnnotMapper.

gh: will you provide data as RDF?
sc: it's still in flux, but possibly.

gh: we were also going to talk about optimizing the data format for the
exon array as used on the affy das server, to deal with the growing
memory requirements. We can discuss this week.

[A] Steve set up mtg with Gregg re: exon array data format for affy das
server.

aday: working on updates to the biopackages das server.

gh: is it ready to handle writeback requests?

aday: will be by friday. can you handle different data sources? it's
in a separate db.
gh: as long as it's listed in sources query.
aday: it will be.

From aloraine at gmail.com Tue Jun 20 14:23:20 2006
From: aloraine at gmail.com (Ann Loraine)
Date: Tue, 20 Jun 2006 09:23:20 -0500
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To:
References:
Message-ID: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com>

Sorry I couldn't attend. My life has been crazy-busy lately with
teaching & trying to keep the research on track.

A question: Do you have any suggestions for a Web service approach for
microarray expression results?

We have a biggish (1700+ array hybs) database of expression data from
Affymetrix ATH1 arrays. For middleware & other reasons, we are
thinking of ways to provide simple CGI access to expression values in
the database.

The issues we are dealing with are:

1.
delivering mappings of probe sets onto other ids (e.g., AGI gene
ids) using different authorities: TAIR, us, Affymetrix, University of
Michigan, and so on.

2. filtering out probe sets using various criteria, e.g., promiscuous
probe sets that match multiple genes, probe sets that "behave badly"
in all known experiments, and so on. Each filtering procedure can be
given a name.

3. providing expression values generated from 'cel' files using either
RMA or MAS5, w/ PMA calls on both

Currently we do something very simple for the latter, e.g.,

http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at

Values come back in tab-delimited format, not XML. The reason we are
not using XML is that we want to be able to read the data directly
into interactive statistical programming environments like R:

  > url <- 'http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at'
  > dat <- read.delim(url,sep='\t',header=T)
  > model <- lm(dat[,3]~dat[,2])
  > summary(model)
  > plot(dat[,2],dat[,3])
  > abline(model)
  > cor(dat[,2],dat[,3])
  > hist(dat[,2])
  > qqnorm(dat[,2])

and so on...

R can probably handle XML somehow, but some people are confused by
XML. To start, I want to avoid pushing people too far beyond their
comfort zone.

If you have any tips, please let me know!

Right now we only have Arabidopsis data, but we are expanding to
include GEO data that meet our various quality-control criteria.
(You'd be shocked...maybe?...at how much bad data is in GEO!)

-Ann

--
Ann Loraine
Assistant Professor
Section on Statistical Genetics
University of Alabama at Birmingham
http://www.ssg.uab.edu
http://www.transvar.org

From boconnor at ucla.edu Tue Jun 20 18:17:12 2006
From: boconnor at ucla.edu (Brian O'Connor)
Date: Tue, 20 Jun 2006 11:17:12 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com>
References: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com>
Message-ID: <44983BA8.2090500@ucla.edu>

Hi Ann,

So there's a spec/implementation by Allen for a DAS/2 "Assay" server
that would be a good jumping off point for what you want. The Nelson
lab at UCLA is currently using it to serve up thousands of microarray
results across many different platforms. To get an idea of what's there
look at the spec doc here:

http://www.biodas.org/documents/das2/das2_assay.html

There are some example URLs in the spec that should work (the server was
down when I tried just a minute ago but should be available soon).
You can retrieve expressions data using a URL similar to what you were using before: http://das.biopackages.net/das/assay/human/17/result/SN:1007162?format=mgr;protocol=rma That returns a tab-delimited file containing the RMA normalized results for this sample. The assay das server is already included in the DAS/2 rpm. The only tricky part is loading expression data into a chado instance. Allen could provide you with better guidance there than I can. Alternatively, if you have your own backend storage for the expression data you may want to write a new adapter for the DAS/2 server rather then exporting your data to another DB. --Brian Ann Loraine wrote: >Sorry I couldn't attend. My life has been crazy-busy lately with >teaching & trying to keep the research on track. > >A question: Do you have any suggestions for a Web service approach for >microarray expression results? > >We have a biggish (1700+ array hybs) database of expression data from >Affymetrix ATH1 arrays. For middleware & other reasons, we are >thinking of ways to provide simple CGI access to expression values in >the database. > >The issues we are dealing with are: > >1. delivering mappings of probe sets onto other ids (e.g., AGI gene >ids) using different authorities: TAIR, us, Affymetrix, University of >Michigan, and so on. > >2. filtering out probe sets using various critiera, e.g., promiscuous >probe sets that match multiple genes, probe sets that "behave badly" >in all known experiments, and so on. Each filtering procedure can be >given a name. > >3. providing expression values generated from 'cel' files using either >RMA or MAS5, w/ PMA calls on both > >Currently we do something very simple for the latter, e.g., > >http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at > >Values come back in tab-delimited format, not XML. 
The reason we are >not using XML is that we want to be able to read the data directly >into interactive statistical programming environments like R: > > > >>url <- 'http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at' >>dat <- read.delim(url,sep='\t',header=T) >>model <- lm(dat[,3]~dat[,2]) >>summary(model) >>plot(dat[,2],dat[,3]) >>abline(model) >>cor(dat[,2],dat[,3]) >>hist(dat[,2]) >>qqnorm(dat[,2]) >> >> > >and so on... > >R can probably handle XML somehow, but some people are confused by >XML. To start, I want to avoid pushing people too far beyond their >comfort zone. > >If you have any tips, please let me know! > >Right now we only have Arabidopsis data, but we are expanding to >include GEO data that meet our various quality-control criteria. >(You'd be shocked...maybe?...at how much bad data is in GEO!) > >-Ann > >On 6/19/06, Chervitz, Steve wrote: > > >>Notes from the weekly DAS/2 teleconference, 19 Jun 2006 >> >>$Id: das2-teleconf-2006-06-19.txt,v 1.1 2006/06/19 19:48:57 sac Exp $ >> >>Note taker: Steve Chervitz >> >>Attendees: >> Affy: Steve Chervitz, Ed Erwin, Gregg Helt >> UCLA: Allen Day >> >>Action items are flagged with '[A]'. >> >>These notes are checked into the biodas.org CVS repository at >>das/das2/notes/2006. Instructions on how to access this >>repository are at http://biodas.org >> >>DISCLAIMER: >>The note taker aims for completeness and accuracy, but these goals are >>not always achievable, given the desire to get the notes out with a >>rapid turnaround. So don't consider these notes as complete minutes >>from the meeting, but rather abbreviated, summarized versions of what >>was discussed. There may be errors of commission and omission. >>Participants are welcome to post comments and/or corrections to these >>as they see fit. >> >>General announcements >>--------------------- >> >>gh: We have received additional funding from NIH extending our support >>through May 2007. 
This will provide us the support we need until the >>new grant would kick in (the grant renewal we're planning to submit >>Oct 2006). Many thanks to Peter Good who championed our cause at NIH. >> >>gh: considering moving das meeting to every two weeks, to get more >>participation. we used to have alternating weeks -- one week focus on >>spec, other week focus on implementations. >> >>[A] Gregg will broach possible biweekly das/2 meeting schedule on list. >> >>gh: Andrew is sick, so he won't be joining today. >> >>[Note: Last week only Steve, Gregg, and Ed E were on the call, so there >>was no major DAS/2 discussion, hence no notes were posted.] >> >>Topic: Status reports >>--------------------- >> >>gh: das2 writeback related work in IGB. can write back das2xml. can >>make curations. options to save as bed or das2xml file. can make a >>curation track, save as das2xml. there's an id resolution >>issue. roundtripping works. >> >>Next step: make sure IGB can get back a das2 document that has same >>xml with id mappings to different id. make sure I can swap >>those. should then be able to writeback to a database. >> >>ee: improved sliced view in igb, shows where deleted exons have been >>deleted. improved threading. slicing happens in a separate >>interruptable thread. gff3 reading issue on the IGB forum, our parser >>isn't gff3-ready. >> >>gh: deleted exons thing is cool. the gff parser is not fully >>gff3-compliant. >> >>[A] Ed E. will fix gff3 parsing in IGB. >> >>ee/gh: implemented a speed up for drawing, min/max. once per pixel. >> >>sc: last development was on writing scripts to automate the updating >>of the affy das/2 servers (dmz), so you can update the jars and >>re-start the server. >> >>Other das-related stuff: Contributed to email discussion thread on the >>W3C HCLS semantic web mailing list regarding "LSIDs in the wild", >>provoked by Mark Wilkinson. 
Looks like about half a dozen or so places >>that are using LSIDs in some capacity, but not a lot of resolution >>services out there yet. Getting different data providers to use the >>LSID syntax alone would be a big win. Asked Lincoln and Brian Gilman >>about LSIDs at hapmap and caBIG (respectively). No response yet. >> >>Also responded to Ann's question on the das/2 list about using DAS to >>look up genomic coords for a set of Entrez Gene ids. It would be nice >>to have a way to determine the types of identifiers handled by a given >>DAS server, so this sort of query could be handled automatically. If a >>DAS server could provide a list of LSID authorities and namespaces for >>the types of identifiers it can resolve, that could be used to provide >>such a look up facility. This type of information could be provided to >>the das/2 registry server at registration time. >> >>gh: yes, but not sure how to best deal with this information. possibly >>via regular expressions on feature lookup, or xid. >> >>sc: Did other work related to Netaffx update preparation and domain >>mapping project for exon array sequences, doing as collaboration with >>Melissa Cline. Using Gregg's AnnotMapper. >> >>gh: will you provide data as RDF? >>sc: it's still in flux, but possibly. >> >>gh: we were also going to talk about optimizing the data format for the >>exon array as used on the affy das server, to deal with the growing >>memory requirements. We can discuss this week. >> >>[A] Steve set up mtg with Gregg re: exon array data format for affy das >>server. >> >>aday: working on updates to the biopackages das server. >> >>gh: is it ready to handle writeback requests? >> >>aday: will be by friday. can you handle different data sources? it's >>in a separate db. >>gh: as long as it's listed in sources query. >>aday: it will be. 
>>
>>_______________________________________________
>>DAS2 mailing list
>>DAS2 at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/das2
>>
>

From allenday at ucla.edu Wed Jun 21 08:27:10 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 21 Jun 2006 01:27:10 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com>
References: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com>
Message-ID: <5c24dcc30606210127l7a9687fao53b40aab5db0833c@mail.gmail.com>

> 1. delivering mappings of probe sets onto other ids (e.g., AGI gene
> ids) using different authorities: TAIR, us, Affymetrix, University of
> Michigan, and so on.

We're doing this with the NetAffx schema that has been loaded to
Postgres/Chado and full-text indexed. I think we have Affy probeset ->
TAIR ID mappings, but not the others.

> 2. filtering out probe sets using various criteria, e.g., promiscuous
> probe sets that match multiple genes, probe sets that "behave badly"
> in all known experiments, and so on. Each filtering procedure can be
> given a name.

Yes, that is something I am looking at right now. Actually, as you get
more and more arrays the probeset behavior becomes very clear, with many
transcripts showing discrete on/off states, e.g. a bunch of genes highly
expressed in human tongue:

taste receptor, type 2, member 1
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?221324_at

gastrin-releasing peptide receptor
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?207929_at

olfactory receptor, family 10
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?221346_at

natural cytotoxicity triggering receptor
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?217088_s_at

There are even clear trimodals, like thyroid receptor alpha:
http://celsius-cgi.genomics.ctrl.ucla.edu/cgi/plot_element.Rsh?1316_at

> 3. providing expression values generated from 'cel' files using either
> RMA or MAS5, w/ PMA calls on both

Yes, you can do this in R with XML, but it's a pain. Better for
expression data to use TSV as you are doing. We have an R lib in
development for doing large batch retrieval of hundreds of arrays.
Getting annotation into R turns out to be easier with XML as it is just
easier to represent in the more flexible format.

-Allen

From allenday at ucla.edu Wed Jun 21 08:08:34 2006
From: allenday at ucla.edu (Allen Day)
Date: Wed, 21 Jun 2006 01:08:34 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To: <44983BA8.2090500@ucla.edu>
References: <83722dde0606200723v320ff21dnc9e73bdcacea7670@mail.gmail.com> <44983BA8.2090500@ucla.edu>
Message-ID: <5c24dcc30606210108u3a7e9226v119c157be713e3f3@mail.gmail.com>

Hi Ann,

I think Brian meant to form a URL like this:

http://das.biopackages.net/das/assay/celsius/1/result/SN:1007162?format=egr;protocol=rma

As mentioned, we have an Affy data warehouse project going on over here.
Currently it contains more than 36000 CEL files in raw and various
normalized flavors. 1251 of these are the ATH1-121501 platform. We
typically import 300-500 arrays/week. All of GEO is already present
(about 14000 CEL files), as well as several other sites' data
(ArrayExpress, Broad Institute, ...).

We are currently advertising a normalization service whereby users can
/anonymously/ drop off raw CEL data, and get back normalized results
within a few hours, dependent on our compute cluster usage. Typically we
can flip an array in about 30 minutes. We store the CEL and normalized
data permanently for retrieval later, and for our own meta-analyses.

At the other extreme, if you're interested in doing regular bulk import,
we're also happy to set up a weekly mirror where we sync the data to our
site and then process it.

If you're interested in either of these, or a setup somewhere in
between, let me know.
-Allen

On 6/20/06, Brian O'Connor wrote:
>
> Hi Ann,
>
> So there's a spec/implementation by Allen for a DAS/2 "Assay" server
> that would be a good jumping off point for what you want. The Nelson
> lab at UCLA is currently using it to serve up thousands of microarray
> results across many different platforms. To get an idea of what's there
> look at the spec doc here:
> http://www.biodas.org/documents/das2/das2_assay.html
>
> There are some example URLs in the spec that should work (the server was
> down when I tried just a minute ago but should be available soon). You
> can retrieve expression data using a URL similar to what you were using
> before:
>
> http://das.biopackages.net/das/assay/human/17/result/SN:1007162?format=mgr;protocol=rma
>
> That returns a tab-delimited file containing the RMA normalized results
> for this sample.
>
> The assay das server is already included in the DAS/2 rpm. The only
> tricky part is loading expression data into a chado instance. Allen
> could provide you with better guidance there than I can.
> Alternatively, if you have your own backend storage for the expression
> data you may want to write a new adapter for the DAS/2 server rather
> than exporting your data to another DB.
>
> --Brian
>
> Ann Loraine wrote:
>
> >Sorry I couldn't attend. My life has been crazy-busy lately with
> >teaching & trying to keep the research on track.
> >
> >A question: Do you have any suggestions for a Web service approach for
> >microarray expression results?
> >
> >We have a biggish (1700+ array hybs) database of expression data from
> >Affymetrix ATH1 arrays. For middleware & other reasons, we are
> >thinking of ways to provide simple CGI access to expression values in
> >the database.
> >
> >The issues we are dealing with are:
> >
> >1. delivering mappings of probe sets onto other ids (e.g., AGI gene
> >ids) using different authorities: TAIR, us, Affymetrix, University of
> >Michigan, and so on.
> >
> >2. filtering out probe sets using various criteria, e.g., promiscuous
> >probe sets that match multiple genes, probe sets that "behave badly"
> >in all known experiments, and so on. Each filtering procedure can be
> >given a name.
> >
> >3. providing expression values generated from 'cel' files using either
> >RMA or MAS5, w/ PMA calls on both
> >
> >Currently we do something very simple for the latter, e.g.,
> >
> >http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at
> >
> >Values come back in tab-delimited format, not XML. The reason we are
> >not using XML is that we want to be able to read the data directly
> >into interactive statistical programming environments like R:
> >
> >>url <- 'http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at'
> >>dat <- read.delim(url,sep='\t',header=T)
> >>model <- lm(dat[,3]~dat[,2])
> >>summary(model)
> >>plot(dat[,2],dat[,3])
> >>abline(model)
> >>cor(dat[,2],dat[,3])
> >>hist(dat[,2])
> >>qqnorm(dat[,2])
> >
> >and so on...
> >
> >R can probably handle XML somehow, but some people are confused by
> >XML. To start, I want to avoid pushing people too far beyond their
> >comfort zone.
> >
> >If you have any tips, please let me know!
> >
> >Right now we only have Arabidopsis data, but we are expanding to
> >include GEO data that meet our various quality-control criteria.
> >(You'd be shocked...maybe?...at how much bad data is in GEO!)
> >
> >-Ann

From allenday at ucla.edu Sat Jun 24 09:24:19 2006
From: allenday at ucla.edu (Allen Day)
Date: Sat, 24 Jun 2006 02:24:19 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To: <5c24dcc30606240219x674f4b4p824bb0c979db5185@mail.gmail.com>
References: <5c24dcc30606240219x674f4b4p824bb0c979db5185@mail.gmail.com>
Message-ID: <5c24dcc30606240224s5a3836acyebab930c28d518ac@mail.gmail.com>

You can see the features that are posted here:

http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature

It is fully compatible with the usual yeast source:

http://das.biopackages.net/das/genome/yeast/S228C/feature

All the usual feature filters apply. The response at this URL is not
cached to keep the content fresh, at the expense of ever-slower load
times as written features accumulate.
-Allen

From allenday at ucla.edu Sat Jun 24 09:19:46 2006
From: allenday at ucla.edu (Allen Day)
Date: Sat, 24 Jun 2006 02:19:46 -0700
Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
In-Reply-To:
References:
Message-ID: <5c24dcc30606240219x674f4b4p824bb0c979db5185@mail.gmail.com>

I have a temporary CGI set up to accept WRITEBACK documents:

http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable1.pl

I have attached a das2xml document that POSTs cleanly for me using the
lwp-request that is part of libwww-perl. Please modify this document,
post, and let me know if anything breaks.

This implementation accepts only new records. It supports neither
updates nor deletes. Furthermore, it only accepts new feature records.
It does not support new type records, new region records, or any other
type of record.

Feature records may have 0 or more locations, 0 or more parents, 0 or
more children, and 0 or more properties. All parts/parents must be
present in the document (no referring to existing features by URI), or
it will throw an HTTP 500 error.

Next I will implement the update and delete support. This should be
fairly straightforward, and may be doable over the weekend.

-Allen
-------------- next part --------------
A non-text attachment was scrubbed...
Name: new.xml
Type: text/xml
Size: 1032 bytes
Desc: not available
URL:

From lstein at cshl.edu Mon Jun 26 14:56:31 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Mon, 26 Jun 2006 10:56:31 -0400
Subject: [DAS2] Can't make conf call today
In-Reply-To: <200606120946.53448.lstein@cshl.edu>
References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com> <200606120946.53448.lstein@cshl.edu>
Message-ID: <200606261056.32160.lstein@cshl.edu>

Hi Folks,

Sorry to do this three weeks in a row, but I have to teach at noon today
so I'll miss the conf call.

Lincoln

On Monday 12 June 2006 09:46, Lincoln Stein wrote:
> Hi,
>
> I've got a conflict with a grant planning meeting today, so I won't be on
> the conference call. Next week I'll be in Melbourne for a genetics meeting
> and I'll miss the call as well.
>
> Sorry about that.
>
> Lincoln

--
Lincoln D.
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From aloraine at gmail.com Mon Jun 26 15:59:12 2006 From: aloraine at gmail.com (Ann Loraine) Date: Mon, 26 Jun 2006 10:59:12 -0500 Subject: [DAS2] Can't make conf call today In-Reply-To: <200606261056.32160.lstein@cshl.edu> References: <5ce0c897a901838a4a94a6331a45a79f@dalkescientific.com> <200606120946.53448.lstein@cshl.edu> <200606261056.32160.lstein@cshl.edu> Message-ID: <83722dde0606260859m6da790f4peeec9ff917f948fb@mail.gmail.com> I will miss it too. My apologies. I look forward to reading Steve's summary. Best, Ann On 6/26/06, Lincoln Stein wrote: > Hi Folks, > > Sorry to do this three weeks in a row, but I have to teach at noon today so > I'll miss the conf call. > > Lincoln > > On Monday 12 June 2006 09:46, Lincoln Stein wrote: > > Hi, > > > > I've got a conflict with a grant planning meeting today, so I won't be on > > the conference call. Next week I'll be in Melbourne for a genetics meeting > > and I'll miss the call as well. > > > > Sorry about that. > > > > Lincoln > > -- > Lincoln D. 
Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > _______________________________________________ > DAS2 mailing list > DAS2 at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/das2 > -- Ann Loraine Assistant Professor Section on Statistical Genetics University of Alabama at Birmingham http://www.ssg.uab.edu http://www.transvar.org From Steve_Chervitz at affymetrix.com Mon Jun 26 17:58:08 2006 From: Steve_Chervitz at affymetrix.com (Steve Chervitz) Date: Mon, 26 Jun 2006 10:58:08 -0700 Subject: [DAS2] Notes from the weekly DAS/2 teleconference, 26 Jun 2006 Message-ID: Notes from the weekly DAS/2 teleconference, 26 Jun 2006 $Id: das2-teleconf-2006-06-26.txt,v 1.2 2006/06/26 17:56:11 sac Exp $ Note taker: Steve Chervitz Attendees: Affy: Steve Chervitz, Ed Erwin, Gregg Helt Dalke Scientific: Andrew Dalke Action items are flagged with '[A]'. These notes are checked into the biodas.org CVS repository at das/das2/notes/2006. Instructions on how to access this repository are at http://biodas.org DISCLAIMER: The note taker aims for completeness and accuracy, but these goals are not always achievable, given the desire to get the notes out with a rapid turnaround. So don't consider these notes as complete minutes from the meeting, but rather abbreviated, summarized versions of what was discussed. There may be errors of commission and omission. Participants are welcome to post comments and/or corrections to these as they see fit. Status reports: --------------- gh: grant update: requested 300k, may get 210-220k. will cover us at our current burn rate. based on 3mos funding left over from prev funding, so new request only needs to cover 8mos (through may 2007). still waiting to hear back. Allocated 30k for andrew's work. Originally 25k for each of two years. 
ad: heard back from suzi from ontology urls. not on das list. will
summarize. spoke to others at NCBO about setting up a service. will
ask Karen Eilbeck (genetics.utah.edu). She only has access to
song.sourceforge.net domain. not good long term because it's dependent
on sf. there is a general domain for the group, but not one she has
access to. what do we want on the pages. xml, html?

gh: meeting suzi this week. can ask more about ontology stuff.

gh: we also discussed moving this das/2 meeting to biweekly.
ad: makes sense. spec vs impl.
sc: should the spec vs impl discussion alternate biweekly?
gh: no, cover everything each meeting.

[A] das/2 meeting will now be biweekly.
[A] next meeting is July 10. No meeting on 3 July (US holiday).

sc: is spec still frozen?
ad: no, just haven't worked on it. if you want to make changes go ahead.

[A] steve will fix broken in-page links on the read spec html.

sc: discussed with Gregg last week about migrating affy das/1 server
data to das/2. also experiencing growing pains due to more arrays to
support (the affy das server is in-memory). So we strategized over a
more efficient data model for the exon array data, which eats up a lot
of memory (100-200 MB per array per genome version). In thinking about
it more, seems too ambitious to get the more efficient data model *and*
do the das/2 migration for the July Netaffx update. Another issue is
that we now provide the bgn files as a separate download from Netaffx,
so if we provide bp2 format, users will have to upgrade their IGB as
well (this wouldn't be a concern for folks launching IGB via java web
start, which should be most users).

gh: bp2 format isn't too hard to do. just adds an array id field, since
exon array identifiers are not integers ("1:2345678"). good plan is to
first move to more efficient data model on das/1 to solve memory
issues, then focus on migration to das/2.

Other status of note:
----------------------
Allen Day has announced the availability of his writeback server:
http://lists.open-bio.org/pipermail/das2/2006-June/000744.html

From Gregg_Helt at affymetrix.com Mon Jun 26 19:23:37 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 26 Jun 2006 12:23:37 -0700
Subject: [DAS2] DAS/2 writeback
Message-ID:

Allen, thanks for getting the start of a writeback server up and
running!

I'm hoping to try writing back annotations later today. However, I'm
having problems looking at the annotations in the writeback server via
IGB. It looks to me like the main issue is that
http://das.biopackages.net/das/genome/yeast/S228C-writeback/segment
returns human chromosome ids in the uri attribute of the SEGMENT
element, instead of yeast ids. When IGB uses this to compose a query
like

http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?overlaps=chr1/10:15000;type=SO:centromere

it gets back an empty feature list. But if I manually edit this to
replace "chr1" with "chrI",

http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature?overlaps=chrI/10:15000;type=SO:centromere

I get back a list of features that satisfy the query filters.

Any ideas?

Thanks,
Gregg

> -----Original Message-----
> From: das2-bounces at lists.open-bio.org
> [mailto:das2-bounces at lists.open-bio.org] On Behalf Of Allen Day
> Sent: Saturday, June 24, 2006 2:24 AM
> To: Chervitz, Steve
> Cc: DAS/2
> Subject: Re: [DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006
>
> You can see the features that are posted here:
>
> http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature
>
> It is fully compatible with the usual yeast source:
>
> http://das.biopackages.net/das/genome/yeast/S228C/feature
>
> All the usual feature filters apply. The response at this URL is not cached
> to keep the content fresh, at the expense of ever-slower load times as
> written features accumulate.
>
> -Allen
>
> On 6/24/06, Allen Day wrote:
> >
> > I have a temporary CGI set up to accept WRITEBACK documents:
> >
> > http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable1.pl
> >
> > I have attached a das2xml document that POSTs cleanly for me using the
> > lwp-request that is part of libwww-perl. Please modify this document, post,
> > and let me know if anything breaks.
> >
> > This implementation accepts only new records. It supports neither updates
> > nor deletes. Furthermore, it only accepts new feature records. It does not
> > support new type records, new region records, or any other type of record.
> >
> > Feature records may have 0 or more locations, 0 or more parents, 0 or more
> > children, and 0 or more properties. All parts/parents must be present in
> > the document (no referring to existing features by URI), or it will throw
> > an HTTP 500 error.
> >
> > Next I will implement the update and delete support. This should be
> > fairly straightforward, and may be doable over the weekend.
> >
> > -Allen

From allenday at ucla.edu Mon Jun 26 21:11:18 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 26 Jun 2006 14:11:18 -0700
Subject: [DAS2] DAS/2 writeback
In-Reply-To:
References:
Message-ID: <5c24dcc30606261411m49032b0fkd65fcb3022a0826d@mail.gmail.com>

This datasource contains both yeast and human segments. I set it up this
way so features can be written for either human or yeast, then viewed at
one of:

http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature
http://das.biopackages.net/das/genome/human/17-writeback/feature

I thought this would be more useful so you can test viewing writeback
features alongside both "real" human and yeast features.

So you can just ignore the irrelevant segments, or if you'd prefer I can
delete one set of segments or the other.

-Allen

From Gregg_Helt at affymetrix.com Mon Jun 26 21:20:48 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 26 Jun 2006 14:20:48 -0700
Subject: [DAS2] DAS/2 writeback
Message-ID:

But there's no entry in the segment query for chrI, which is the
chromosome you used in the example XML you posted. So I can't find
those annotations via IGB.

gregg

From Gregg_Helt at affymetrix.com Mon Jun 26 22:43:13 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 26 Jun 2006 15:43:13 -0700
Subject: [DAS2] DAS/2 writeback
Message-ID:

Thanks, that helped, the problem was on my end. Looks like I was
POSTing but not making sure the data buffer was flushed out to the
server before I tried to read the server response. I fixed that, now I
get a mapping document back. I'm not sure how much effort to put into
parsing the mapping doc though - the next update of the spec is supposed
to change so that rather than a new mapping document type, the server
responds with the full feature XML of the created/updated features.
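
The ordering problem described above (reading the reply before the whole request body has gone out) can be sketched in a few lines. This is only an illustration in Python, not the IGB client code: `post_writeback`, the local echo handler standing in for the CGI, the placeholder XML body and the `application/xml` content type are all assumptions made for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlsplit


def post_writeback(url: str, xml_doc: bytes, timeout: float = 10.0):
    """POST xml_doc as the raw request body; return (status, response body)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80,
                                      timeout=timeout)
    try:
        # request() writes the headers and the complete body to the socket
        # before it returns, so nothing is left in a client-side buffer.
        conn.request("POST", parts.path or "/", body=xml_doc,
                     headers={"Content-Type": "application/xml"})  # assumed type
        # Only after the body has been sent do we read the reply.
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


class _EchoHandler(BaseHTTPRequestHandler):
    """Hypothetical local stand-in for the CGI: reports the bytes received."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        received = self.rfile.read(length)
        reply = str(len(received)).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, *args):  # keep the demo quiet
        pass


def demo():
    """Run one POST against the local stand-in server."""
    server = HTTPServer(("127.0.0.1", 0), _EchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        doc = b"<FEATURES><!-- placeholder body, not real das2xml --></FEATURES>"
        url = f"http://127.0.0.1:{server.server_port}/stable1.pl"
        return post_writeback(url, doc), len(doc)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `demo()` posts the placeholder document to the local stand-in and returns the status and echoed byte count alongside the length of what was sent, which makes it easy to confirm that the whole body arrived before the response was read.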

More progress - if I stick to the human genome (chr21), after the
writeback I'm able to retrieve the features via DAS/2 and visualize in
IGB.

thanks again,
Gregg

-----Original Message-----
From: allenday at gmail.com [mailto:allenday at gmail.com] On Behalf Of Allen Day
Sent: Monday, June 26, 2006 3:04 PM
To: Helt,Gregg
Subject: Re: DAS/2 writeback

The content-type shouldn't matter. I think I was submitting with
application/x-form-encoded, or something like that.

The CGI just reads whatever you sent it directly from STDIN though.
From the error log it looks like you are not POSTing the document.
There is no web page there, you need to issue a POST request, XML doc
as the body, directly to that CGI.

-Allen

On 6/26/06, Helt,Gregg wrote:

That's my guess.

Also, I'm attempting to post variations of your XML example to
http://genomics.ctrl.ucla.edu/~allenday/cgi-bin/das2xml-parser/stable1.pl,
but so far I've only gotten "500 Internal Server Error" responses back.
Can you tell what's happening? Do I need to set the content-type for
the POST or some other header?

gregg

-----Original Message-----
From: allenday at gmail.com [mailto:allenday at gmail.com] On Behalf Of Allen Day
Sent: Monday, June 26, 2006 2:40 PM
To: Helt,Gregg
Subject: Re: DAS/2 writeback

Ok, i will check into it. Only showing human segments, is it?

-Allen

From allenday at ucla.edu Mon Jun 26 22:51:28 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 26 Jun 2006 15:51:28 -0700
Subject: [DAS2] DAS/2 writeback
In-Reply-To:
References:
Message-ID: <5c24dcc30606261551t7ec17d9dh9adce375f0fef470@mail.gmail.com>

Great. Why the full das2xml feature doc instead of the mapping doc as
response?

-Allen

From Gregg_Helt at affymetrix.com Tue Jun 27 03:13:05 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Mon, 26 Jun 2006 20:13:05 -0700
Subject: [DAS2] DAS/2 writeback
Message-ID:

Things are getting stranger. I'm trying to writeback annotations on
chr21, and they seem to succeed, returning me an id mapping document.
But once I've sent the annotations to the server, then try to retrieve
them, I can't always see them from the human source. But I can see them
from the yeast source. This is easiest to see with a simple query to
get all features. A query to the yeast writeback source to get all
features:
http://das.biopackages.net/das/genome/yeast/S228C-writeback/feature

currently returns 30 features (top-level and children), including the
ones I've added on chr21.

However a query to the human writeback source for all features:
http://das.biopackages.net/das/genome/human/writeback/feature

currently returns only 9 features (top-level and children), all on chrI.

Furthermore, if I restrict my human query with a region filter:
http://das.biopackages.net/das/genome/human/writeback/feature?overlaps=chr21/0:46944323

I _do_ get back the 5 top-level "centromere" annotations I've added to
chr21, and their children. But if I then add a type filter:
http://das.biopackages.net/das/genome/human/writeback/feature?overlaps=chr21/0:46944323;type=SO:centromere

I only get back 1 top-level "centromere" feature and its children.

I'm not sure what it all means, but I'm hoping the results above may
help diagnose the problem.

Thanks,
Gregg

> -----Original Message-----
> From: das2-bounces at lists.open-bio.org
> [mailto:das2-bounces at lists.open-bio.org] On Behalf Of Helt,Gregg
> Sent: Monday, June 26, 2006 3:43 PM
> To: allenday at ucla.edu
> Cc: DAS/2
> Subject: Re: [DAS2] DAS/2 writeback
>
> Thanks, that helped, the problem was on my end.
> Looks like I was
> POSTing but not making sure the data buffer was flushed out to the
> server before I tried to read the server response. I fixed that, now I
> get a mapping document back. I'm not sure how much effort to put into
> parsing the mapping doc though - the next update of the spec is supposed
> to change so that rather than a new mapping document type, the server
> responds with the full feature XML of the created/updated features.
>
> More progress - if I stick to the human genome (chr21), after the
> writeback I'm able to retrieve the features via DAS/2 and visualize in
> IGB.
>
> thanks again,
> Gregg
>

From allenday at ucla.edu Tue Jun 27 06:54:52 2006
From: allenday at ucla.edu (Allen Day)
Date: Mon, 26 Jun 2006 23:54:52 -0700
Subject: [DAS2] DAS/2 writeback
In-Reply-To:
References:
Message-ID: <5c24dcc30606262354l48580fd3u24ff07a229169d8d@mail.gmail.com>

Hi Gregg,

Sounds like it was a bad idea for me to make a chimeric data source -- I
don't want to debug bugs related to this, as it is really a
misapplication of the vsource in the first place.

Which would you prefer to have -- human or yeast? I will zap the
segments and features for the one you don't want, and remove the
vsource from das.biopackages.net

-Allen

From Gregg_Helt at affymetrix.com Tue Jun 27 15:02:50 2006
From: Gregg_Helt at affymetrix.com (Helt,Gregg)
Date: Tue, 27 Jun 2006 08:02:50 -0700
Subject: [DAS2] DAS/2 writeback
Message-ID:

I'm pretty familiar with a few annotated regions in human, which should
help for testing, so I'd vote for human.

thanks!
gregg
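
The feature queries exchanged in this thread all share one shape: a source URL, then `/feature?`, then filters joined by `;`, where the region filter reads `overlaps=<segment>/<start>:<end>` and the type filter reads `type=<id>`. The sketch below only assembles such URLs and checks a segment id against a list of known ids. `feature_query` and `unknown_segment` are hypothetical helper names, not part of any DAS/2 client, and nothing here contacts a server.

```python
from typing import Iterable, Optional


def feature_query(base: str, segment: str, start: int, end: int,
                  type_id: Optional[str] = None) -> str:
    """Compose a feature query in the form quoted in the thread."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    filters = [f"overlaps={segment}/{start}:{end}"]
    if type_id is not None:
        filters.append(f"type={type_id}")
    # Filters are separated by ';', as in the URLs shown in the messages.
    return f"{base.rstrip('/')}/feature?" + ";".join(filters)


def unknown_segment(segment: str, known_segments: Iterable[str]) -> bool:
    """True when the segment id is absent from the ids the source reported."""
    return segment not in set(known_segments)


YEAST = "http://das.biopackages.net/das/genome/yeast/S228C-writeback"
url = feature_query(YEAST, "chrI", 10, 15000, "SO:centromere")
```

A client could run `unknown_segment("chr1", ids_from_segment_query)` before issuing a query. A well-formed URL naming a segment id that the source does not list is one plausible way to end up with an empty feature list like the one reported for "chr1" on the yeast source.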