[DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006

Wed Jun 21 08:08:34 UTC 2006

Hi Ann,

I think Brian meant to form a URL like this:

http://das.biopackages.net/das/assay/celsius/1/result/SN:1007162?format=egr;protocol=rma
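
For anyone scripting against this, here is a minimal Python sketch of how
that URL is put together and how a tab-delimited response might be read.
The function names (result_url, parse_tab_delimited) and the default
arguments are illustrative assumptions taken from the example URL above,
not part of the DAS/2 assay spec. The column layout of the response is not
described in this thread, so the parser only splits rows and fields.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical helper: the defaults simply mirror the example URL above.
def result_url(sample_id, source="celsius", version="1",
               fmt="egr", protocol="rma",
               base="http://das.biopackages.net/das/assay"):
    """Build a result URL of the same shape as the example URL."""
    # Keep ':' unescaped so 'SN:1007162' appears as in the example.
    path = "/".join([base, source, version, "result",
                     quote(sample_id, safe=":")])
    # The example separates query parameters with ';' rather than '&'.
    return f"{path}?format={fmt};protocol={protocol}"

# Hypothetical helper: assumes the response body is tab-delimited text.
def parse_tab_delimited(text):
    """Split tab-delimited text into rows, skipping empty lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row]

print(result_url("SN:1007162"))
```

The text passed to parse_tab_delimited would be whatever the server
returns for that URL; fetching it is left out so the sketch runs offline.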

As mentioned, we have an Affy data warehouse project going on over here.
Currently it contains more than 36000 CEL files in raw and various normalized
flavors.  1251 of these are from the ATH1-121501 platform.  We typically import
300-500 arrays/week.  All of GEO is already present (about 14000 CEL files),
as well as several other sites' data (ArrayExpress, Broad Institute, ...).

We are currently advertising a normalization service whereby users can
/anonymously/ drop off raw CEL data, and get back normalized results within
a few hours, dependent on our compute cluster usage.  Typically we can flip
an array in about 30 minutes.  We store the CEL and normalized data
permanently for retrieval later, and for our own meta-analyses.

At the other extreme, if you're interested in doing regular bulk import,
we're also happy to set up a weekly mirror where we sync the data to our
site and then process it.

If you're interested in either of these, or a setup somewhere in between,
let me know.

-Allen

On 6/20/06, Brian O'Connor <boconnor at ucla.edu> wrote:
>
> Hi Ann,
>
> So there's a spec/implementation by Allen for a DAS/2 "Assay" server
> that would be a good jumping off point for what you want.  The Nelson
> lab at UCLA is currently using it to serve up thousands of microarray
> results across many different platforms.  To get an idea of what's there
> look at the spec doc here:
> http://www.biodas.org/documents/das2/das2_assay.html
>
> There are some example URLs in the spec that should work (the server was
> down when I tried just a minute ago but should be available soon).  You
> can retrieve expression data using a URL similar to what you were using
> before:
>
>
> http://das.biopackages.net/das/assay/human/17/result/SN:1007162?format=mgr;protocol=rma
>
> That returns a tab-delimited file containing the RMA normalized results
> for this sample.
>
> The assay das server is already included in the DAS/2 rpm.  The only
> tricky part is loading expression data into a chado instance.  Allen
> could provide you with better guidance there than I can.
> Alternatively, if you have your own backend storage for the expression
> data you may want to write a new adapter for the DAS/2 server rather
> than exporting your data to another DB.
>
> --Brian
>
> Ann Loraine wrote:
>
> >Sorry I couldn't attend. My life has been crazy-busy lately with
> >teaching & trying to keep the research on track.
> >
> >A question: Do you have any suggestions for a Web service approach for
> >microarray expression results?
> >
> >We have a biggish (1700+ array hybs) database of expression data from
> >Affymetrix ATH1 arrays. For middleware & other reasons, we are
> >thinking of ways to provide simple CGI access to expression values in
> >the database.
> >
> >The issues we are dealing with are:
> >
> >1. delivering mappings of probe sets onto other ids (e.g., AGI gene
> >ids) using different authorities: TAIR, us, Affymetrix, University of
> >Michigan, and so on.
> >
> >2. filtering out probe sets using various criteria, e.g., promiscuous
> >probe sets that match multiple genes, probe sets that "behave badly"
> >in all known experiments, and so on. Each filtering procedure can be
> >given a name.
> >
> >3. providing expression values generated from 'cel' files using either
> >RMA or MAS5, w/ PMA calls on both
> >
> >Currently we do something very simple for the latter, e.g.,
> >
> >
> >http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at
> >
> >Values come back in tab-delimited format, not XML. The reason we are
> >not using XML is that we want to be able to read the data directly
> >into interactive statistical programming environments like R:
> >
> >
> >
> >>url <- 'http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at'
> >>dat <- read.delim(url,sep='\t',header=T)
> >>model <- lm(dat[,3]~dat[,2])
> >>summary(model)
> >>plot(dat[,2],dat[,3])
> >>abline(model)
> >>cor(dat[,2],dat[,3])
> >>hist(dat[,2])
> >>qqnorm(dat[,2])
> >>
> >>
> >
> >and so on...
> >
> >R can probably handle XML somehow, but some people are confused by
> >XML. To start, I want to avoid pushing people too far beyond their
> >comfort zone.
> >
> >If you have any tips, please let me know!
> >
> >Right now we only have Arabidopsis data, but we are expanding to
> >include GEO data that meet our various quality-control criteria.
> >(You'd be shocked...maybe?...at how much bad data is in GEO!)
> >
> >-Ann
> >
> >On 6/19/06, Chervitz, Steve <Steve_Chervitz at affymetrix.com> wrote:
> >
> >
> >>Notes from the weekly DAS/2 teleconference, 19 Jun 2006
> >>
> >>$Id: das2-teleconf-2006-06-19.txt,v 1.1 2006/06/19 19:48:57 sac Exp $
> >>
> >>Note taker: Steve Chervitz
> >>
> >>Attendees:
> >>  Affy: Steve Chervitz, Ed Erwin, Gregg Helt
> >>  UCLA: Allen Day
> >>
> >>Action items are flagged with '[A]'.
> >>
> >>These notes are checked into the biodas.org CVS repository at
> >>das/das2/notes/2006. Instructions on how to access this
> >>repository are at http://biodas.org
> >>
> >>DISCLAIMER:
> >>The note taker aims for completeness and accuracy, but these goals are
> >>not always achievable, given the desire to get the notes out with a
> >>rapid turnaround. So don't consider these notes as complete minutes
> >>from the meeting, but rather abbreviated, summarized versions of what
> >>was discussed. There may be errors of commission and omission.
> >>Participants are welcome to post comments and/or corrections to these
> >>as they see fit.
> >>
> >>General announcements
> >>---------------------
> >>
> >>gh: We have received additional funding from NIH extending our support
> >>through May 2007. This will provide us the support we need until the
> >>new grant would kick in (the grant renewal we're planning to submit
> >>Oct 2006). Many thanks to Peter Good who championed our cause at NIH.
> >>
> >>gh: considering moving das meeting to every two weeks, to get more
> >>participation. we used to have alternating weeks -- one week focus on
> >>spec, other week focus on implementations.
> >>
> >>[A] Gregg will broach possible biweekly das/2 meeting schedule on list.
> >>
> >>gh: Andrew is sick, so he won't be joining today.
> >>
> >>[Note: Last week only Steve, Gregg, and Ed E were on the call, so there
> >>was no major DAS/2 discussion, hence no notes were posted.]
> >>
> >>Topic: Status reports
> >>---------------------
> >>
> >>gh: das2 writeback related work in IGB. can write back das2xml. can
> >>make curations. options to save as bed or das2xml file. can make a
> >>curation track, save as das2xml. there's an id resolution
> >>issue. roundtripping works.
> >>
> >>Next step: make sure IGB can get back a das2 document that has same
> >>xml with id mappings to different id. make sure I can swap
> >>those. should then be able to writeback to a database.
> >>
> >>ee: improved sliced view in igb, shows where exons have been
> >>deleted. improved threading. slicing happens in a separate
> >>interruptible thread. gff3 reading issue on the IGB forum, our parser
> >>isn't gff3-ready.
> >>
> >>gh: deleted exons thing is cool. the gff parser is not fully
> >>gff3-compliant.
> >>
> >>[A] Ed E. will fix gff3 parsing in IGB.
> >>
> >>ee/gh: implemented a speed up for drawing, min/max. once per pixel.
> >>
> >>sc: last development was on writing scripts to automate the updating
> >>of the affy das/2 servers (dmz), so you can update the jars and
> >>re-start the server.
> >>
> >>Other das-related stuff: Contributed to email discussion thread on the
> >>W3C HCLS semantic web mailing list regarding "LSIDs in the wild",
> >>provoked by Mark Wilkinson. Looks like about half a dozen or so places
> >>that are using LSIDs in some capacity, but not a lot of resolution
> >>services out there yet. Getting different data providers to use the
> >>LSID syntax alone would be a big win. Asked Lincoln and Brian Gilman
> >>about LSIDs at hapmap and caBIG (respectively). No response yet.
> >>
> >>Also responded to Ann's question on the das/2 list about using DAS to
> >>look up genomic coords for a set of Entrez Gene ids. It would be nice
> >>to have a way to determine the types of identifiers handled by a given
> >>DAS server, so this sort of query could be handled automatically. If a
> >>DAS server could provide a list of LSID authorities and namespaces for
> >>the types of identifiers it can resolve, that could be used to provide
> >>such a look up facility. This type of information could be provided to
> >>the das/2 registry server at registration time.
> >>
> >>gh: yes, but not sure how to best deal with this information. possibly
> >>via regular expressions on feature lookup, or xid.
> >>
> >>sc: Did other work related to Netaffx update preparation and domain
> >>mapping project for exon array sequences, doing as collaboration with
> >>Melissa Cline. Using Gregg's AnnotMapper.
> >>
> >>gh: will you provide data as RDF?
> >>sc: it's still in flux, but possibly.
> >>
> >>gh: we were also going to talk about optimizing the data format for the
> >>exon array as used on the affy das server, to deal with the growing
> >>memory requirements. We can discuss this week.
> >>
> >>[A] Steve set up mtg with Gregg re: exon array data format for affy das
> >>server.
> >>
> >>aday: working on updates to the biopackages das server.
> >>
> >>gh: is it ready to handle writeback requests?
> >>
> >>aday: will be by friday. can you handle different data sources? it's
> >>in a separate db.
> >>gh: as long as it's listed in sources query.
> >>aday: it will be.
> >>
> >>
> >>
> >>
> >>
> >>_______________________________________________
> >>DAS2 mailing list
> >>DAS2 at lists.open-bio.org
> >>http://lists.open-bio.org/mailman/listinfo/das2
> >>
> >>
> >>
> >
> >
> >
> >
>