[DAS2] Notes from the weekly DAS/2 teleconference, 19 Jun 2006

Ann Loraine aloraine at gmail.com
Tue Jun 20 14:23:20 UTC 2006

Sorry I couldn't attend. My life has been crazy-busy lately with
teaching & trying to keep the research on track.

A question: Do you have any suggestions for a Web service approach for
microarray expression results?

We have a biggish (1700+ array hybs) database of expression data from
Affymetrix ATH1 arrays. For middleware & other reasons, we are
thinking of ways to provide simple CGI access to expression values in
the database.

The issues we are dealing with are:

1. delivering mappings of probe sets onto other ids (e.g., AGI gene
ids) using different authorities: TAIR, us, Affymetrix, University of
Michigan, and so on.

2. filtering out probe sets using various criteria, e.g., promiscuous
probe sets that match multiple genes, probe sets that "behave badly"
in all known experiments, and so on. Each filtering procedure can be
given a name.

3. providing expression values generated from 'cel' files using either
RMA or MAS5, w/ PMA calls on both
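
To make issues 1 and 2 concrete, here is a minimal sketch in Python of
one way a service could keep per-authority mappings and named filters
and return the result as tab-delimited text. Everything named here
(MAPPINGS, FILTERS, map_probe_sets, to_tab_delimited, the "unique_gene"
filter) is hypothetical, and the probe set and gene ids are
placeholders, not real annotations. It is not a description of how our
database or CGI script actually works.

```python
import csv
import io

# Hypothetical probe-set-to-gene mappings, keyed by the authority that
# produced them. All ids are placeholders for illustration only.
MAPPINGS = {
    "TAIR": {"probe_A_at": ["GENE1"], "probe_B_at": ["GENE2", "GENE3"]},
    "Affymetrix": {"probe_A_at": ["GENE1"], "probe_B_at": ["GENE2"]},
}

# Named filters: each takes (probe_set, gene_ids) and returns True to keep
# the probe set. "unique_gene" drops probe sets that match multiple genes.
FILTERS = {
    "none": lambda probe_set, genes: True,
    "unique_gene": lambda probe_set, genes: len(genes) == 1,
}


def map_probe_sets(authority, filter_name="none"):
    """Return (probe_set, gene_ids) pairs for one authority after filtering."""
    mapping = MAPPINGS[authority]
    keep = FILTERS[filter_name]
    return [(ps, genes) for ps, genes in sorted(mapping.items()) if keep(ps, genes)]


def to_tab_delimited(rows):
    """Serialize rows with a header line, one probe set per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["probe_set", "gene_ids"])
    for probe_set, genes in rows:
        writer.writerow([probe_set, ",".join(genes)])
    return out.getvalue()


if __name__ == "__main__":
    print(to_tab_delimited(map_probe_sets("TAIR", "unique_gene")), end="")
```

The point of keying both the mapping and the filter by name is that a
request could then carry them as plain query parameters, and the reply
stays in the same tab-delimited form that read.delim can load.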

Currently we do something very simple for the third item, e.g.,


Values come back in tab-delimited format, not XML. The reason we are
not using XML is that we want to be able to read the data directly
into interactive statistical programming environments like R:

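> # fetch expression values for two probe sets as tab-delimited text
> url <- 'http://www.transvar.org/cgi-bin/atweb/get_expr.py?psx=262002_at&psy=250641_at'
> dat <- read.delim(url, sep='\t', header=TRUE)
> # regress column 3 on column 2
> model <- lm(dat[,3] ~ dat[,2])
> summary(model)
> # scatterplot with the fitted line
> plot(dat[,2], dat[,3])
> abline(model)
> cor(dat[,2], dat[,3])
> # distribution checks on column 2
> hist(dat[,2])
> qqnorm(dat[,2])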

and so on...

R can probably handle XML somehow, but some people are confused by
XML. To start, I want to avoid pushing people too far beyond their
comfort zone.

If you have any tips, please let me know!

Right now we only have Arabidopsis data, but we are expanding to
include GEO data that meet our various quality-control criteria.
(You'd be shocked...maybe?...at how much bad data is in GEO!)


On 6/19/06, Chervitz, Steve <Steve_Chervitz at affymetrix.com> wrote:
> Notes from the weekly DAS/2 teleconference, 19 Jun 2006
> $Id: das2-teleconf-2006-06-19.txt,v 1.1 2006/06/19 19:48:57 sac Exp $
> Note taker: Steve Chervitz
> Attendees:
>   Affy: Steve Chervitz, Ed Erwin, Gregg Helt
>   UCLA: Allen Day
> Action items are flagged with '[A]'.
> These notes are checked into the biodas.org CVS repository at
> das/das2/notes/2006. Instructions on how to access this
> repository are at http://biodas.org
> The note taker aims for completeness and accuracy, but these goals are
> not always achievable, given the desire to get the notes out with a
> rapid turnaround. So don't consider these notes as complete minutes
> from the meeting, but rather abbreviated, summarized versions of what
> was discussed. There may be errors of commission and omission.
> Participants are welcome to post comments and/or corrections to these
> as they see fit.
> General announcements
> ---------------------
> gh: We have received additional funding from NIH extending our support
> through May 2007. This will provide us the support we need until the
> new grant would kick in (the grant renewal we're planning to submit
> Oct 2006). Many thanks to Peter Good who championed our cause at NIH.
> gh: considering moving das meeting to every two weeks, to get more
> participation. we used to have alternating weeks -- one week focus on
> spec, other week focus on implementations.
> [A] Gregg will broach possible biweekly das/2 meeting schedule on list.
> gh: Andrew is sick, so he won't be joining today.
> [Note: Last week only Steve, Gregg, and Ed E were on the call, so there
> was no major DAS/2 discussion, hence no notes were posted.]
> Topic: Status reports
> ---------------------
> gh: das2 writeback related work in IGB. can write back das2xml. can
> make curations. options to save as bed or das2xml file. can make a
> curation track, save as das2xml. there's an id resolution
> issue. roundtripping works.
> Next step: make sure IGB can get back a das2 document that has same
> xml with id mappings to different id. make sure I can swap
> those. should then be able to writeback to a database.
> ee: improved sliced view in igb, shows where deleted exons have been
> deleted. improved threading. slicing happens in a separate
> interruptible thread. gff3 reading issue on the IGB forum, our parser
> isn't gff3-ready.
> gh: deleted exons thing is cool. the gff parser is not fully
> gff3-compliant.
> [A] Ed E. will fix gff3 parsing in IGB.
> ee/gh: implemented a speed up for drawing, min/max. once per pixel.
> sc: last development was on writing scripts to automate the updating
> of the affy das/2 servers (dmz), so you can update the jars and
> re-start the server.
> Other das-related stuff: Contributed to email discussion thread on the
> W3C HCLS semantic web mailing list regarding "LSIDs in the wild",
> provoked by Mark Wilkinson. Looks like about half a dozen or so places
> that are using LSIDs in some capacity, but not a lot of resolution
> services out there yet. Getting different data providers to use the
> LSID syntax alone would be a big win. Asked Lincoln and Brian Gilman
> about LSIDs at hapmap and caBIG (respectively). No response yet.
> Also responded to Ann's question on the das/2 list about using DAS to
> look up genomic coords for a set of Entrez Gene ids. It would be nice
> to have a way to determine the types of identifiers handled by a given
> DAS server, so this sort of query could be handled automatically. If a
> DAS server could provide a list of LSID authorities and namespaces for
> the types of identifiers it can resolve, that could be used to provide
> such a look up facility. This type of information could be provided to
> the das/2 registry server at registration time.
> gh: yes, but not sure how to best deal with this information. possibly
> via regular expressions on feature lookup, or xid.
> sc: Did other work related to Netaffx update preparation and domain
> mapping project for exon array sequences, doing as collaboration with
> Melissa Cline. Using Gregg's AnnotMapper.
> gh: will you provide data as RDF?
> sc: it's still in flux, but possibly.
> gh: we were also going to talk about optimizing the data format for the
> exon array as used on the affy das server, to deal with the growing
> memory requirements. We can discuss this week.
> [A] Steve set up mtg with Gregg re: exon array data format for affy das
> server.
> aday: working on updates to the biopackages das server.
> gh: is it ready to handle writeback requests?
> aday: will be by friday. can you handle different data sources? it's
> in a separate db.
> gh: as long as it's listed in sources query.
> aday: it will be.
> _______________________________________________
> DAS2 mailing list
> DAS2 at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das2

Ann Loraine
Assistant Professor
Section on Statistical Genetics
University of Alabama at Birmingham
