[DAS] Minutes from 13 Nov 2008 DAS teleconference
Steve Chervitz
Steve_Chervitz at affymetrix.com
Fri Nov 14 01:16:13 UTC 2008
Here are my notes from today's teleconf.
I changed the syntax for action items to indicate the date on which they
originated. Should help prevent excessive slippage.
Steve
======================================
Minutes from 13 Nov 2008 DAS teleconference
Teleconference Info:
See http://www.biodas.org/wiki/BioDAS:Community_Portal#Teleconference
Attendees:
Free agent: Gregg Helt
Affymetrix: Steve Chervitz
EBI: Andy Jenkinson
Sanger: Jonathan Warren
LBNL (Suzi's Lab): Ed Lee, Leo(?), Nomi Harris
Note taker: Steve Chervitz
Action items are flagged with '[A-YYMMDD]' indicating the date they
originated.
New items arising in the discussion are flagged with '[A-new]'.
All pending action items are summarized at the bottom of the minutes.
The teleconference schedule and links to past minutes are
available from the Community Portal section of the biodas.org site:
http://www.biodas.org/wiki/BioDAS:Community_Portal#Teleconference
DISCLAIMER:
The note taker aims for completeness and accuracy, but these goals are
not always achievable. So don't consider these notes as complete minutes
from the meeting, but rather abbreviated, summarized versions of what
was discussed. There may be errors of commission and omission.
Participants are welcome to post comments and/or corrections to these
as they see fit on the discussion list.
========================================================
Agenda:
========
* Matters arising
* Review progress on action items from last week (based on minutes)
* Discuss possible modifications to the DAS1+2 sources doc
* Writeback issues
Matters arising
=====================
GBOL Discussion [see: http://gmod.org/wiki/Gbol ]
EL: Gbol is still in dev. Requires a chado backend, flybase db.
GH: UML diags for gbol?
EL: No. The simple obj layer is a direct mimic of the chado data model.
Some menu stuff for convenience. For the biological layer, haven't
created any diags yet. Based on a subset of SO.
GH: For simple obj layer I should refer to the chado object diag?
EL: Each table is an obj, with some convenience things.
EL: A data model, lightweight versatile. Chado is complex, not very
user friendly. Gbol layer is geared toward biologist. A gene object,
not worrying about underlying structure. Plug and play
architecture. Set up factory to take care of I/O. Easy to add new
data sources. Read from chado and write to GFF3, e.g., or from two
diff data sources. One test: Everyone has diff implementation of
chado. Gbol will do on the fly translation, based on controlled
vocabularies.
GH: Primary use case is Apollo.
EL: Planned for Jbrowse (Ajax-based Gbrowse in Ian's lab)
All done on the server (I/O), not client using web services. Java
based.
GH: Looking at data model, not hard to do a DAS1/2 translation. What
is current das support?
EO: No good chado-specific das servers. Can use Gbrowse as an
intermediary. Old gbrowse is being deprecated. Gbol will act as a JSON
provider for Gbrowse, but no reason it could not act as a DAS server
too.
Discussion of action items from 30 Oct 2008 teleconf:
=====================================================
[A-081030] All: Review Gregg's DAS UML modeling, post any comments to list.
AJ: Looked at it.
GH: Let me know if you see anything problematic. It's a pretty
realistic representation.
AJ: Regarding methods: In das/1 should a method be part of a type or
an entity in itself?
GH: Das/2 combine method and type into the type. There's an optional
method in type. Types use ontological terms (not reps thereof). Das/2
type 'transcript' is not a 1:1 mapping to the SO term (e.g.,
method=Genscan, type=transcript). So you may have more types than SO
terms.
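A minimal sketch of why folding method into type yields more types than
ontology terms. The class name `Das2Type`, the helper
`count_types_and_terms`, and the tuple layout are hypothetical and are not
taken from the DAS/2 spec or from Gregg's UML model.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class Das2Type(NamedTuple):
    """Hypothetical type record: an ontology term plus an optional method."""
    ontology_term: str
    method: Optional[str] = None


def count_types_and_terms(types: Iterable[Das2Type]) -> Tuple[int, int]:
    """Return (number of distinct types, number of distinct ontology terms)."""
    distinct = set(types)
    terms = {t.ontology_term for t in distinct}
    return len(distinct), len(terms)
```

With method=Genscan and a second, made-up method both attached to
'transcript', the count of types exceeds the count of terms.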
AJ: Can do it another way. DAS/1 id for a type is the ontology
ID. Important for translation issues.
GH: Haven't done much
Where do you see it changing?
AJ: An optional element with a required attrib.
You might have an ontology to describe method. Might want to say
something is result from a type of experiment, type of algorithm, type
of sample. May not want to shoehorn them into the type.
Das has moved away from complex query capabilities. Servers don't impl
types. People tend to make a separate das source for each data
type. Moving away from queryability and towards simplicity.
GH: Complexity is then pushed into understanding different sources.
Good to
[A-new] Gregg: Work on translation of method and type in Trellis Ivy proxy.
[A-081030] All: Review Gregg's DAS1->DAS2 proxy work (Trellis/Ivy/Vine),
post any comments to list.
[A-081030] AJ: Continue checking out Gregg's DAS1->DAS2 proxy, esp. the XML.
GH: Any feedback?
JW: Had a look. Interested in locations.
GH: Translating das1 feats starts/stops into location. Also
translating target starts/stop and group.
AJ: Seemed to work quite well. Problem comes when people abuse the
spec a bit.
GH: If no start/stop, = locationless feature. Phase and score are
additional complexity. If they are non-numbers it filters them out
now. If numbers, uses das1 score element.
AJ: What non-numbers in score?
GH: Dash is allowed = no score available. Sometimes '*' or '.'
AJ: DAS spec sometimes uses '.' or '0', for strand
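A minimal sketch of the score filtering described above, assuming scores
arrive as raw strings. The function name `parse_das1_score` is
hypothetical and is not taken from the Trellis/Ivy code; treating NaN and
infinity as "no score" is an added assumption.

```python
import math
from typing import Optional


def parse_das1_score(raw: Optional[str]) -> Optional[float]:
    """Return a numeric score, or None for placeholders like '-', '*' or '.'."""
    if raw is None:
        return None
    try:
        value = float(raw.strip())
    except ValueError:
        # '-', '*', '.', empty strings and other non-numbers are filtered out.
        return None
    # Assumption: NaN and infinity are also treated as "no score available".
    return value if math.isfinite(value) else None
```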
[A-081030] AJ: Post info about March '09 Hinxton DAS workshop to
biodas.org/current_events
JW: Done. Got a lot of registrations already. Aiming for 30 for
accommodations, 50 total (including campus folks). Will hit these
numbers easily.
GH: Hoping to attend.
AJ/JW: May have trouble accommodating everyone who wants to talk.
[A-new] Gregg: register for '09 Hinxton DAS workshop soon!
[A-081030] GH: Send out action and agenda items well in advance of teleconf.
Done.
[A-081030] GH: Add auth and security on the agenda so interested folks can
call in.
[A-081030] GH: Solicit feedback about security/auth from interested parties.
GH: Not added to agenda this week.
[A-081030] GH: Contribute to the DAS changes document re: DAS/2, sources &
deprecating DSN.
GH: Still pending. Hopefully next week.
[A-081030] GH: Get new teleconf number from Suzi; post to list with agenda.
GH: We are going to use Suzi's number going forward.
SC: I put this on the biodas.org wiki. Can also post the date of upcoming
teleconfs.
[A-081030] JN: Post preliminary java web start IGB release on bioviz.org
GH: Not on this call today. Next time.
[A-081030] SC: Merge DAS2 subscribers to DAS list. Redirect DAS2 posts to
DAS list.
[A-081030] SC: Consider making DAS list auto reject posts from
non-subscribers.
[A-081030] SC: Add Andy J and Jonathan W as admins to the DAS mailing list.
SC: All pending, though I did update the section of the biodas.org wiki to
indicate that the das2 list is being retired and all traffic should be
sent to the das list.
[A-081030] SC: Change 2 -> 2.1 and say it is "evolving"; declare the HTML
spec as "frozen"
[A-081030] SC: Send link to the 2.1 wiki spec to list.
SC: Done.
GH: Need to do the same for the 1.5 vs 1.6 spec.
[A-new] SC: Add AJ and JW as biodas.org sysops (can't edit side bar)
[A-new] AJ/JW: Put link to 1.6 evolving version of the 1.5 das spec on
biodas.org sidebar.
[A-081030] SC: Fix Affy IGB launching links on SF page.
SC: Have not done. Noticed today that they appear to be fixed
(probably by Ann's group -- thanks!)
[A-081030] SC: Update biodas.org community portal page with new teleconf
number.
SC: Done
[A-081016] SL: Summarize authentication pros and cons. Review descriptions,
make a decision.
EL: Was there a write up of this?
GH: People posted comments to the list: David Nix, Andy, Steven Blanchard.
Suzi is supposed to summarize.
[A-081016] SL: Decide Ian or Suzi is PI on grant. Issue reciprocal letters
of collab.
GH: Suzi's grant action item: (Feb 2009)
Feedback from funding people is that they're interested in DAS part
of it (distributed annotation). Suzi will have some feedback after
conf call on 11/14.
Topic: Writeback
==================
GH: Given LBL folks are here. How does it work in Apollo, retrieve and
edit curations?
EL: Rudimentary via das. Supports a number of data sources, load into
Apollo data model, modify, translat
Data sources: Chado, chado-xml, gff3, genbank records, some others.
GH: Thinking about for das/2 writeback: ID assignment and batch
operation. How do you do that?
EL: Id assignment is a chado (db) issue. Configurable by user (in
following format), vs database ID. In the db, at time of writeback
writing to chado instance, gets next available ID (pk), meaningless to
user, just db internal.
GH: DAS/2 writeback spec, if it's new curation, client assigns temp
id, post of xml for that feature to server, server responds back with
same xml but with temp id in 'old-uri' and new id in the 'uri' field.
EL: Similar idea. When working with db, will generate temp id, and
modify it.
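A minimal sketch of how a client might pick up server-assigned ids from
the writeback response Gregg describes. Only the 'old-uri' and 'uri'
field names come from the discussion; the FEATURE element name, the
attribute placement, and the function name `id_mapping` are assumptions,
and XML namespaces are ignored.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def id_mapping(response_xml: str) -> Dict[str, str]:
    """Map each temporary id ('old-uri') to the id the server assigned ('uri')."""
    root = ET.fromstring(response_xml)
    mapping = {}
    # Assumption: each feature is a FEATURE element carrying both attributes.
    for feature in root.iter("FEATURE"):
        old_id = feature.get("old-uri")
        new_id = feature.get("uri")
        if old_id is not None and new_id is not None:
            mapping[old_id] = new_id
    return mapping
```

The client would then replace each temporary id in its own data model
with the mapped value.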
GH: Related to that: changing one feature can have side effects on
other features. Change one exon boundaries, changes phase of other
exons downstream.
EL: Done via client side through Apollo. Didn't like having server do
it, since it ties to a particular db, relying on stored procs ties you
into a specific DBMS. Decided to do it on client-side. When time to
write to db, client queries db to determine available id space.
GH: Queries db before it creates a feature? JDBC?
EL: Yes and yes. Type 3 drivers.
GH: Changing in light of Gbol?
EL: Planning major rewrite of Apollo. Gbol will be able to handle
it. Apollo won't care about I/O. That's all through Gbol. For das/1->2
translation, should be efficient with our framework. Conversion
between different data sources via the data model should be easy.
GH: Regarding batch operations: easy via JDBC? Integrity across
several operations.
EL: Many DBMS don't work well across lots of transactions. Run out of
log space. Forces you to do lots of mini-transactions, with
transaction management. Can't do massive update of whole genome. We
can do per-CDS/protein/gene type edits as atomic operations.
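A minimal sketch of the "lots of mini-transactions" approach, with each
gene's edits committed as one atomic unit. It uses Python's sqlite3 for
illustration only (Apollo uses JDBC); the `feature` table, its columns,
and the function name `write_back` are hypothetical and are not the chado
schema.

```python
import sqlite3
from typing import Dict, List, Tuple

Edit = Tuple[str, int, int]  # (feature id, new start, new end)


def write_back(conn: sqlite3.Connection,
               edits_by_gene: Dict[str, List[Edit]]) -> List[str]:
    """Commit each gene's edits in its own transaction; return genes that failed."""
    failed = []
    for gene_id, edits in edits_by_gene.items():
        try:
            # 'with conn' commits on success and rolls back on any exception,
            # so one bad edit undoes only that gene's changes.
            with conn:
                for feature_id, start, end in edits:
                    if start > end:
                        raise ValueError(f"bad coordinates for {feature_id}")
                    conn.execute(
                        "UPDATE feature SET start_pos = ?, end_pos = ? WHERE id = ?",
                        (start, end, feature_id),
                    )
        except (sqlite3.Error, ValueError):
            failed.append(gene_id)
    return failed
```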
GH: Transactional integrity in DAS/2: a single http call is the atomic
unit. Any changes specified there are to be an atomic operation.
EL: Will be an issue with large writeback.
GH: Our model is a single human curator editing one gene at a
time. Not via a major automated pipeline script.
Not sure what happens in http when sending large amounts of data back
and forth.
EL: Problem with timeouts while client is waiting for response.
GH: Have considered an arrangement where client receives 'accepted'
(HTTP 202) and then a redirect to another source to receive the
writeback, or check status. Not in the spec now.
AJ: Has been mentioned before, "come back later" not just for
writeback. Not doing anything about it yet. Not hard to add something
like this, since most libraries support redirection. Just check the
header.
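A minimal sketch of the "accepted, come back later" arrangement, which is
not in the spec. The transport is passed in as two callables so the logic
stays independent of any HTTP library; the function name
`await_writeback`, the use of a Location header to name the status URL,
and the polling parameters are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status code, headers, body)


def await_writeback(post: Callable[[str, str], Response],
                    get: Callable[[str], Response],
                    url: str,
                    payload: str,
                    poll_interval: float = 1.0,
                    max_polls: int = 10) -> Tuple[int, str]:
    """POST a writeback; while the server answers 202, poll the status URL."""
    status, headers, body = post(url, payload)
    status_url = url
    polls = 0
    while status == 202:
        if polls == max_polls:
            raise TimeoutError("writeback still pending after polling")
        # Assumption: the server names the status resource in 'Location'.
        # Real HTTP header names are case-insensitive; a plain dict is not.
        status_url = headers.get("Location", status_url)
        time.sleep(poll_interval)
        status, headers, body = get(status_url)
        polls += 1
    return status, body
```

Because the client returns as soon as it sees a non-202 status, a server
that answers immediately needs no special handling.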
GH: Only sending data for features that change, not everything (delta).
EL: ...
GH: Some of this will take trials. Getting to work with single user.
AJ: Keep it simple, add it as needed.
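A minimal sketch of the delta idea Gregg mentions, assuming the client
holds the originally retrieved features and the edited ones as
dictionaries keyed by feature id. The function name `feature_delta` and
the dictionary layout are hypothetical; reporting deleted ids separately
is an added assumption.

```python
from typing import Any, Dict, List, Tuple

Features = Dict[str, Dict[str, Any]]


def feature_delta(original: Features, edited: Features) -> Tuple[Features, List[str]]:
    """Return (new or changed features, ids of deleted features)."""
    # A feature is sent if it is new or differs from the retrieved copy.
    changed = {fid: feat for fid, feat in edited.items()
               if original.get(fid) != feat}
    # Assumption: deletions are reported as a list of ids.
    deleted = sorted(set(original) - set(edited))
    return changed, deleted
```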
GH: Write back spec discussion on the mailing list (Gustavo). Can be
generalized. Very few things in there now. Think we can have the thing
that gets posted be the feature XML (DAS/1 or DAS/2). Can strip out,
simplify it. RESTful.
Have a link for this on wiki. Not yet populated.
[A-new]: Gregg write up new writeback proposal on wiki.
[A-new]: Steve - wikify the das/2 writeback here first.
AJ: Focused around proteins. Just get it working with Dasty (which
uses OpenID). Better for him to post them as DAS/1 style features.
GH: Like it because: more restful, and not just for features (applies
to seqs, types, alignments, etc.)
AJ: Use diff http commands to do different things. Post, put, get
GH: Problem for post,put,delete: you might want to do all of those in
one operation. In the general case. Something that Google data folks
are writing over posts, but are effectively doing puts and deletes
too.
AJ: Simplicity is the way to go.
GH: Reduces the number of elements.
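A minimal sketch of the case Gregg raises, where one posted document
carries creates, updates and deletes that must succeed or fail together.
The in-memory store, the operation tuples, and the function name
`apply_batch` are hypothetical; no DAS server is known to work this way.

```python
import copy
from typing import Any, Dict, List, Tuple

Operation = Tuple[str, str, Any]  # (verb, feature id, feature data or None)


def apply_batch(store: Dict[str, Any], operations: List[Operation]) -> Dict[str, Any]:
    """Apply every operation or none; the input store is never modified."""
    working = copy.deepcopy(store)
    for verb, feature_id, data in operations:
        if verb == "create":
            if feature_id in working:
                raise KeyError(f"{feature_id} already exists")
            working[feature_id] = data
        elif verb == "update":
            if feature_id not in working:
                raise KeyError(f"{feature_id} not found")
            working[feature_id] = data
        elif verb == "delete":
            if feature_id not in working:
                raise KeyError(f"{feature_id} not found")
            del working[feature_id]
        else:
            raise ValueError(f"unknown operation {verb!r}")
    # Only reached when every operation succeeded.
    return working
```

The caller swaps in the returned store on success; on an exception the
original store is left as it was, which matches the single-call atomic
unit described earlier in the minutes.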
Pending Action Items:
========================
[A-081016] SL: Decide Ian or Suzi is PI on grant. Issue reciprocal letters
of collab.
[A-081016] SL: Summarize authentication pros and cons. Review descriptions,
make a decision.
[A-081030] All: Review Gregg's DAS UML modeling, post any comments to list.
[A-081030] GH: Solicit feedback about security/auth from interested parties.
Add to agenda.
[A-081030] GH: Contribute to the DAS changes document re: DAS/2, sources &
deprecating DSN.
[A-081030] JN: Post preliminary java web start IGB release on bioviz.org
[A-081030] SC: Merge DAS2 subscribers to DAS list. Redirect DAS2 posts to
DAS list.
[A-081030] SC: Consider making DAS list auto reject posts from
non-subscribers.
[A-081030] SC: Add Andy J and Jonathan W as admins to the DAS mailing list.
[A-081113] AJ/JW: Put link to 1.6 evolving version of the 1.5 das spec on
biodas.org sidebar.
[A-081113] GH: Work on translation of method and type in Trellis Ivy proxy.
[A-081113] GH: register for '09 Hinxton DAS workshop soon!
[A-081113] GH: Write up writeback proposal ideas in the DAS/2.1 wiki.
[A-081113] SC: Add AJ and JW as biodas.org sysops (so they can edit side
bar)
[A-081113] SC: Wikify the das/2.0 writeback HTML document in das/2.1 wiki.
[A-081113] All: Next teleconf in three weeks: 04-Dec-08
[A-081113] All: Anyone that has items they want discussed, send to Gregg.
=======================================
CVS Repository version:
$Id: das2-teleconference-2008-11-13.txt,v 1.3 2008/11/14 01:14:55 sac Exp $