[DAS] Re: Trying to Slurp up DAS Annotations from Ensembl

Lincoln Stein lstein@cshl.org
Sun, 23 Jun 2002 10:38:25 -0400


Hi Angie,

Thank you very much for the patch.  It went right in once I figured out that 
I had to remove the carriage returns!

It sounds like you're doing coordinate discovery the way it was intended, 
but DAS was designed on the assumption that coordinate discovery would only 
be performed on relatively small regions of the genome at a time.  
There is an obvious need for a mechanism to retrieve large assemblies in a 
single document, such as an AGP file.  Thomas Down's XFF format (an 
experimental extension to DAS) allows this, and possibly the Sanger server 
will return assemblies in this format.  I'd be interested in how well this 
works for you.  Tony?

The coordinate mapper I was talking about is the simple one that, given an 
assembly, generates a list of coordinates in all the subparts, and, 
conversely, resolves subpart coordinates into superpart coordinates.  This is 
being done in various application layers at the moment, and needs to be moved 
into the Perl library layer.  Since you're a C shop, I won't be expecting you 
to work on it, and in fact it looks like a new module in BioPerl will fit the 
bill.
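
To make the idea concrete, here is a rough sketch of the two directions such 
a mapper would handle.  It is an illustration only: the names (Subpart, 
to_subparts, to_superpart) are hypothetical and are not part of Bio::Das or 
BioPerl, Python is used purely for brevity, and coordinates are assumed to be 
1-based and inclusive, with each subpart placed on the superpart in either 
orientation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Subpart:
    """One piece of an assembly (hypothetical structure, for illustration)."""
    name: str          # subpart identifier, e.g. a contig name
    parent_start: int  # start on the superpart (1-based, inclusive)
    parent_end: int    # end on the superpart (1-based, inclusive)
    sub_start: int     # start of the used region on the subpart
    sub_end: int       # end of the used region on the subpart
    strand: int        # +1 if same orientation as the superpart, -1 if reversed


def to_subparts(assembly: List[Subpart], start: int, end: int
                ) -> List[Tuple[str, int, int, int]]:
    """Map a superpart range onto every subpart it overlaps."""
    hits = []
    for part in assembly:
        lo = max(start, part.parent_start)
        hi = min(end, part.parent_end)
        if lo > hi:
            continue  # no overlap with this subpart
        if part.strand > 0:
            s = part.sub_start + (lo - part.parent_start)
            e = part.sub_start + (hi - part.parent_start)
        else:
            # Reversed subpart: the superpart start maps to the subpart end.
            s = part.sub_end - (hi - part.parent_start)
            e = part.sub_end - (lo - part.parent_start)
        hits.append((part.name, s, e, part.strand))
    return hits


def to_superpart(assembly: List[Subpart], name: str, pos: int) -> Optional[int]:
    """Resolve a position on a named subpart into superpart coordinates."""
    for part in assembly:
        if part.name == name and part.sub_start <= pos <= part.sub_end:
            if part.strand > 0:
                return part.parent_start + (pos - part.sub_start)
            return part.parent_start + (part.sub_end - pos)
    return None  # position is not placed in this assembly


# Made-up example assembly: two contigs, the second one reversed.
assembly = [
    Subpart("ctgA", 1, 1000, 1, 1000, +1),
    Subpart("ctgB", 1101, 1600, 201, 700, -1),
]
```

With that made-up assembly, to_subparts(assembly, 950, 1150) yields one hit on 
each contig, and to_superpart(assembly, "ctgB", 700) resolves to position 1101 
on the superpart.  A real module would also have to handle gaps, nested 
assemblies, and positions that land in unplaced regions.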

Lincoln

On Friday 21 June 2002 08:17 pm, you wrote:
> Hi Lincoln,
>
> OK, attached is a small diff -C 3 patch file that adds size and
> subparts to Segment.pm.  size is an attribute of segment in
> entry_points (from some servers, not all).  subparts is an attribute
> of segment in entry_points, or of type in features queries.
>
> I have been trying to descend servlet.sanger's mapmaster
> coordinate trees of segments with subparts, but my program has
> not yet been able to run all the way through; servlet has
> been returning truncated XML or sometimes no XML, and Tony
> asked me to back off on the many-segment queries.  I backed
> off to 3 segments per features query, but have still been
> causing quite a few server errors.
>
> So my impression of coordinate discovery, so far, is that it's so
> painful for the server that I'm only getting to use discovered
> coords on an annotation server today, and only because I discovered
> coords in a mysql dump file on Ensembl's ftp site.  ;)  Maybe I have
> the wrong approach (entry_points, then features queries on anything
> that has subparts)?
>
> Also, FYI for the time being I've switched to C, using an XML
> parser generated from DAS dtd files by Jim's autoXml tool.
> So Bio::Das is not my primary base at the moment... this is a C shop.
> But I would like to know what interface you have in mind for the
> "missing module" -- is it something that takes two mapmasters, learns
> their coordinate systems, and supplies transforms between the two?
> Or would it operate at a smaller level (e.g. a contig at a time
> instead of a whole data source)?
>
> Thanks,
>
> Angie
>
> On Mon, 17 Jun 2002, Lincoln Stein wrote:
> > Hi Angie,
> >
> > I'd love the patch.  Also, if you have any interest in working on the
> > missing module that interconverts coordinates based on the assembly
> > information, please let me know!
> >
> > Lincoln
> >
> > On Friday 14 June 2002 02:04 pm, Angie Hinrichs wrote:
> > > Hi Lincoln,
> > >
> > > Thanks!  I've found Bio::Das quite useful already in exploring
> > > mapmaster coordinate systems.  I added subparts() and size() methods
> > > to Segment.pm;  I need to test them out some more, but would you be
> > > interested in a patch at some point?
> > >
> > > Very glad to meet you,
> > >
> > > Angie

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein@cshl.org                                   Cold Spring Harbor, NY
========================================================================