[DAS2] DAS intro

Andrew Dalke dalke at dalkescientific.com
Sat Nov 26 00:35:45 UTC 2005


Hi Suzi,

You're supposed to be on holiday - it's Thanksgiving after all.

Though I'm not celebrating it until next week.  I wonder where
I can find pumpkin pie mix here ...

>> DAS/2 describes a data model for genome annotations
> ,  THAT IS, DESCRIPTIONS OF FEATURES LOCATED ON THE GENOMIC SEQUENCE

Changed, along with the other fixes.

> (DELETED LAST 2 SENTENCES).

Those were the two lines about

>> Portions of
>> the assembly may have higher relative accuracy than the assembly as a
>> whole.  A reference server may supply these portions as an alternate
>> reference frame.

In the intro I want to mention all of the parts of DAS.  The
problem is that I still don't understand the /region request.
These two lines were my best attempt at explaining it.

Was the deletion because my understanding is wrong or because it's
not needed for the intro?

I think my confusion is related to the concept you mention in:
>> Annotations are located on the genome with a start and end position.
>> The range may be specified multiple times if there are alternate
>>
> SEQUENCES THEY MAY BE PLACED UPON (REFERENCE FRAMES).

because I don't understand what I should change.  I made up the
term 'reference frame' because of my physics training.  Is it
the correct term here?  Does 'reference frame' as it's normally
used only refer to the full assembly or does it refer to each
"/region" as well?  If I give the coordinates on a contig can
I say it's in the reference frame of that contig?

(Hmm, David Block agrees with me, according to
  http://open-bio.org/bosc2001/abstracts/lightning/block
    The presence of a Tiling_Path table allows the loading of
    any arbitrary length of sequence, in the reference frame
    of any of the contigs that make up the tiling path. )



I thought it was important to mention that a given annotation
may have "several <LOC> tags if the feature's location can be
represented in multiple coordinate systems (e.g. multiple builds
of a genome or multiple contigs)"
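
To make that concrete, here is a minimal Python sketch of one feature
carrying several locations, one per coordinate system.  It is not DAS/2
syntax: the class names, the 'frame' field, and the ids and coordinates
are all made up for illustration, and it assumes a location is just a
sequence name plus a start and end.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Location:
    # Hypothetical stand-in for one <LOC> entry: a range on a named sequence.
    frame: str   # the sequence the range is given on, e.g. an assembly or a contig
    start: int
    end: int


@dataclass
class Feature:
    feature_id: str
    locations: List[Location] = field(default_factory=list)

    def location_in(self, frame: str) -> Optional[Location]:
        """Return this feature's location on the given sequence, or None."""
        for loc in self.locations:
            if loc.frame == frame:
                return loc
        return None


# The same feature placed on a chromosome and on one of its contigs.
# All ids and coordinates here are invented.
feature = Feature("exon-1", [
    Location("chr1", 1_000_500, 1_000_800),
    Location("contig-42", 500, 800),
])
```

Asking for feature.location_in("contig-42") gives the contig-relative
range, and a sequence the feature is not placed on gives None.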

Then again, I don't understand how a given feature can be
annotated on multiple builds because I thought that a feature
was only associated with a single versioned source, and a
versioned source has only one build.


I would like to have something in the intro which mentions
"/region".  I just don't know how to do it.  Why would anyone
care about regions rather than just pointing directly to the sequence?

>> An annotation may contain multiple non-contiguous
>> parts
>
> (DELETED PHRASE AND SENTENCE)

The deleted text there was ", making it the parent of those parts.
Some parts may have more than one parent."

I put it there because I remember we talked a lot about this
at CSHL a couple years back and wanted to make sure the data
model handled cases where, say, there were two parents to three
parts.  It seems to me that this structure is important enough
that someone who is trying to get a quick understanding of
DAS annotations would be interested in it.

My internal model for the expected reader is someone like
Allen or Gregg - people who have some experience in data
models for annotations and would like to know that DAS
can handle those sorts of more complicated tree structures.

I'm willing to move it further into the text, but I'm not
convinced that moving it makes things simpler or less confusing.
Features having parts and parents is an essential part of
the DAS data model.
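
As an illustration of the "two parents to three parts" case, here is a
minimal Python sketch.  It only assumes that a feature can list its
parts and that a part can list its parents; the FeatureGraph class and
the transcript/exon ids are hypothetical, not anything from the DAS/2
spec.  Note that once a part has two parents the structure is strictly
a directed graph rather than a tree.

```python
from collections import defaultdict


class FeatureGraph:
    """Parent/part links between features, stored in both directions."""

    def __init__(self):
        self.parts = defaultdict(list)    # parent id -> list of part ids
        self.parents = defaultdict(list)  # part id -> list of parent ids

    def link(self, parent, part):
        # Record the relationship so it can be followed from either end.
        self.parts[parent].append(part)
        self.parents[part].append(parent)


graph = FeatureGraph()

# Two parents sharing three parts; the middle part belongs to both.
graph.link("transcript-A", "exon-1")
graph.link("transcript-A", "exon-2")
graph.link("transcript-B", "exon-2")
graph.link("transcript-B", "exon-3")

# Parts that have more than one parent.
shared = sorted(part for part, parents in graph.parents.items()
                if len(parents) > 1)
```

Storing the links in both directions means a client can walk from a
parent to its parts or from a part to all of its parents without
scanning every feature.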

					Andrew
					dalke at dalkescientific.com



