What is /region for? (was Re: [DAS2] DAS intro)
Andrew Dalke
dalke at dalkescientific.com
Wed Nov 30 01:26:29 UTC 2005
(Changed the Subject line slightly to be a bit clearer. I hope.)
On Nov 30, 2005, at 1:37 AM, Ed Erwin wrote:
> Andrew Dalke wrote:
>> My questions, to summarize, are:
>> - why do we need a /region space when we can
>> 1. point directly to a sequence (for chromosome regions) and/or
>> 2. point to a "contig" or "assembly" or "region" feature type
>> (for other regions)
>
> The way I understand it, that is what region is for: to point directly
> to a location on a sequence and/or contig.
Am I not asking the question correctly? Am I missing the
obvious? Been known to happen before!
I know what regions are. I don't know why they are in
a distinct /region subtree.
I'm happy - enthusiastic - ecstatic - that there are different
ways to identify certain regions. I fully accept that they
are in use every day and widely understood.
Why are they special enough to get their own /region subtree?
Why can't they be features?
Here's my proposal. Leaf node parts of a <feature> always point
to a /sequence and optionally point to one or more /feature
elements which are of type "region". (Or some other part of
SOFA - perhaps assembly-component?)
Want to know where the feature is on a given "region" feature?
Then look up the region to find its /sequence location. Use
these two /sequence locations to get the location in the region.
Both /sequence locations are in the same "coordinate space" of
"identifier + start/end offset".
BTW, if regions are a type of features then you can search for
them. Eg, search for all top-level regions in the range 100000
to 2000000. Can't do that with the /region container. Can
if the region data is in the /feature container.
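To make the lookup and the search concrete, here is a rough Python sketch.
Everything in it is an assumption for illustration only: the Location and
Feature types, the relative_location() and search() helpers, the 0-based
half-open offsets, and the identifiers like "feature/contigA" are made up
and are not taken from the DAS spec. It also ignores the "top-level" part
of the search and only filters on type and range.

```python
from typing import List, NamedTuple, Optional, Tuple


class Location(NamedTuple):
    seqid: str  # identifier under /sequence (values below are made up)
    start: int  # 0-based start offset (an assumption, not from the spec)
    end: int    # exclusive end offset (also an assumption)


class Feature(NamedTuple):
    uri: str       # hypothetical identifier under /feature
    ftype: str     # feature type, e.g. "region" or "gene"
    loc: Location  # every feature points to a /sequence location


def relative_location(feature: Feature, region: Feature) -> Optional[Tuple[int, int]]:
    """Offsets of `feature` measured from the start of `region`.

    Returns None if the two are on different sequences or if the
    feature is not fully contained in the region.
    """
    f, r = feature.loc, region.loc
    if f.seqid != r.seqid or f.start < r.start or f.end > r.end:
        return None
    return (f.start - r.start, f.end - r.start)


def search(features: List[Feature], ftype: str, seqid: str,
           start: int, end: int) -> List[Feature]:
    """Features of type `ftype` on `seqid` that overlap [start, end)."""
    return [
        x for x in features
        if x.ftype == ftype
        and x.loc.seqid == seqid
        and x.loc.start < end
        and x.loc.end > start
    ]


# Regions stored as ordinary features, next to everything else.
contig_a = Feature("feature/contigA", "region", Location("chr1", 150000, 250000))
contig_b = Feature("feature/contigB", "region", Location("chr1", 2500000, 2600000))
gene = Feature("feature/gene1", "gene", Location("chr1", 160000, 161000))
features = [contig_a, contig_b, gene]

# Search for region features in the range 100000 to 2000000.
hits = search(features, "region", "chr1", 100000, 2000000)
print([h.uri for h in hits])              # ['feature/contigA']

# Where is the gene on contigA? Subtract the two /sequence locations.
print(relative_location(gene, contig_a))  # (10000, 11000)
```

The point is only that once a region is a feature with a /sequence
location, both operations are plain interval arithmetic on
"identifier + start/end offset" and need no separate container.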
>> - When would someone have regions which have more than one of
>> contigs, ESTs and chromosomes? Especially given that this
>> is the genome spec, so chromosome-level info is known, at
>> least enough for a rough assembly.
>
> I think they do it mainly 1) when the assembly is incomplete or 2) to
> preserve annotations from the past when the assembly was incomplete.
> There could be more reasons.
>
> Here is an example of a DAS/1 server that contains both chromosomes
> and "other" short sequences as entry points:
Okay, I'm fine with that. Thanks.
Is a goal of DAS to support incomplete genomes?
Note, btw, that the /sequence subtree does not need to contain
only chromosomes. From the spec:
    seqid is the sequence ID, and can correspond to an assembled
    chromosome, a contig, a clone, or any other accessionable
    chunk of sequence.
Hence for incomplete genomes, put the sequence data as
best you can under /sequence and have the /feature subtree
point to it.
>> In other words, what are regions for?
Still don't understand the need for a /region namespace.
Repeat: I understand regions, I just don't see why they
go in their own subtree and aren't part of some other data chunk.
Please, someone sketch out some example with hand-waving
XML that shows how having a /region is the appropriate solution.
That's what I'm worried about now - the representation in XML.
Andrew
dalke at dalkescientific.com