[DAS] DAS and bacterial genomes
thomas.a.down at gmail.com
Wed Aug 18 09:35:36 UTC 2010
On Wed, Aug 18, 2010 at 10:30 AM, Ewan Birney <birney at ebi.ac.uk> wrote:
> On 18 Aug 2010, at 09:52, Adam Witney wrote:
>> Hi Andy,
>> Yes I am aware of some of the idiosyncrasies of the Ensembl Genomes
>> naming conventions. But is there a reason that the DAS registry should be
>> constrained by Ensembl Genomes? Could the Registry entry refer to a specific
>> taxonomy ID and its corresponding entry in EG, despite EG using a different
>> taxonomy ID?
>> I'd like to be able to export our microarray designs and data via DAS for
>> others to use (including EnsemblBacteria). This is for 16 or so species with
>> multiple strains thereof.
> Just to say that I think we should get this as straight as we can; just to
> state the obvious - EG is not trying to be deliberately complex here, it is
> just that the concept of "one taxid == one species == one assembly series"
> breaks down in bacteria.
> I've brought in the three key people here on the EG side - Eugene (does the
> side of this); Dan (main data production manager) and Paul Kersey (the EG
> PI) - some of them are on holiday now, but I suggest perhaps setting up a
> phone conference (and/or Adam could you come for a visit?) to get this as
> straight as we can - I suspect there will be both short-term fixes and
> longer-term infrastructural fixes here.
The DAS coordinate system scheme already handles multiple assemblies for a
given taxon fairly well (via the "authority" part). So it sounds like the
main thing that's missing is a sensible way of distinguishing between
strains. Given that the CSs are already defined by XML elements with
multiple attributes (many of them optional), would adding an (also optional,
of course) strain attribute get things 95% of the way there?
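
As a rough sketch of what that could look like in practice: the snippet below parses a few COORDINATES-style elements and filters them by taxon and, optionally, by strain. The "strain" attribute is the hypothetical addition proposed above and is not part of the current scheme; the URIs, authority name, taxid and strain labels are all made-up placeholders, and find_coordinates / coordinate_key are illustrative helper names only.

```python
import xml.etree.ElementTree as ET

# Illustrative data only: every attribute value here is a placeholder, and
# "strain" is the hypothetical optional attribute under discussion.
SOURCES_XML = """
<SOURCES>
  <COORDINATES uri="http://example.org/cs/1" authority="ExampleAuth"
               version="1" source="Chromosome" taxid="12345"
               strain="strainA"/>
  <COORDINATES uri="http://example.org/cs/2" authority="ExampleAuth"
               version="1" source="Chromosome" taxid="12345"
               strain="strainB"/>
  <COORDINATES uri="http://example.org/cs/3" authority="OtherAuth"
               version="2" source="Chromosome" taxid="12345"/>
</SOURCES>
"""


def coordinate_key(elem):
    """Identity of a coordinate system; strain is None when the attribute is absent."""
    return (
        elem.get("authority"),
        elem.get("version"),
        elem.get("source"),
        elem.get("taxid"),
        elem.get("strain"),
    )


def find_coordinates(root, taxid, strain=None):
    """Return coordinate systems for a taxon, narrowed to one strain if given."""
    return [
        cs
        for cs in root.iter("COORDINATES")
        if cs.get("taxid") == taxid
        and (strain is None or cs.get("strain") == strain)
    ]


root = ET.fromstring(SOURCES_XML)
all_for_taxon = find_coordinates(root, "12345")
only_strain_a = find_coordinates(root, "12345", strain="strainA")
```

Because the attribute is optional, an entry without it (the third one here) still parses and still matches a taxon-only query, so existing clients that ignore strain would see no change.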