[DAS] DAS and bacterial genomes
birney at ebi.ac.uk
Wed Aug 18 09:30:32 UTC 2010
On 18 Aug 2010, at 09:52, Adam Witney wrote:
> Hi Andy,
> Yes, I am aware of some of the idiosyncrasies of the Ensembl
> Genomes naming conventions. But is there a reason that the DAS
> registry should be constrained by Ensembl Genomes? Could the
> Registry entry refer to a specific taxonomy ID and its corresponding
> entry in EG, despite EG using a different taxonomy ID?
> I'd like to be able to export our microarray designs and data via
> DAS for others to use (including EnsemblBacteria). This is for 16 or
> so species with multiple strains thereof.
Just to say that I think we should get this as straight as we can. To
state the obvious: EG is not trying to be deliberately complex here;
it is just that the concept of "one taxid == one species == one
assembly" breaks down in bacteria.
I've brought in the three key people here on the EG side: Eugene
(does the web side of this), Dan (main data production manager) and
Paul Kersey (the EG PI). Some of them are on holiday now, but I
suggest perhaps setting up a conference (and/or Adam, could you come
for a visit?) to get this as straight as we can. I suspect there will
be both short-term fixes and longer-term infrastructural fixes here.
> On 17 Aug 2010, at 18:07, Andy Jenkinson wrote:
>> Hi Adam,
>> There are no coordinate systems yet as nobody has yet been brave
>> enough to start using DAS with bacteria in anger. Eugene at Ensembl
>> Genomes will have an interest in doing this, but they have issues
>> with matching up their species/strain names with the NCBI taxonomy
>> upon which DAS's coordinates are based. In essence you will need
>> to name the coordinate systems, after which they will need to be
>> added to the registry.
>> For example when Ensembl Genomes manage to do this, the coordinate
>> systems might end up looking like:
>> EB_1,Chromosome,Shigella flexneri 2a str. 301
>> EB_1,Plasmid,Shigella flexneri 2a str. 301
>> This is for a specific Shigella strain with taxonomy ID 198214. The
>> authority and version parts of the DAS coordinate system are
>> somewhat arbitrarily named; ideally they would follow a standard
>> used by the rest of the community for interoperability purposes.
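As a rough illustration of the comma-separated form shown in the
example above, here is a minimal Python sketch that splits such a
string into its parts. It assumes the first field is an authority and
version joined by an underscore (as in "EB_1"); the class and function
names are hypothetical and not part of any DAS library.

```python
from typing import NamedTuple, Optional


class CoordinateSystem(NamedTuple):
    """Hypothetical container for one coordinate system entry."""
    authority: str
    version: Optional[str]
    sequence_type: str
    organism: str


def parse_coordinate_system(text: str) -> CoordinateSystem:
    """Parse a string such as 'EB_1,Chromosome,Shigella flexneri 2a str. 301'."""
    # Split on the first two commas only, so that an organism name
    # containing a comma is kept in one piece.
    parts = [part.strip() for part in text.split(",", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated fields, got: {text!r}")
    authority_version, sequence_type, organism = parts

    # Assumption: the version follows the last underscore of the first field.
    authority, separator, version = authority_version.rpartition("_")
    if not separator:
        # No underscore at all: treat the whole field as the authority.
        authority, version = authority_version, ""
    return CoordinateSystem(authority, version or None, sequence_type, organism)


chromosome = parse_coordinate_system("EB_1,Chromosome,Shigella flexneri 2a str. 301")
plasmid = parse_coordinate_system("EB_1,Plasmid,Shigella flexneri 2a str. 301")
```

Keeping the organism name as the last field and splitting only twice
means strain names with unusual punctuation do not break the parse.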
>> What exactly is it you'd like to be able to do? How many species
>> are we talking about?
>> The reason I ask is that getting these coordinate systems into the
>> DAS registry does require some work. Some of this is on the
>> registry's side, but depending on where your data come from there may
>> be issues with identifying the correct coordinate system details
>> such that others can reuse them meaningfully. To use the example
>> above, Ensembl Genomes give the "301" strain a different name from
>> NCBI and use the taxonomy ID not for the strain but for the parent
>> species (Shigella flexneri). In fact the 2457T strain also uses the
>> same taxonomy ID, which isn't helpful. Given the number of
>> species, this adds up to a major headache.
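The shared-ID problem described above can be checked for mechanically.
The following is a minimal Python sketch, assuming you have a list of
(strain name, taxonomy ID) pairs as assigned by a data provider; the
function name is hypothetical and the taxonomy ID used in the example
is a placeholder, not a real NCBI identifier.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def find_shared_taxonomy_ids(
    strains: Iterable[Tuple[str, Hashable]]
) -> Dict[Hashable, List[str]]:
    """Return the taxonomy IDs that are assigned to more than one strain."""
    names_by_taxid: Dict[Hashable, List[str]] = defaultdict(list)
    for strain_name, taxonomy_id in strains:
        names_by_taxid[taxonomy_id].append(strain_name)
    # Keep only the IDs that cannot identify a single strain.
    return {
        taxonomy_id: names
        for taxonomy_id, names in names_by_taxid.items()
        if len(names) > 1
    }


# Placeholder standing in for the parent species ID; not a real value.
PARENT_SPECIES_TAXID = "taxid-of-Shigella-flexneri"

provider_assignments = [
    ("301 strain", PARENT_SPECIES_TAXID),
    ("2457T strain", PARENT_SPECIES_TAXID),
]
ambiguous = find_shared_taxonomy_ids(provider_assignments)
```

Any ID reported here would need an extra discriminator, such as the
strain name, before it could serve as a coordinate system key.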
>> On 17 Aug 2010, at 16:49, Adam Witney wrote:
>>> What would be the best approach to use DAS with bacterial genomes?
>>> I can't seem to find any coordinate systems for these organisms in
>>> the Registry.
>>> Thanks for any advice
>>> DAS mailing list
>>> DAS at lists.open-bio.org