[DAS] DAS and bacterial genomes

Ewan Birney birney at ebi.ac.uk
Wed Aug 18 09:30:32 UTC 2010

On 18 Aug 2010, at 09:52, Adam Witney wrote:

> Hi Andy,
> Yes, I am aware of some of the idiosyncrasies of the Ensembl
> Genomes naming conventions. But is there a reason that the DAS
> registry should be constrained by Ensembl Genomes? Could the
> Registry entry refer to a specific taxonomy ID and its corresponding
> entry in EG, despite EG using a different taxonomy ID?
> I'd like to be able to export our microarray designs and data via
> DAS for others to use (including EnsemblBacteria). This is for 16 or
> so species with multiple strains thereof.

Just to say that I think we should get this as straight as we can. To state
the obvious: EG is not trying to be deliberately complex here; it is just
that the concept of "one taxid == one species == one assembly series" breaks
down in bacteria.
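
As a rough illustration of that breakdown, here is a minimal Python sketch
using the Shigella example from Andy's mail quoted below. The record layout,
the function names and the PARENT_TAXID placeholder are all assumptions made
for illustration; they are not real identifiers or any EG or registry API.

```python
from collections import defaultdict

# Placeholder for the parent-species taxonomy ID that both strains share.
# Hypothetical value, NOT a real NCBI identifier.
PARENT_TAXID = "parent-species-taxid"

# Hypothetical records: (taxonomy ID used by the resource, strain label).
records = [
    (PARENT_TAXID, "Shigella flexneri 2a str. 301"),
    (PARENT_TAXID, "2457T"),
]


def strains_by_taxid(records):
    """Group strain labels under the taxonomy ID they were filed with."""
    groups = defaultdict(list)
    for taxid, strain in records:
        groups[taxid].append(strain)
    return dict(groups)


def ambiguous_taxids(records):
    """Return only the taxonomy IDs that map to more than one strain."""
    return {
        taxid: strains
        for taxid, strains in strains_by_taxid(records).items()
        if len(strains) > 1
    }
```

Any taxid returned by ambiguous_taxids() cannot, on its own, pick out a single
assembly, which is exactly the case that a "one taxid == one assembly series"
assumption does not cover.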

I've brought in the three key people on the EG side: Eugene (does the web
side of this), Dan (main data production manager) and Paul Kersey (the EG
PI). Some of them are on holiday now, but I suggest setting up a conference
(and/or Adam, could you come for a visit?) to get this as straight as we can.
I suspect there will be both short-term fixes and longer-term infrastructural
fixes here.
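
For anyone scripting against names of this kind, here is a small Python sketch
that splits a coordinate system string such as the
"EB_1,Chromosome,Shigella flexneri 2a str. 301" example quoted below into its
parts. It assumes the first field is "authority_version" joined by an
underscore and that the fields are comma-separated in the order shown; the
class and function names are made up for illustration and are not part of any
DAS library.

```python
from typing import NamedTuple


class CoordinateSystem(NamedTuple):
    authority: str  # e.g. "EB"
    version: str    # e.g. "1"; empty if the name has no underscore
    category: str   # e.g. "Chromosome" or "Plasmid"
    organism: str   # e.g. "Shigella flexneri 2a str. 301"


def parse_coordinate_system(text):
    """Split 'authority_version,category,organism' into named fields.

    Assumption: only the first two commas separate fields, so any comma
    inside the organism name is preserved.
    """
    name, category, organism = (part.strip() for part in text.split(",", 2))
    # Assumption: the first underscore separates authority from version.
    authority, _, version = name.partition("_")
    return CoordinateSystem(authority, version, category, organism)
```

A string with fewer than three comma-separated fields raises ValueError,
which seems preferable to silently guessing the missing part.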

> cheers
> Adam
> On 17 Aug 2010, at 18:07, Andy Jenkinson wrote:
>> Hi Adam,
>> There are no coordinate systems yet as nobody has yet been brave  
>> enough to start using DAS with bacteria in anger. Eugene at Ensembl  
>> Genomes will have an interest in doing this, but they have issues  
>> with matching up their species/strain names with the NCBI taxonomy  
>> upon which DAS's coordinates are based. In essence, you will need
>> to name the coordinate systems, after which they will need to be
>> added to the registry.
>> For example, when Ensembl Genomes manage to do this, the coordinate
>> systems might end up looking like:
>> EB_1,Chromosome,Shigella flexneri 2a str. 301
>> EB_1,Plasmid,Shigella flexneri 2a str. 301
>> This is for a specific Shigella strain with taxonomy ID 198214. The
>> authority and version parts of the DAS coordinate system are
>> somewhat arbitrarily named; ideally they would follow a standard that
>> is used by the rest of the community for interoperability purposes.
>> What exactly is it you'd like to be able to do? How many species
>> are we talking about?
>> The reason I ask is that getting these coordinate systems into the  
>> DAS registry does require some work. Some of this is on the  
>> registry's side, but depending on where your data come from there may
>> be issues with identifying the correct coordinate system details  
>> such that others can reuse them meaningfully. To use the example  
>> above, Ensembl Genomes give the "301" strain a different name from  
>> NCBI and use the taxonomy ID not for the strain but for the parent  
>> species (Shigella flexneri). In fact the 2457T strain also uses the  
>> same taxonomy ID, which isn't helpful. Given the number of
>> species, this adds up to a major headache.
>> Cheers,
>> Andy
>> On 17 Aug 2010, at 16:49, Adam Witney wrote:
>>> Hi,
>>> What would be the best approach to use DAS with bacterial genomes?  
>>> I can't seem to find any coordinate systems for these organisms in  
>>> the Registry.
>>> Thanks for any advice
>>> Adam
>>> _______________________________________________
>>> DAS mailing list
>>> DAS at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/das
