[DAS] DAS and bacterial genomes

Adam Witney awitney at sgul.ac.uk
Wed Aug 18 19:47:30 UTC 2010


On 18 Aug 2010, at 20:25, Lincoln Stein wrote:

> It sounds to me as though the DAS source metadata needs just one additional field to indicate the strain, isolate or individual, making the hierarchy:
> 
> taxid -> strain/isolate/individual id -> assembly
> 
> (The taxid and the assembly are already in DAS.) Is there a definitive repository of isolate IDs that could be used for this purpose?

Most of the sequenced strains already have strain-specific taxids, although I don't know how NCBI adds new taxids, so I'm not sure how this will scale with the number of genomes currently being sequenced.
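Below is a minimal sketch, in Python, of the taxid -> strain/isolate/individual -> assembly hierarchy Lincoln proposes. The names `SourceKey` and `find_sources`, the placeholder taxid and the assembly labels are all hypothetical and are not part of the DAS specification. The sketch only illustrates the point under discussion: when two strains are filed under the same parent-species taxid (as with the "301" and "2457T" Shigella strains in Ensembl Genomes), a taxid-only lookup is ambiguous, and the extra strain field is what resolves it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class SourceKey:
    """Hypothetical DAS source key: taxid -> strain/isolate/individual -> assembly."""
    taxid: int
    strain: str
    assembly: str


def find_sources(keys: List[SourceKey], taxid: int,
                 strain: Optional[str] = None) -> List[SourceKey]:
    """Return sources matching the taxid and, if given, the strain/isolate id."""
    return [k for k in keys
            if k.taxid == taxid and (strain is None or k.strain == strain)]


# Hypothetical registry: two strains filed under one parent-species taxid.
SPECIES_TAXID = 12345  # placeholder value, not a real NCBI taxid
registry = [
    SourceKey(SPECIES_TAXID, "301", "assembly_1"),
    SourceKey(SPECIES_TAXID, "2457T", "assembly_1"),
]

ambiguous = find_sources(registry, SPECIES_TAXID)
resolved = find_sources(registry, SPECIES_TAXID, strain="301")
print(len(ambiguous))  # 2: the taxid alone matches both strains
print(len(resolved))   # 1: the strain field picks out a single source
```

Where a strain already has its own taxid, the strain field is redundant but harmless; it only does real work when the taxid is species-level.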

> As mentioned elsewhere in this thread, the problem of distinguishing an individual from its taxon is not limited to bacteria. Does the 1000 Genomes Project use the assembly as a surrogate for isolate/individual? 

No, it's not unique to bacteria. Indeed, I notice that Plasmodium vivax is already in the DAS registry; does this not suffer from the same problem, i.e. which strain are the coordinates referring to?
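One way to make the strain explicit is to carry it in the coordinate-system label itself, as in Andy's example `EB_1,Chromosome,Shigella flexneri 2a str. 301`. The Python sketch below splits such a label into authority, version, type and organism. The parsing rules (three comma-separated fields, with authority and version joined by the last underscore) are an assumption inferred from that single example, and the function and field names are hypothetical.

```python
from typing import NamedTuple


class CoordinateSystem(NamedTuple):
    authority: str
    version: str
    coord_type: str
    organism: str  # species name, here including the strain


def parse_coordinate_system(label: str) -> CoordinateSystem:
    """Parse an 'Authority_Version,Type,Organism' label (format assumed from the example).

    Raises ValueError if the label has fewer than three fields or no underscore.
    """
    # maxsplit=2 so that any comma inside the organism name is preserved
    auth_version, coord_type, organism = label.split(",", 2)
    # split on the last underscore so an authority such as "MY_LAB" stays whole
    authority, version = auth_version.rsplit("_", 1)
    return CoordinateSystem(authority, version, coord_type.strip(), organism.strip())


def format_coordinate_system(cs: CoordinateSystem) -> str:
    """Rebuild the label from its parts."""
    return f"{cs.authority}_{cs.version},{cs.coord_type},{cs.organism}"


cs = parse_coordinate_system("EB_1,Chromosome,Shigella flexneri 2a str. 301")
print(cs.authority, cs.version, cs.coord_type)  # EB 1 Chromosome
print(cs.organism)                              # Shigella flexneri 2a str. 301
```

This only records which strain a label refers to; it does not settle which authority and version names the community should agree on.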

Adam


> Lincoln
> 
> On Wed, Aug 18, 2010 at 5:30 AM, Ewan Birney <birney at ebi.ac.uk> wrote:
> 
> On 18 Aug 2010, at 09:52, Adam Witney wrote:
> 
> 
> Hi Andy,
> 
> Yes, I am aware of some of the idiosyncrasies of the Ensembl Genomes naming conventions. But is there a reason that the DAS registry should be constrained by Ensembl Genomes? Could the Registry entry refer to a specific taxonomy ID and its corresponding entry in EG, even though EG uses a different taxonomy ID?
> 
> I'd like to be able to export our microarray designs and data via DAS for others to use (including EnsemblBacteria). This is for 16 or so species, with multiple strains of each.
> 
> 
> Just to say that I think we should get this as straight as we can; just to
> state the obvious - EG is not trying to be deliberately complex here, it is
> just that the concept of "one taxid == one species == one assembly series" just
> breaks down in bacteria.
> 
> 
> I've brought in the three key people here on the EG side - Eugene (does the web
> side of this); Dan (main data production manager) and Paul Kersey (the EG PI) -
> some of them are on holiday now, but I suggest perhaps setting up a phone
> conference (and/or Adam, could you come for a visit?) to get this as straight as
> we can - I suspect there will be both short-term fixes and longer-term
> infrastructural fixes here.
> 
> 
> 
> cheers
> 
> Adam
> 
> 
> On 17 Aug 2010, at 18:07, Andy Jenkinson wrote:
> 
> Hi Adam,
> 
> There are no coordinate systems yet, as nobody has yet been brave enough to start using DAS with bacteria in anger. Eugene at Ensembl Genomes will have an interest in doing this, but they have issues with matching up their species/strain names with the NCBI taxonomy upon which DAS's coordinates are based. In essence, you will need to name the coordinate systems, after which they will need to be added to the registry.
> 
> For example, when Ensembl Genomes manage to do this, the coordinate systems might end up looking like:
> EB_1,Chromosome,Shigella flexneri 2a str. 301
> EB_1,Plasmid,Shigella flexneri 2a str. 301
> 
> This is for a specific Shigella strain with taxonomy ID 198214. The authority and version parts of the DAS coordinate system are somewhat arbitrarily named; ideally they would follow a standard used by the rest of the community, for interoperability.
> 
> What exactly is it you'd like to be able to do? How many species are we talking about?
> 
> The reason I ask is that getting these coordinate systems into the DAS registry does require some work. Some of this is on the registry's side, but depending on where your data come from, there may be issues with identifying the correct coordinate system details such that others can reuse them meaningfully. To use the example above, Ensembl Genomes gives the "301" strain a different name from NCBI and uses the taxonomy ID not for the strain but for the parent species (Shigella flexneri). In fact, the 2457T strain also uses the same taxonomy ID, which isn't helpful. Given the number of species, this adds up to a major headache.
> 
> Cheers,
> Andy
> 
> On 17 Aug 2010, at 16:49, Adam Witney wrote:
> 
> Hi,
> 
> What would be the best approach to use DAS with bacterial genomes? I can't seem to find any coordinate systems for these organisms in the Registry.
> 
> Thanks for any advice
> 
> Adam
> _______________________________________________
> DAS mailing list
> DAS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/das
> 
> 
> 
> 
> -- 
> Lincoln D. Stein
> Director, Informatics and Biocomputing Platform
> Ontario Institute for Cancer Research
> 101 College St., Suite 800
> Toronto, ON, Canada M5G0A3
> 416 673-8514
> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
