[DAS] DAS and bacterial genomes

Javier Herrero jherrero at ebi.ac.uk
Thu Aug 19 16:00:55 UTC 2010

Yes, this can be fun. I can also foresee similar problems in vertebrate 
genomes. On one hand, some assemblies correspond to more than one 
individual: chimeric assemblies, haplotypes, etc. On the other hand, you can 
get more than one genome per individual (e.g. cancer genomes).
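
To make the two cases concrete, here is a minimal sketch of the many-to-many
relationship between individuals and assemblies. Every name in it
(Individual, Assembly, assemblies_for, the "asm_..." identifiers) is
hypothetical and not part of DAS or any registry. It only assumes that an
assembly carries its own identifier, separate from the taxonomy ID.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Individual:
    """A sequenced individual (hypothetical model, not a DAS entity)."""
    label: str
    taxon_id: int  # NCBI taxonomy ID; 9606 is Homo sapiens


@dataclass
class Assembly:
    """An assembly with its own identifier, independent of the taxonomy ID."""
    assembly_id: str  # hypothetical identifier
    sources: List[Individual] = field(default_factory=list)


def assemblies_for(individual: Individual, assemblies: List[Assembly]) -> List[str]:
    """Return the IDs of all assemblies that include the given individual."""
    return [a.assembly_id for a in assemblies if individual in a.sources]


donor_a = Individual("donor_A", 9606)
donor_b = Individual("donor_B", 9606)
patient = Individual("patient_1", 9606)

assemblies = [
    # Case 1: one chimeric assembly built from more than one individual.
    Assembly("asm_chimeric", [donor_a, donor_b]),
    # Case 2: more than one genome for the same individual (normal and tumour).
    Assembly("asm_patient_normal", [patient]),
    Assembly("asm_patient_tumour", [patient]),
]
```

In this sketch all three individuals share one taxonomy ID, so the taxonomy ID
alone cannot say which assembly a set of coordinates refers to; the lookup has
to go through the assembly identifier.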


On Thursday 19 Aug 2010 10:56:49 Ewan Birney wrote:
> Just to repeat:
>   I always think this should be easy and then I get educated by Paul:
>   I think each time one thinks about "just moving it down a
> level" (e.g. to strain) there are submitted
> cases in which two people have submitted assemblies with the same
> "strain tax id" but which actually
> clearly aren't the same (e.g. there is a big insertion of something). The
> whole thing keeps moving down
> a notch.
>   The right thing here is to assign tracking identifiers to assembly
> series independently of
> the strain assignments, and track assemblies separately (but obviously
> with relationships)
> to strains.
>    Paul has met most (?all) of the use cases and understands this
> better than me. I think
> we should wait for Paul to weigh in here - it's just always a bit more
> complicated than you
> think ;)
> On 19 Aug 2010, at 00:10, Andy Jenkinson wrote:
> > On 18 Aug 2010, at 20:47, Adam Witney wrote:
> >>> As mentioned elsewhere in this thread, the problem of
> >>> distinguishing an individual from its taxon is not limited to
> >>> bacteria. Does  the 1000 genomes project use the assembly as a
> >>> surrogate for isolate/individual?
> >> 
> >> No, it's not unique to bacteria. Indeed, I notice that Plasmodium
> >> vivax is already in the DAS registry. Does this not suffer the same
> >> problems, i.e. which strain are the coordinates referring to?
> > 
> > So far a coordinate system always refers to the species or strain
> > identified by its taxonomy ID. As you say, strains DO have their own
> > NCBI taxonomy ID. It may be that this is not the case for a strain
> > that someone wants to annotate, but I have yet to see an actual
> > example. There is the wider question of how to handle individuals
> > though. I can't comment on how 1000 genomes do this as I've only
> > seen these data expressed as variations annotated upon the reference
> > assembly, but my feeling is that if annotations of an individual
> > were needed then it could/would be done using the assembly paradigm
> > as a surrogate.
> > _______________________________________________
> > DAS mailing list
> > DAS at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/das

Javier Herrero, PhD
Ensembl Compara Project Leader
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge - CB10 1SD - UK
