[DAS] dsn

David James Sherman David.Sherman@LaBRI.FR
Fri, 22 Mar 2002 09:50:26 +0100


Is there a best practice for defining entry points for large-scale random
sequencing projects? I'm dealing with data that, by design, are not intended
to produce long contigs but rather shallow coverage (< 40% of the genes) of
a large number of closely related species. We have 50,000 paired sequences
from 13 different species, and not all species have nontrivial contigs.

As 25,000 entry points seemed ridiculous, we decided to use 13, specifically
one big artificial chromosome per species, with wide spacing between the
STC pairs. This is misleading at best (!), and will certainly get us into
trouble when real chromosomes are identified.
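A minimal sketch of the artificial-chromosome layout, under stated assumptions: each STC pair is treated as a single segment of known length, and a fixed gap (the value of `GAP` here is hypothetical) separates consecutive pairs. The names `layout_pairs`, `entry_points` and `Placement` are illustrative only and are not part of the DAS spec.

```python
from dataclasses import dataclass

# Hypothetical spacing (in bases) inserted between consecutive STC pairs.
GAP = 10_000


@dataclass
class Placement:
    pair_id: str
    start: int  # 1-based, inclusive coordinate on the artificial chromosome
    stop: int   # 1-based, inclusive coordinate on the artificial chromosome


def layout_pairs(pairs, gap=GAP):
    """Place (pair_id, length) tuples end to end, separated by `gap` bases.

    Returns the list of placements and the total length of the
    artificial chromosome (0 if there are no pairs).
    """
    placements = []
    pos = 1
    for pair_id, length in pairs:
        start = pos
        stop = start + length - 1
        placements.append(Placement(pair_id, start, stop))
        pos = stop + 1 + gap  # skip the gap before the next pair
    total = placements[-1].stop if placements else 0
    return placements, total


def entry_points(species_pairs, gap=GAP):
    """Map each species to the length of its single artificial chromosome."""
    return {sp: layout_pairs(pairs, gap)[1] for sp, pairs in species_pairs.items()}
```

Each species then contributes exactly one entry point spanning 1..total, which keeps the list short enough for a popup menu; the price is that the coordinates are arbitrary and will have to be remapped once real chromosomes are known.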

Presumably even full-genome sequencing projects start somewhere, with
lots of little sequences. Does everyone just wait until they have a
smallish number of established contigs before deploying DAS?

Lincoln Stein writes:

  > The entry points were intended to be a well-known location where the end user
  > could enter the genome.  There are supposed to be no more entry points than 
  > can be displayed in a popup menu.  For example, the list of chromosomes.

and:

  > People working on unfinished genomes are going to want to use golden path
  > contigs, and so on, not chromosomes.  I think it's best if we let the
  > species-specific sites figure out what works best for them.

This is what we understood from reading the spec. I completely agree, and I
think we'd look askance at any naming scheme that assumed everyone is doing
full-genome sequencing.


djs               David J. Sherman          (David.Sherman@LaBRI.FR)
                  Laboratoire Bordelais de Recherche en Informatique
                  voice: +33 5 56 84 6922     fax: +33 5 56 84 6669
                  icbm : 44°48'28.2"/-00°35'47.4"/49m