[DAS] dsn
David Huen
David Huen <smh1008@cus.cam.ac.uk>
Sun, 24 Mar 2002 19:53:01 +0000 (GMT)
On Sun, 24 Mar 2002, David James Sherman wrote:
>
> My question about best practice remains, though: how do people organize
> lots of little sequences that don't yet share a coordinate system. Here
> is the simplest case we have: two ~ 1kb sequences from the opposite ends
> of a ~ 5kb insert (call these STCs).
>
> DP |<----.....~ 3kb.....---->| TP
>
> I would like to have one entry point for the pair, but since the size of
> the unsequenced part in the middle isn't known precisely, I can't invent
> a common coordinate system for the STC pair to which to attach the entry
> point. Each sequence defines a little, independent coordinate system.
>
I can only tell you how I observe the Drosophila genome people doing it.
They just place the fragments with the correct relative orientations (if
known), spaced by Ns (the approximate number if known; if not, a fixed
number to represent a gap of unknown size). So in the early phase of
sequencing of P1s/BACs, the sequenced fragments derived from the P1s were
shrapnel of this kind, often with arbitrary orientation. With time, contigs
coalesced. The most painful phase was when enough coalescence had occurred
to determine the actual cytological orientation of the sequences, and large
numbers of contigs flipped orientation, making it necessary to remap the
personal annotations accumulated previously.
As a user I found it OK.
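
To make the idea concrete, here is a rough Python sketch of the N-spacing
approach for an STC pair, plus the coordinate arithmetic needed when a
contig later flips orientation. This is my own illustration, not BDGP's
actual tooling: the function names, the 1-based inclusive coordinates and
the placeholder gap length of 100 Ns are all assumptions.

```python
# Illustrative sketch only; all names and the placeholder gap size are assumptions.
GAP_UNKNOWN = 100  # assumed fixed run of Ns standing in for a gap of unknown size

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_with_gap(left, right, gap=None, flip_right=False):
    """Join two fragments into one pseudo-sequence separated by Ns.

    gap is the estimated gap length; None means "unknown", in which case
    the fixed placeholder GAP_UNKNOWN is used. Set flip_right if the right
    fragment was read from the opposite strand.
    Returns (scaffold, offset), where offset is the 0-based position at
    which the right fragment starts in the scaffold.
    """
    if flip_right:
        right = reverse_complement(right)
    n = GAP_UNKNOWN if gap is None else gap
    return left + "N" * n + right, len(left) + n


def flip_interval(start, end, length):
    """Remap a 1-based inclusive interval after the whole sequence of the
    given length has been reverse-complemented."""
    return length - end + 1, length - start + 1


if __name__ == "__main__":
    scaffold, offset = join_with_gap("ACGT", "GGCC", gap=3)
    print(scaffold, offset)
    # An annotation on bases 1-4 moves to the far end once the contig flips.
    print(flip_interval(1, 4, len(scaffold)))
```

Every annotation attached to the pair then lives in the one shared
coordinate system, with the caveat that positions to the right of the gap
shift whenever the gap estimate is revised.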
Even now, there is an entry point 'U' that represents contigs that haven't
made it into the chromosomal arm sequences. It keeps the stuff accessible
without having a zillion entry points, so why not? You know where you are
with 'U' sequences, or should if you are working on Dros.
Regards,
David Huen
P.S. I am not a member of the BDGP effort and the above represents my
interpretation of their data releases. For the official interpretation,
you'll need to speak to them.