[DAS] standard entry_points
David Huen <smh1008@cus.cam.ac.uk>
Wed, 20 Mar 2002 03:29:43 +0000 (GMT)
On Tue, 19 Mar 2002, Paul Gordon wrote:
> This may be a naive approach, but can you not use standard conventions
> that would work across organisms? E.g. the DAS specification could say
> something along the lines of
>
> "For the top-level entry points, when referring to the distinct DNA units in a
> cell, an implementation shall use the conventions:
>
> Chromosome x
> Where x is a decimal number (e.g. 1 or 23), X or Y
>
> Plasmid x
> Where x is the name of the plasmid in GenBank, if available
>
> Extrachromosomal element x
> ...
>
There may be a bit of a problem here. I have the impression that some genome
projects have sequence coordinates running through the centromeres (did
they actually manage to sequence those, or are the coordinates arbitrarily
assigned?) while others have sequence coordinates on a per-arm basis. And
while many chromosomes do have two meaningful arms, even those are not named
uniformly between species: with humans, you have p and q; with
Drosophila, you have L and R.
There is also a disparity between disciplines that have numbered their
chromosomes with Arabic numerals (1, 2, ...) and those that have done so
with Roman numerals (I, II, III, IV, ...). Plus there are those poor doomed
souls who have more than one chromosome numbering system competing for
dominance (in dog cytology, I'm told, the same number can denote a different
chromosome depending on the author). Certainly, I don't believe DAS should
override whatever convention already exists within the community for a
given species.
I can only comment on the one genome I may understand, that of
Drosophila. There is effectively one genome project, and it uses an
arm-based coordinate system. I would rather not see the force-fit to a
chromosome-wide coordinate system that the above proposal would entail,
because I would not like to see a skew between the genome project's official
release coordinates and any coordinates that a DAS server I maintain might
have to provide on the proposed basis. It would only confuse the users.
If we do accept that entry points can be arms, then arm names must be
acceptable, and no uniform system like the one above is going to apply to
all species, given that arm labelling is already species-specific.
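To make the mismatch concrete, here is a minimal Python sketch that encodes the quoted proposal ("Chromosome x", where x is a decimal number, X or Y) as a regular expression. The pattern and the `fits_proposal` helper are hypothetical illustrations, not part of the DAS specification; the sample labels are the kinds mentioned above (Arabic numerals, sex chromosomes, Roman numerals, Drosophila arms).

```python
import re

# Hypothetical encoding of the quoted cross-organism proposal:
# "Chromosome x", where x is a decimal number, X or Y.
PROPOSED_CHROMOSOME = re.compile(r"Chromosome (\d+|X|Y)")


def fits_proposal(label: str) -> bool:
    """Return True if the whole label matches the proposed convention."""
    return PROPOSED_CHROMOSOME.fullmatch(label) is not None


# Labels of the kinds already used by existing communities.
existing_labels = [
    "Chromosome 1",   # Arabic numeral: fits
    "Chromosome X",   # sex chromosome: fits
    "Chromosome IV",  # Roman numeral: rejected
    "Chromosome 2L",  # Drosophila arm: rejected
]

for label in existing_labels:
    print(f"{label!r}: {'ok' if fits_proposal(label) else 'rejected'}")
```

Both the Roman-numeral label and the arm label are rejected: a single cross-species rule either excludes labels a community already uses or has to grow a species-specific exception for each of them.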
Personally, I would be more than happy to see standardisation on a
per-species basis: after all, the resolution we most frequently need in
DAS is within a species. In that case, I'd be happy to see a uniform set
of labels for chromosomes and arms within each species. That should be
readily achievable (except perhaps for the dog people), given that the
number of implementors of DAS reference servers is small and they are
almost all on this list. Once the reference servers have agreed,
annotation servers would have to fall in line anyway. And if such
standardisation is going to occur, it should happen sooner rather than
later, before more servers come online.
For my 0.02 Euros' worth, Drosophila melanogaster top-level entry
points could be 2L, 2R, 3L, 3R, X and Y until such time as the project
moves to chromosome-wide coordinates. These seem to be the labels used by
the existing genome project anyway.
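A per-species scheme could be as simple as a lookup table that reference servers agree on and annotation servers check against. The following Python sketch assumes exactly that; the `TOP_LEVEL_ENTRY_POINTS` table and `is_valid_entry_point` function are hypothetical, and only the Drosophila melanogaster labels come from this message.

```python
# Hypothetical per-species registry of agreed top-level entry points.
# Only the Drosophila melanogaster labels are taken from this message;
# other species would be filled in by their own communities.
TOP_LEVEL_ENTRY_POINTS: dict[str, frozenset[str]] = {
    "Drosophila melanogaster": frozenset({"2L", "2R", "3L", "3R", "X", "Y"}),
}


def is_valid_entry_point(species: str, label: str) -> bool:
    """Check a label against the agreed set for one species.

    An unlisted species returns False instead of falling back to a
    cross-species rule, since no such rule is assumed here.
    """
    labels = TOP_LEVEL_ENTRY_POINTS.get(species)
    return labels is not None and label in labels


print(is_valid_entry_point("Drosophila melanogaster", "3R"))            # True
print(is_valid_entry_point("Drosophila melanogaster", "Chromosome 3"))  # False
```

Keeping the lookup strictly per species means each community's existing convention stays authoritative, and a move to chromosome-wide coordinates would only change that species' entry.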
Regards,
David Huen