Seqret and GIs

Peter Rice peter.rice at uk.lionbioscience.com
Wed Mar 27 10:27:18 UTC 2002


Good morning, Francis!!! (Or is that good night, at 2am in Vancouver?)

> the simple reason for that is historical, and software updating. As
> you know NCBI was using gi's many years before DDBJ and EMBL saw the
> wisdom of tracking sequence version numbers -- now that they are all
> doing it, it does make much more sense to use SV, but historically,
> nobody wanted to know about gi's -- but that was the backbone (and
> still is) of all sequence tracking within the NCBI. As far as tools
> and databases are concerned -- a gi and a SV number are exactly the
> same -- it's just the human brain which finds the SV more esthetically
> pleasing :-)

Ah yes, I remember the backbone database too. GIs came long before SVs but,
I seem to recall, were not so easy to synchronize between GenBank, EMBL and
DDBJ, and so SVs were invented. I like SVs because they make it possible to
find the current version of any sequence and to guess the SV of all
previous versions. Something similar happened with pid/protein_id.
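To illustrate guessing earlier versions: a minimal Python sketch, assuming an SV is an accession, a dot, and a version number that starts at 1 and goes up by one each time the sequence changes. The function names and the accession AA123456 are purely illustrative; this is not EMBOSS code.

```python
def parse_sv(sv):
    """Split an SV such as 'AA123456.3' into ('AA123456', 3)."""
    accession, sep, version = sv.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version string: {sv!r}")
    return accession, int(version)


def previous_versions(sv):
    """Guess the SVs of all earlier versions.

    Assumes versions are numbered 1, 2, 3, ... with no gaps.
    """
    accession, version = parse_sv(sv)
    return [f"{accession}.{v}" for v in range(1, version)]


print(previous_versions("AA123456.3"))  # ['AA123456.1', 'AA123456.2']
```

The same assumption also gives the current version directly: it is whichever SV the database returns for the bare accession.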

EMBOSS will be able to read the GI from a GenBank entry, but it is lost in
the EMBL equivalent. At the moment, the development EMBOSS code is writing
GenBank VERSION lines without the GI, so it makes sense to include it in
the data structure. Do you happen to know if there is a need to include a
GI in a GenBank VERSION line, and if so what number should EMBOSS 'invent'
(0 would be the obvious choice, although 999999999 would also be possible)?
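For reference, a rough sketch of reading and writing a GenBank VERSION line with an optional GI, in Python rather than EMBOSS's own C. It assumes the usual flat-file layout (keyword padded to 12 columns, then the SV, then "GI:" and the number); the helper names and the GI value in the example are hypothetical. It simply leaves the GI out when none is known, which sidesteps rather than answers the question of which number to invent.

```python
import re
from typing import Optional, Tuple

# Assumed layout: flat-file keywords are left-justified in a 12-column field.
KEYWORD_WIDTH = 12

# VERSION, whitespace, the SV, then an optional "GI:<digits>" field.
_VERSION_RE = re.compile(r"^VERSION\s+(\S+)(?:\s+GI:(\d+))?\s*$")


def format_version_line(sv: str, gi: Optional[int] = None) -> str:
    """Build a VERSION line; the GI field is omitted when gi is None."""
    line = "VERSION".ljust(KEYWORD_WIDTH) + sv
    if gi is not None:
        line += f"  GI:{gi}"
    return line


def parse_version_line(line: str) -> Tuple[str, Optional[int]]:
    """Return (sv, gi) from a VERSION line; gi is None if absent."""
    match = _VERSION_RE.match(line)
    if match is None:
        raise ValueError(f"not a VERSION line: {line!r}")
    sv, gi = match.groups()
    return sv, (int(gi) if gi is not None else None)


print(format_version_line("AA123456.1", 1234567))
print(format_version_line("AA123456.1"))
```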

SVs are tricky too. The SRS parser indexes "AA123456.1" as "1", which is not
very useful for database searching. EBI have changed their SRS server to
index the SV as a complete string, and that is a requirement for using SVs in
EMBOSS 2.4 (of course, you can point to the EBI's SRS server with the new
srswww access method ... oops, another preannouncement :-)
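A toy model of the indexing problem, in Python: keeping only the text after the dot files unrelated entries under the same key "1", while indexing the whole string keeps each SV distinct. This is an illustration under those assumptions, not how SRS is actually implemented; the function name and accessions are hypothetical.

```python
from collections import defaultdict


def index_svs(svs, whole_sv):
    """Map index keys to the accessions whose SV produced them.

    whole_sv=False keeps only the version number after the dot (the
    behaviour described above); whole_sv=True keeps "accession.version".
    """
    index = defaultdict(set)
    for sv in svs:
        accession, _, version = sv.rpartition(".")
        key = sv if whole_sv else version
        index[key].add(accession)
    return dict(index)


svs = ["AA123456.1", "AA654321.1", "AB000001.2"]
# Version-only keys: both version-1 entries collide under "1".
print(sorted(index_svs(svs, whole_sv=False)["1"]))  # ['AA123456', 'AA654321']
# Whole-string keys: a search for the SV finds exactly one entry.
print(index_svs(svs, whole_sv=True)["AA123456.1"])  # {'AA123456'}
```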

Peter

-- 
------------------------------------------------
Peter Rice, LION Bioscience Ltd, Cambridge, UK
peter.rice at uk.lionbioscience.com +44 1223 224723