Seqret and GIs

Francis Ouellette francis at cmmt.ubc.ca
Wed Mar 27 09:39:12 UTC 2002



> (Aside: Why don't NCBI use the sequence version so entries can be
> tracked by accession number as well??? AA123456.1 is so much more
> useful than 1681491).

Peter,

the simple reason for that is history, and the cost of updating software. As
you know NCBI was using gi's many years before DDBJ and EMBL saw the
wisdom of tracking sequence version numbers. Now that they are all
doing it, it does make much more sense to use SV, but historically
nobody wanted to know about gi's, even though they were (and still
are) the backbone of all sequence tracking within the NCBI. As far as
tools and databases are concerned, a gi and an SV number are
equivalent: each uniquely identifies one version of a sequence. It's
just the human brain that finds the SV more aesthetically pleasing :-)

cheers,

f. (ex-NCBIer)


--
| B.F. Francis Ouellette                       francis at cmmt.ubc.ca | 
| Director, Bioinformatics Core Facility       Tel: (604) 875-3815 | 
| Centre for Molecular Medicine & Therapeutics Fax: (425) 740-6978 | 
| Vancouver, BC Canada            http://www.cmmt.ubc.ca/ouellette |



On Wed, 27 Mar 2002, Peter Rice wrote:

> Date: Wed, 27 Mar 2002 09:19:41 +0000
> From: Peter Rice <peter.rice at uk.lionbioscience.com>
> To: Richard Cote <richard at seqbio.com>
> Cc: emboss at embnet.org
> Subject: Re: Seqret and GIs
> 
> 
> Richard Cote wrote:
> > Is there a way to retrieve entries from genbank flatfiles and blast
> > formatted databases based on the GI using seqret?
> > 
> > If I use seqret blastnr:NP_005047.1 (a typical AC entry), it will return
> > a fasta file without a problem. If I use seqret blastnr:4826968 (the GI
> > corresponding to the same AC as above), it complains that it cannot find
> > the entry...
> > 
> > The reason why I need to access records through their GI and not their
> > AC is that the standalone WWW BLAST server only returns a GI in the HTML
> > output and not an AC.
> > 
> > Can anyone help?
> 
> Well ... You can write a script to query the database by GI (or ID or ACC)
> using some other NCBI utility, and use that as "methodentry". That will
> work with the present EMBOSS release. But also ...
> 
> Coming soon (in EMBOSS 2.4, but some work is needed before we have the
> index fields for dbiblast indexed databases) is the ability to query by
> additional fields, including for example "SV" for the sequence version
> (AA123456.1 for example).
> 
> It is easy to extend this to include GI (the USA would be expanded to
> "BLASTNR-GI:4826968"), but this would be limited to databases that include
> a GI number, for example GenBank but not EMBL. (Aside: Why don't NCBI use
> the sequence version so entries can be tracked by accession number as
> well??? AA123456.1 is so much more useful than 1681491).
> 
> For EMBOSS it is not a problem if only a few databases include a field,
> because the database definition includes the list of fields that can be
> queried (you add "fields: sv" to the database definition to query by
> SeqVersion), so GI can be limited to NCBI-format BLAST databases.
> 
> The fields added so far (in addition to ID and ACC, which are already
> supported) are:
> 
> SV (EMBL/GenBank sequence version)
> DES (words in description)
> KEY (complete keywords)
> ORG (taxonomy levels)
> 
> These already work through SRS (because they are part of its query
> language) and when querying simple file input. We are looking at how best
> to build indices for them with dbiflat, dbifasta, dbiblast and dbigcg. The
> index file format and the source code to query the indices will be
> essentially the same as the existing code for accession numbers.
> 
> Are there other fields that would be useful?
> 
> regards,
> 
> Peter Rice
> 
> 
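To make the field-query idea in the quoted message concrete, here is a minimal Python sketch (not EMBOSS code) of splitting a USA such as "BLASTNR-GI:4826968" into database, field and value, and refusing a field that the database definition does not list, so that a GI query is accepted for an NCBI BLAST database but rejected for EMBL. The `DATABASE_FIELDS` table, the `FieldQuery`, `parse_usa` and `check_query` names, and the rule that a USA without a field means an ID lookup are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-database field lists, standing in for the "fields:"
# line of a database definition. Names and field sets are illustrative.
DATABASE_FIELDS = {
    "blastnr": {"id", "acc", "sv", "gi"},
    "embl": {"id", "acc", "sv", "des", "key", "org"},
}


@dataclass
class FieldQuery:
    database: str
    field: str
    value: str


def parse_usa(usa: str) -> FieldQuery:
    """Split 'DB-FIELD:value' (or plain 'DB:value') into its parts.

    Assumption of this sketch: a USA with no explicit field is
    recorded as an 'id' lookup.
    """
    db_part, sep, value = usa.partition(":")
    if not sep or not value:
        raise ValueError(f"not a database:entry USA: {usa!r}")
    database, dash, field = db_part.partition("-")
    if not dash:
        field = "id"
    return FieldQuery(database.lower(), field.lower(), value)


def check_query(query: FieldQuery) -> None:
    """Raise if the database does not declare the queried field."""
    allowed = DATABASE_FIELDS.get(query.database)
    if allowed is None:
        raise KeyError(f"unknown database: {query.database}")
    if query.field not in allowed:
        raise ValueError(
            f"field {query.field!r} not available for {query.database}"
        )
```

For example, `check_query(parse_usa("BLASTNR-GI:4826968"))` passes, while `check_query(parse_usa("EMBL-GI:4826968"))` raises `ValueError`, mirroring the point that GI would only be listed for databases that carry GI numbers.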








More information about the EMBOSS mailing list