[Bioperl-l] blast results accession numbers
Wiepert, Mathieu
Wiepert.Mathieu@mayo.edu
Mon, 4 Nov 2002 12:58:18 -0600
Hi,
I noticed that when parsing text BLAST results, the accession number is not always parsed correctly; the locus is returned instead. I am going to fix the parser so that it returns the accession number, following the identifier syntax documented in
ftp://ftp.ncbi.nih.gov/blast/db/README.
For some of the databases I am not sure which field to use (see the bottom of this message for the FASTA description templates used in BLAST results). My current plan for those:
PDB - take entry
GNL - take identifier
The current text output also does not keep the accession version (the version is not kept in the XML output either). I will not make the text parser keep it unless someone chimes in that they want it; by default I am matching what I can find in the XML output.
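To make "not keeping the version" concrete, here is a minimal Python sketch (not BioPerl code). It assumes the version is a trailing ".digits" suffix on the accession; the function name strip_version and the sample accessions are made up for illustration.

```python
import re

def strip_version(accession):
    """Drop a trailing '.<digits>' version suffix, if there is one.

    Assumption: versions appear as 'ACCESSION.N'; anything else is
    returned unchanged.
    """
    return re.sub(r"\.\d+$", "", accession)

# Made-up example accessions, for illustration only.
print(strip_version("NP_000001.2"))
print(strip_version("P12345"))
```

Keeping the version would simply mean skipping this step, so the choice is easy to reverse if anyone wants it.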
If anyone has strong feelings, let me know; otherwise I am putting this in.
FYI - copied from the README linked above:
Appendix 1: Sequence Identifier Syntax
The syntax of sequence header lines used by the NCBI BLAST server depends on
the database from which each sequence was obtained. The table below lists
the identifiers for the databases from which the sequences were derived.
Database Name                   Identifier Syntax
============================    ========================
GenBank                         gb|accession|locus
EMBL Data Library               emb|accession|locus
DDBJ, DNA Database of Japan     dbj|accession|locus
NBRF PIR                        pir||entry
Protein Research Foundation     prf||name
SWISS-PROT                      sp|accession|entry name
Brookhaven Protein Data Bank    pdb|entry|chain
Patents                         pat|country|number
GenInfo Backbone Id             bbs|number
General database identifier     gnl|database|identifier
NCBI Reference Sequence         ref|accession|locus
Local Sequence identifier       lcl|identifier
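
As a rough illustration of the field selection described above, here is a minimal Python sketch (not the BioPerl implementation). The field choices for pdb and gnl follow the plan in this message; the choices for pir, prf, bbs and lcl, the handling of a leading "gi|number|" prefix, and the name accession_from_id are assumptions made for the example. Patents (pat) are deliberately left undecided.

```python
# Position of the field to report for each database tag, counting the
# tag itself as field 0 (e.g. "gb|accession|locus" -> field 1).
ACCESSION_FIELD = {
    "gb": 1, "emb": 1, "dbj": 1, "sp": 1, "ref": 1,
    "pdb": 1,            # take entry (as proposed in this message)
    "gnl": 2,            # take identifier (as proposed in this message)
    "pir": 2, "prf": 2,  # assumption: accession slot is empty, use entry/name
    "bbs": 1, "lcl": 1,  # assumption: the only field available
    # "pat" is intentionally omitted: no field has been chosen for it.
}

def accession_from_id(seq_id):
    """Return the accession-like field of a '|'-delimited ID, or None."""
    fields = seq_id.split("|")
    # Assumption: IDs may start with "gi|<number>|", which is not in the
    # table above; skip that prefix before looking at the database tag.
    if len(fields) > 2 and fields[0] == "gi":
        fields = fields[2:]
    position = ACCESSION_FIELD.get(fields[0])
    if position is None or position >= len(fields):
        return None
    return fields[position] or None

# Made-up identifiers, for illustration only.
for seq_id in ("gb|AAA12345|HUMXYZ", "pdb|1ABC|A", "gnl|mydb|seq42"):
    print(seq_id, "->", accession_from_id(seq_id))
```

Returning None for unknown or undecided tags keeps the open cases visible instead of silently guessing a field.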
-Mat