[BioSQL-l] BioSQL : GenBank db_xref names in dbxref table

Richard Holland holland at ebi.ac.uk
Mon Nov 26 09:05:37 UTC 2007


Hi there. BioJava uses the db_xref labels as-is from the file, without
trying to translate them further. The one exception is taxon xrefs, which
get translated into taxon objects; everything else is left unchanged.
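
For readers following along, here is a minimal Python sketch of that
pass-through behaviour. It is an illustration only, not BioJava's actual
code; the helper names split_db_xref and classify_db_xref are hypothetical,
and the tuple shapes they return are an assumption made for this example.

```python
def split_db_xref(value):
    """Split a db_xref value such as 'GI:16128366' into (dbname, accession)."""
    # Split on the first colon only, so an accession containing ':' stays intact.
    dbname, sep, accession = value.partition(":")
    if not sep or not dbname or not accession:
        raise ValueError("not a db_xref value: %r" % (value,))
    return dbname, accession


def classify_db_xref(value):
    """Hypothetical helper: special-case taxon xrefs, pass everything else through."""
    dbname, accession = split_db_xref(value)
    if dbname == "taxon":
        # Taxon xrefs are handled separately (assumed here to yield a taxon ID).
        return ("taxon", accession)
    # Every other label is kept exactly as written in the file.
    return ("dbxref", dbname, accession)


if __name__ == "__main__":
    for raw in ["ASAP:1309", "GI:16128366", "ECOCYC:EG10213", "GeneID:945313"]:
        print(classify_db_xref(raw))
```

With this approach a label such as "ECOCYC" would be stored verbatim rather
than being rewritten to "EcoCyc".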

cheers,
Richard

Peter wrote:
> Dear all,
> 
> I'm one of the Biopython developers.  I've recently got going with
> BioSQL and have been getting to grips with the Biopython BioSQL
> interface.  I'm aware that we need to try and be consistent with
> BioPerl and BioJava, so I'd like to pose my first question related to
> that.
> 
> When loading GenBank records, many features have db_xref qualifiers,
> e.g. from a random CDS feature in E. coli K12:
> 
>                      /db_xref="ASAP:1309"
>                      /db_xref="GI:16128366"
>                      /db_xref="ECOCYC:EG10213"
>                      /db_xref="GeneID:945313"
> 
> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC" and
> "GeneID" before recording these entries in the seqfeature_dbxref
> and dbxref tables.  For example, "GI" becomes "GeneIndex".
> Biopython's current mapping is as follows:
> 
> # Dictionary of database types, keyed by GenBank db_xref abbreviation
> db_dict = {'GeneID': 'Entrez',
>            'GI': 'GeneIndex',
>            'COG': 'COG',
>            'CDD': 'CDD',
>            'DDBJ': 'DNA Databank of Japan',
>            'Entrez': 'Entrez',
>            'GeneIndex': 'GeneIndex',
>            'PUBMED': 'PubMed',
>            'taxon': 'Taxon',
>            'ATCC': 'ATCC',
>            'ISFinder': 'ISFinder',
>            'GOA': 'Gene Ontology Annotation',
>            'ASAP': 'ASAP',
>            'PSEUDO': 'PSEUDO',
>            'InterPro': 'InterPro',
>            'GEO': 'Gene Expression Omnibus',
>            'EMBL': 'EMBL',
>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>            'ECOCYC': 'EcoCyc',
>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>            }
> 
> In my testing, I've found several GenBank db_xref abbreviations for
> which we don't have a mapping defined, such as "LocusID", "dbSNP",
> "MGD", "MIM", or from an EMBL file, "REMTREMBL".
> 
> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
> similar mapping in their BioSQL code (or GenBank parser), so that
> Biopython can follow your example.
> 
> Thank you,
> 
> Peter
> 
> P.S. See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405

--
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/