[BioPython] [BioSQL-l] BioSQL : GenBank db_xref names in dbxref table
Mauricio Herrera Cuadra
arareko at campus.iztacala.unam.mx
Thu Nov 22 16:37:24 UTC 2007
Hi Peter,
In BioPerl, there's no such mapping for db_xrefs that I'm aware of.
Each parser handles db_xref records on its own. Take a look at the
Bio::SeqIO::genbank code, for example inside the next_seq() method:
http://code.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/SeqIO/genbank.pm?rev=HEAD&content-type=text/vnd.viewcvs-markup
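For readers following along in Python, here is a minimal sketch of the kind of per-parser handling described above. Each /db_xref value is split into a database abbreviation and an accession. It splits on the first colon only, in case the accession part itself contains a colon. The function name parse_db_xref is hypothetical. This is not BioPerl's next_seq() code, and it is not Biopython's parser.

```python
from typing import Tuple


def parse_db_xref(value: str) -> Tuple[str, str]:
    """Split a GenBank /db_xref value such as "GI:16128366" into
    (database abbreviation, accession).

    Only the first colon is treated as the separator, so anything after
    it (including further colons) is kept as the accession.
    """
    # Tolerate surrounding whitespace and the double quotes that wrap
    # qualifier values in the flat file.
    value = value.strip().strip('"')
    db, sep, accession = value.partition(":")
    if not sep or not db or not accession:
        raise ValueError(f"not a DB:ID cross-reference: {value!r}")
    return db, accession


# The qualifiers from the E. coli K12 CDS feature quoted below.
qualifiers = ['"ASAP:1309"', '"GI:16128366"', '"ECOCYC:EG10213"', '"GeneID:945313"']
for q in qualifiers:
    print(parse_db_xref(q))
```

Raising ValueError on malformed values is a design choice made for this sketch. A real parser might prefer to warn and skip the value instead.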
Regards,
Mauricio.
Peter wrote:
> Dear all,
>
> I'm one of the Biopython developers. I've recently got going with
> BioSQL and have been getting to grips with the Biopython BioSQL
> interface. I'm aware that we need to try and be consistent with
> BioPerl and BioJava, so I'd like to pose my first question related to
> that.
>
> When loading GenBank records, many features have db_xref qualifiers,
> e.g. from a random CDS feature in E. coli K12:
>
> /db_xref="ASAP:1309"
> /db_xref="GI:16128366"
> /db_xref="ECOCYC:EG10213"
> /db_xref="GeneID:945313"
>
> Biopython attempts to translate the strings "ASAP", "GI", "ECOCYC",
> "GeneID" before recording these entries in the seqfeature_dbxref
> and dbxref tables. For example, "GI" becomes "GeneIndex".
> Biopython's current mapping is as follows:
>
> # Dictionary of database types, keyed by GenBank db_xref abbreviation
> db_dict = {'GeneID': 'Entrez',
>            'GI': 'GeneIndex',
>            'COG': 'COG',
>            'CDD': 'CDD',
>            'DDBJ': 'DNA Databank of Japan',
>            'Entrez': 'Entrez',
>            'GeneIndex': 'GeneIndex',
>            'PUBMED': 'PubMed',
>            'taxon': 'Taxon',
>            'ATCC': 'ATCC',
>            'ISFinder': 'ISFinder',
>            'GOA': 'Gene Ontology Annotation',
>            'ASAP': 'ASAP',
>            'PSEUDO': 'PSEUDO',
>            'InterPro': 'InterPro',
>            'GEO': 'Gene Expression Omnibus',
>            'EMBL': 'EMBL',
>            'UniProtKB/Swiss-Prot': 'UniProtKB/Swiss-Prot',
>            'ECOCYC': 'EcoCyc',
>            'UniProtKB/TrEMBL': 'UniProtKB/TrEMBL'
>            }
>
> In my testing, I've found several GenBank db_xref abbreviations for
> which we don't have a mapping defined, such as "LocusID", "dbSNP",
> "MGD", "MIM", or, from an EMBL file, "REMTREMBL".
>
> I'd like to know if BioPerl and/or BioJava and/or BioRuby define a
> similar mapping in their BioSQL code (or GenBank parser), so that
> Biopython can follow your example.
>
> Thank you,
>
> Peter
>
> P.S. See also Biopython bug 2405
> http://bugzilla.open-bio.org/show_bug.cgi?id=2405
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l
>
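The unmapped abbreviations Peter lists ("LocusID", "dbSNP", "MGD", "MIM", "REMTREMBL") raise the question of what a lookup should do on a miss. Below is a minimal sketch, assuming a pass-through policy: unknown abbreviations are stored unchanged rather than dropped. It also collects the misses so they can be reviewed and added to the table later. DB_DICT is a subset of the Biopython db_dict quoted above. The helpers normalize_dbname and find_unmapped are hypothetical names, not part of Biopython, BioPerl or BioSQL.

```python
from typing import Dict, Iterable, List

# A subset of the Biopython db_dict quoted above,
# keyed by GenBank db_xref abbreviation.
DB_DICT: Dict[str, str] = {
    "GeneID": "Entrez",
    "GI": "GeneIndex",
    "ECOCYC": "EcoCyc",
    "ASAP": "ASAP",
    "PUBMED": "PubMed",
    "taxon": "Taxon",
}


def normalize_dbname(abbrev: str, mapping: Dict[str, str] = DB_DICT) -> str:
    """Return the mapped database name, or the abbreviation unchanged.

    Passing unknown names through (instead of raising KeyError) is an
    assumed policy: the cross-reference is kept rather than lost.
    """
    return mapping.get(abbrev, abbrev)


def find_unmapped(abbrevs: Iterable[str],
                  mapping: Dict[str, str] = DB_DICT) -> List[str]:
    """Return the distinct abbreviations with no entry in the mapping, sorted."""
    return sorted({a for a in abbrevs if a not in mapping})


seen = ["GI", "GeneID", "LocusID", "dbSNP", "MGD", "MIM", "REMTREMBL"]
for abbrev in seen:
    print(abbrev, "->", normalize_dbname(abbrev))
print("unmapped:", find_unmapped(seen))
```

Pass-through means that records loaded before and after a new entry is added to the mapping may store different names for the same database. That trade-off is one reason a shared convention across BioPerl, BioJava and Biopython would help.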
--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM