[BioSQL-l] GO dbxrefs in swissprot

Andreas Henschel henschel at mpi-cbg.de
Fri Jul 2 06:16:06 EDT 2004


Hi Hilmar,

Thanks for your reply. I was wondering whether the missing GO dbxrefs 
could be due to my patched bioperl 1.2.1?

Hilmar Lapp wrote:

> When you say the GO dbxrefs did not appear, how do you mean? Are you 
> referring to dbxrefs present in the source file but absent as 
> association rows in bioentry_dbxref?
>
Yes!

> If you have a swissprot entry that has GO dbxrefs in the source file 
> but fails to have those associated in bioentry_dbxref, check whether 
> the Bio::Seq object that's coming from the parser has them as 
> annotation. It would sound strange if some entries get the 
> associations whereas others don't.
>
OK, here is what I did: I modified load_seqdatabase.pl to print out the 
annotations and ran it on two small flat files, both of which contain GO 
annotations (according to the flat file and the swissprot website).
For the first file, the parser detected no GO annotation, whereas for the 
second it did:

$prompt> perl load_seqdatabase.pl --host dbserver --dbuser ah --dbname 
bioseqdb --namespace swissprot --format swiss --lookup --remove 
--testonly P53396.dat

Annotation dblink stringified value Direct database link to X64330 in 
database EMBL
Annotation dblink stringified value Direct database link to U18197 in 
database EMBL
Annotation dblink stringified value Direct database link to BC006195 in 
database EMBL
Annotation dblink stringified value Direct database link to S21173 in 
database PIR
Annotation dblink stringified value Direct database link to P07459 in 
database HSSP
Annotation dblink stringified value Direct database link to HGNC:115 in 
database Genew
Annotation dblink stringified value Direct database link to P53396 in 
database GK
Annotation dblink stringified value Direct database link to 108728 in 
database MIM
Annotation dblink stringified value Direct database link to IPR002020 in 
database InterPro
Annotation dblink stringified value Direct database link to IPR003781 in 
database InterPro
Annotation dblink stringified value Direct database link to IPR005811 in 
database InterPro
Annotation dblink stringified value Direct database link to IPR005810 in 
database InterPro
Annotation dblink stringified value Direct database link to IPR005809 in 
database InterPro
Annotation dblink stringified value Direct database link to PF02629 in 
database Pfam
Annotation dblink stringified value Direct database link to PF00549 in 
database Pfam
Annotation dblink stringified value Direct database link to PS01216 in 
database PROSITE
Annotation dblink stringified value Direct database link to PS00399 in 
database PROSITE
Annotation dblink stringified value Direct database link to PS01217 in 
database PROSITE

$prompt> perl load_seqdatabase.pl --host dbserver --dbuser ah --dbname 
bioseqdb --namespace swissprot --format swiss --lookup --remove 
--testonly Q15777.dat
Loading Q15777.dat ...

Annotation dblink stringified value Direct database link to U57911 in 
database EMBL
Annotation dblink stringified value Direct database link to BC031582 in 
database EMBL
Annotation dblink stringified value Direct database link to HGNC:1180 in 
database Genew
Annotation dblink stringified value Direct database link to 600911 in 
database MIM
Annotation dblink stringified value Direct database link to GO:0007399 
in database GO
Annotation dblink stringified value Direct database link to IPR004843 in 
database InterPro
Annotation dblink stringified value Direct database link to PF00149 in 
database Pfam


The corresponding DR entries in the two flat files are:
P53396.dat:
DR   EMBL; X64330; CAA45614.1; -.
DR   EMBL; U18197; AAB60340.1; -.
DR   EMBL; BC006195; AAH06195.1; -.
DR   PIR; S21173; S21173.
DR   HSSP; P07459; 1JKJ.
DR   Genew; HGNC:115; ACLY.
DR   GK; P53396; -.
DR   MIM; 108728; -.
DR   GO; GO:0009346; C:citrate lyase complex; TAS.
DR   GO; GO:0003878; F:ATP citrate synthase activity; TAS.
DR   GO; GO:0006200; P:ATP catabolism; TAS.
DR   GO; GO:0006101; P:citrate metabolism; TAS.
DR   GO; GO:0015936; P:coenzyme A metabolism; TAS.
DR   InterPro; IPR002020; Citrate_synth.
DR   InterPro; IPR003781; CoA_binding.
DR   InterPro; IPR005811; CoA_ligase.
DR   InterPro; IPR005810; CoA_lig_alpha.
DR   InterPro; IPR005809; CoA_lig_beta.
DR   Pfam; PF02629; CoA_binding; 1.
DR   Pfam; PF00549; Ligase_CoA; 1.
DR   PROSITE; PS01216; SUCCINYL_COA_LIG_1; 1.
DR   PROSITE; PS00399; SUCCINYL_COA_LIG_2; 1.
DR   PROSITE; PS01217; SUCCINYL_COA_LIG_3; 1.

Q15777.dat:
DR   EMBL; U57911; AAC50564.1; -.
DR   EMBL; BC031582; AAH31582.1; -.
DR   Genew; HGNC:1180; C11orf8.
DR   MIM; 600911; -.
DR   GO; GO:0007399; P:neurogenesis; TAS.
DR   InterPro; IPR004843; M-ppestrase.
DR   Pfam; PF00149; Metallophos; 1.
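
For anyone who wants to repeat the comparison on more entries, here is a 
minimal sketch of the cross-check done by eye above: collect the 
(database, primary id) pairs from the DR lines of a flat file and report 
those that do not show up in the printed dblink annotations. It is not 
part of load_seqdatabase.pl or bioperl; the function names dr_xrefs, 
reported_xrefs and missing_xrefs are made up for illustration, and it 
assumes the output format "Direct database link to <id> in database <db>" 
shown in the runs above.

```python
import re

# Matches the stringified dblink annotation printed by the modified script.
DBLINK_RE = re.compile(r"Direct database link to (\S+) in database (\S+)")


def dr_xrefs(flatfile_text):
    """Return (database, primary_id) pairs from Swiss-Prot DR lines."""
    pairs = []
    for line in flatfile_text.splitlines():
        if line.startswith("DR   "):
            # Drop the line code and the trailing period, then split on ';'.
            fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
            if len(fields) >= 2:
                pairs.append((fields[0], fields[1]))
    return pairs


def reported_xrefs(loader_output):
    """Return (database, primary_id) pairs found in the printed annotations."""
    # Collapse whitespace so that lines wrapped by a mail client still match.
    text = " ".join(loader_output.split())
    return [(db, acc) for acc, db in DBLINK_RE.findall(text)]


def missing_xrefs(flatfile_text, loader_output):
    """Return DR pairs from the flat file that the parser did not report."""
    seen = set(reported_xrefs(loader_output))
    return [pair for pair in dr_xrefs(flatfile_text) if pair not in seen]


if __name__ == "__main__":
    # A three-line excerpt of P53396.dat and the matching output lines.
    flat = (
        "DR   EMBL; X64330; CAA45614.1; -.\n"
        "DR   GO; GO:0009346; C:citrate lyase complex; TAS.\n"
        "DR   InterPro; IPR002020; Citrate_synth.\n"
    )
    out = (
        "Annotation dblink stringified value Direct database link to X64330 in \n"
        "database EMBL\n"
        "Annotation dblink stringified value Direct database link to IPR002020 in \n"
        "database InterPro\n"
    )
    print(missing_xrefs(flat, out))  # [('GO', 'GO:0009346')]
```

Run on the full P53396 data above, this should list exactly the five GO 
entries, since every other DR line has a matching dblink annotation.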

Cheers
Andreas


