[Bioperl-l] Re: GO dbxrefs in swissprot
Hilmar Lapp
hlapp at gnf.org
Fri Jul 2 12:48:16 EDT 2004
Pretty weird what you describe if it works for one entry but not
another. Also, the DR lines don't look suspiciously different.
If there's no direct reason that prevents you from doing so you should
definitely upgrade to the 1.4.x series, possibly even to the latest
version of the stable branch from CVS. There have been quite a few fixes
since 1.2.1, some of which do affect how sequences get loaded into biosql
because they affect the annotation bundle.
Let me know if the problem persists after the upgrade, and if it does,
send me the two files.
I'm also cc'ing this to the bioperl list because it is really a bioperl
problem, not a biosql-related one.
-hilmar
On Friday, July 2, 2004, at 03:16 AM, Andreas Henschel wrote:
> Hi Hilmar,
>
> Thanks for your reply. I was wondering if it is due to my patched
> bioperl 1.2.1?
> Hilmar Lapp wrote:
>
>> When you say the GO dbxrefs did not appear, how do you mean? Are you
>> referring to dbxrefs present in the source file but absent as
>> association rows in bioentry_dbxref?
>>
> Yes!
>
>> If you have a swissprot entry that has GO dbxrefs in the source file
>> but fails to have those associated in bioentry_dbxref, check whether
>> the Bio::Seq object that's coming from the parser has them as
>> annotation. It would sound strange if some entries get the
>> associations whereas others don't.
>>
> Ok, here is what I did: I modified load_seqdatabase.pl to print out
> the annotations. I ran it, comparing two small flatfiles, both
> containing GO annotations (according to flatfile and swissprot
> website).
> For the first, the parser detected no GO annotation, whereas for the
> second it did:
>
> $prompt> perl load_seqdatabase.pl --host dbserver --dbuser ah --dbname
> bioseqdb --namespace swissprot --format swiss --lookup --remove
> --testonly P53396.dat
>
> Annotation dblink stringified value Direct database link to X64330 in
> database EMBL
> Annotation dblink stringified value Direct database link to U18197 in
> database EMBL
> Annotation dblink stringified value Direct database link to BC006195
> in database EMBL
> Annotation dblink stringified value Direct database link to S21173 in
> database PIR
> Annotation dblink stringified value Direct database link to P07459 in
> database HSSP
> Annotation dblink stringified value Direct database link to HGNC:115
> in database Genew
> Annotation dblink stringified value Direct database link to P53396 in
> database GK
> Annotation dblink stringified value Direct database link to 108728 in
> database MIM
> Annotation dblink stringified value Direct database link to IPR002020
> in database InterPro
> Annotation dblink stringified value Direct database link to IPR003781
> in database InterPro
> Annotation dblink stringified value Direct database link to IPR005811
> in database InterPro
> Annotation dblink stringified value Direct database link to IPR005810
> in database InterPro
> Annotation dblink stringified value Direct database link to IPR005809
> in database InterPro
> Annotation dblink stringified value Direct database link to PF02629 in
> database Pfam
> Annotation dblink stringified value Direct database link to PF00549 in
> database Pfam
> Annotation dblink stringified value Direct database link to PS01216 in
> database PROSITE
> Annotation dblink stringified value Direct database link to PS00399 in
> database PROSITE
> Annotation dblink stringified value Direct database link to PS01217 in
> database PROSITE
>
> $prompt> perl load_seqdatabase.pl --host dbserver --dbuser ah --dbname
> bioseqdb --namespace swissprot --format swiss --lookup --remove
> --testonly Q15777.dat
> Loading Q15777.dat ...
>
> Annotation dblink stringified value Direct database link to U57911 in
> database EMBL
> Annotation dblink stringified value Direct database link to BC031582
> in database EMBL
> Annotation dblink stringified value Direct database link to HGNC:1180
> in database Genew
> Annotation dblink stringified value Direct database link to 600911 in
> database MIM
> Annotation dblink stringified value Direct database link to GO:0007399
> in database GO
> Annotation dblink stringified value Direct database link to IPR004843
> in database InterPro
> Annotation dblink stringified value Direct database link to PF00149 in
> database Pfam
>
>
> The corresponding DR entries in the two flat files are:
> P53396.dat:
> DR EMBL; X64330; CAA45614.1; -.
> DR EMBL; U18197; AAB60340.1; -.
> DR EMBL; BC006195; AAH06195.1; -.
> DR PIR; S21173; S21173.
> DR HSSP; P07459; 1JKJ.
> DR Genew; HGNC:115; ACLY.
> DR GK; P53396; -.
> DR MIM; 108728; -.
> DR GO; GO:0009346; C:citrate lyase complex; TAS.
> DR GO; GO:0003878; F:ATP citrate synthase activity; TAS.
> DR GO; GO:0006200; P:ATP catabolism; TAS.
> DR GO; GO:0006101; P:citrate metabolism; TAS.
> DR GO; GO:0015936; P:coenzyme A metabolism; TAS.
> DR InterPro; IPR002020; Citrate_synth.
> DR InterPro; IPR003781; CoA_binding.
> DR InterPro; IPR005811; CoA_ligase.
> DR InterPro; IPR005810; CoA_lig_alpha.
> DR InterPro; IPR005809; CoA_lig_beta.
> DR Pfam; PF02629; CoA_binding; 1.
> DR Pfam; PF00549; Ligase_CoA; 1.
> DR PROSITE; PS01216; SUCCINYL_COA_LIG_1; 1.
> DR PROSITE; PS00399; SUCCINYL_COA_LIG_2; 1.
> DR PROSITE; PS01217; SUCCINYL_COA_LIG_3; 1.
>
> Q15777.dat:
> DR EMBL; U57911; AAC50564.1; -.
> DR EMBL; BC031582; AAH31582.1; -.
> DR Genew; HGNC:1180; C11orf8.
> DR MIM; 600911; -.
> DR GO; GO:0007399; P:neurogenesis; TAS.
> DR InterPro; IPR004843; M-ppestrase.
> DR Pfam; PF00149; Metallophos; 1.
>
> Cheers
> Andreas
>
>
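To confirm independently of bioperl which GO cross-references a flat file actually contains, the DR lines can be tallied directly and compared with the dblink annotations printed above. The following is only a minimal sketch in Python, not part of bioperl or load_seqdatabase.pl; the function name parse_dr_lines and the embedded sample are illustrative assumptions, and it assumes the semicolon-separated DR layout shown in the listings above.

```python
def parse_dr_lines(text):
    """Return (database, primary_id) pairs for every DR line in text.

    Hypothetical helper: assumes each cross-reference line starts with
    "DR " and separates its fields with semicolons, as in the listings
    quoted in this thread.
    """
    xrefs = []
    for line in text.splitlines():
        if not line.startswith("DR "):
            continue
        # Drop the line code, then split into semicolon-separated fields.
        fields = [f.strip() for f in line[2:].split(";")]
        if len(fields) >= 2:
            xrefs.append((fields[0], fields[1]))
    return xrefs


# A few DR lines copied from the P53396.dat listing in this thread.
sample = """\
DR MIM; 108728; -.
DR GO; GO:0009346; C:citrate lyase complex; TAS.
DR GO; GO:0003878; F:ATP citrate synthase activity; TAS.
DR InterPro; IPR002020; Citrate_synth.
"""

xrefs = parse_dr_lines(sample)
go_ids = [acc for db, acc in xrefs if db == "GO"]
print(go_ids)
```

Any GO accession that this lists but that is missing from the parser's dblink annotations was lost in parsing rather than in the biosql load.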
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------