[Bioperl-l] GO terms not present in Swiss annotation object-more details
Juan Cristobal Vera
jcv128 at psu.edu
Tue Nov 21 20:53:07 UTC 2006
Hi,
I'm writing a simple application to extract various fields from
Swiss-Prot objects, and I can't access the GO terms found in the
dblink part of the Swiss-format flat files. I'm not a professional
programmer and I can't figure out why this is occurring. All the other
dblink keys are being generated as far as I can tell (e.g. EMBL, Pfam,
etc.). The GO terms are just skipped over and it's driving me crazy. I'm
not sure if this is a bug or a deliberate strategy I'm unfamiliar with.
I'm using ActivePerl 5.8.8 build 819 on a Windows machine (sorry) and
the bioperl 1.4 PPM3 package. Perhaps this is too old?
Here's part of my code (mostly derived from the bioperl docs):
.........................
#cut
my $seqInObj = $indexObj->get_Seq_by_id($line);    # fetch the entry and create a seq object
#cut
if (defined $seqInObj->annotation) {
    my $annotObj = $seqInObj->annotation;          # annotation collection for this entry
    foreach my $key ($annotObj->get_all_annotation_keys) {
        my @values = $annotObj->get_Annotations($key);
        foreach my $value (@values) {
            if (lc($key) eq "dblink") {
                print $outfh "Annotation: $key\n";
                print $outfh $value->as_text, "\n";
                my $dbhash_ref = $value->hash_tree;
                # dump every field of the dblink; none of these prints produce GO terms
                for my $dbKey (keys %{$dbhash_ref}) {
                    print $outfh $dbKey, ":", $dbhash_ref->{$dbKey}, "\n";
                }
            }
        }
    }
}
.........................
My program searches an indexed database on my machine, creates the
objects, and prints out the relevant annotations.
Here are some of the accessions I used for testing:
P19351 TNNT_DROME
P36188 TNNI_DROME
P11147 HSP7D_DROME
..........................................
The relevant output (for debugging) looks something like this for P19351:
......................................................................
Direct database link to X58188 in database EMBL
database: EMBL
comment: -; Genomic_DNA.
primary_id: X58188
optional_id: CAA41171.1
Annotation: dblink
Direct database link to X59376 in database EMBL
database: EMBL
comment: -; mRNA.
primary_id: X59376
optional_id: CAA42020.1
Annotation: dblink
Direct database link to AE003507 in database EMBL
database: EMBL
comment: -; Genomic_DNA.
primary_id: AE003507
optional_id: AAF48802.2
Annotation: dblink
Direct database link to AE003507 in database EMBL
database: EMBL
comment: -; Genomic_DNA.
primary_id: AE003507
optional_id: AAF48803.2
Annotation: dblink
Direct database link to AE003507 in database EMBL
database: EMBL
comment: -; Genomic_DNA.
primary_id: AE003507
optional_id: AAF48804.2
Annotation: dblink
Direct database link to AE003507 in database EMBL
database: EMBL
comment: -; Genomic_DNA.
primary_id: AE003507
optional_id: AAF48805.2
Annotation: dblink
Direct database link to AE003507 in database EMBL
database: EMBL
comment: -; Genomic_DNA.
primary_id: AE003507
optional_id: AAN09458.1
Annotation: dblink
Direct database link to AY122145 in database EMBL
database: EMBL
comment: -; mRNA.
primary_id: AY122145
optional_id: AAM52657.1
Annotation: dblink
Direct database link to A40547 in database PIR
database: PIR
primary_id: A40547
optional_id: A40547
Annotation: dblink
Direct database link to B38594 in database PIR
database: PIR
primary_id: B38594
optional_id: B38594
Annotation: dblink
Direct database link to Dm.1717 in database UniGene
database: UniGene
primary_id: Dm.1717
optional_id: -
Annotation: dblink
Direct database link to P45379 in database HSSP
database: HSSP
primary_id: P45379
optional_id: 1J1E
Annotation: dblink
Direct database link to P36188 in database IntAct
database: IntAct
primary_id: P36188
optional_id: -
Annotation: dblink
Direct database link to dme:CG7178-PA in database KEGG
database: KEGG
primary_id: dme:CG7178-PA
optional_id: -
Annotation: dblink
Direct database link to dme:CG7178-PB in database KEGG
database: KEGG
primary_id: dme:CG7178-PB
optional_id: -
Annotation: dblink
Direct database link to dme:CG7178-PC in database KEGG
database: KEGG
primary_id: dme:CG7178-PC
optional_id: -
Annotation: dblink
Direct database link to dme:CG7178-PD in database KEGG
database: KEGG
primary_id: dme:CG7178-PD
optional_id: -
Annotation: dblink
Direct database link to dme:CG7178-PG in database KEGG
database: KEGG
primary_id: dme:CG7178-PG
optional_id: -
Annotation: dblink
Direct database link to FBgn0004028 in database FlyBase
database: FlyBase
primary_id: FBgn0004028
optional_id: wupA
Annotation: dblink
Direct database link to IPR001978 in database InterPro
database: InterPro
primary_id: IPR001978
optional_id: Troponin
Annotation: dblink
Direct database link to PF00992 in database Pfam
database: Pfam
comment: 1
primary_id: PF00992
optional_id: Troponin
..............................................
As you can see, there are no GO terms above.
......................................................
Compare this with the actual dblink (DR) lines in the flat file for
P19351:
DR EMBL; X54504; CAA38366.1; -; mRNA.
DR EMBL; AY439172; AAR24583.1; -; Genomic_DNA.
DR EMBL; AY439172; AAR24584.1; -; Genomic_DNA.
DR EMBL; AY439172; AAR24585.1; -; Genomic_DNA.
DR EMBL; AY439172; AAR24586.1; -; Genomic_DNA.
DR EMBL; AY439172; AAR24587.1; -; Genomic_DNA.
DR EMBL; AY665838; AAU09446.1; -; mRNA.
DR EMBL; AE014298; AAF48288.2; -; Genomic_DNA.
DR EMBL; AE014298; AAF48289.2; -; Genomic_DNA.
DR EMBL; AE014298; AAF48290.1; -; Genomic_DNA.
DR EMBL; AE014298; AAX52491.1; -; Genomic_DNA.
DR EMBL; AE014298; AAX52492.1; -; Genomic_DNA.
DR EMBL; AE014298; AAX52493.1; -; Genomic_DNA.
DR EMBL; AY051989; AAK93413.1; -; mRNA.
DR EMBL; AY070875; AAL48497.1; ALT_SEQ; mRNA.
DR PIR; S13251; S13251.
DR UniGene; Dm.20472; -.
DR HSSP; P45379; 1J1E.
DR Ensembl; CG7107; Drosophila melanogaster.
DR KEGG; dme:CG7107-PE; -.
DR KEGG; dme:CG7107-PF; -.
DR KEGG; dme:CG7107-PG; -.
DR FlyBase; FBgn0004169; up.
DR GO; GO:0007498; P:mesoderm development; IEP:FlyBase.
......
The GO term is the last entry in the dblink section above.
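For reference, the GO cross-references can also be read straight from the DR lines without going through the annotation object. The following is only a minimal sketch in plain Perl, not a bioperl call: the sub name extract_go_terms is made up for illustration, and it assumes that each DR record sits on a single line in the flat file and follows the "DR GO; id; term; evidence." layout of the GO line shown above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): collect the GO cross-references
# from the raw text of one Swiss-Prot entry.
# Assumption: every DR record is on one line, laid out as
#   DR GO; GO:0007498; P:mesoderm development; IEP:FlyBase.
sub extract_go_terms {
    my ($entry_text) = @_;
    my @go_terms;
    for my $line (split /\n/, $entry_text) {
        # Skip everything that is not a GO cross-reference line.
        next unless $line =~ /^DR\s+GO;\s*(GO:\d+);\s*([^;]+);\s*(.+?)\.?\s*$/;
        push @go_terms, { id => $1, term => $2, evidence => $3 };
    }
    return @go_terms;
}

# Two DR lines copied from the P19351 excerpt.
my $entry = join "\n",
    'DR FlyBase; FBgn0004169; up.',
    'DR GO; GO:0007498; P:mesoderm development; IEP:FlyBase.';

for my $go (extract_go_terms($entry)) {
    print join("\t", $go->{id}, $go->{term}, $go->{evidence}), "\n";
}
```

This sidesteps the parser rather than explaining why the GO dblinks go missing, so it is only useful as a cross-check or stopgap.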
Any help you could provide would be most welcome. Let me know if this is
insufficient information or if you need a working script.
Juan Cristobal Vera
Graduate Student
Department of Biology
Penn State University
208 Mueller Laboratory
University Park, PA 16802
(814)863-2957