[Bioperl-l] GO terms not present in Swiss annotation object-more details

Juan Cristobal Vera jcv128 at psu.edu
Tue Nov 21 20:53:07 UTC 2006





Hi,
   I'm writing a simple application to extract various fields from
 swissprot objects, and I can't access the GO terms found in the
 dblink part of the swiss format flat files.  I'm not a professional
 programmer and I can't figure out why this is occurring.  All the
 other dblink keys are being generated as far as I can tell (e.g.
 embl, pfam, etc).  The GO terms are just skipped over and it's
 driving me crazy.  I'm not sure if this is a bug or a deliberate
 strategy I'm unfamiliar with.

 I'm using ActivePerl 5.8.8 build 819 on a Windows machine (sorry) and
the BioPerl 1.4 PPM3 package.  Perhaps this is too old?
Here's part of my code (mostly derived from the BioPerl docs):
.........................
#cut

$seqInObj = $indexObj->get_Seq_by_id($line);  #get sequence and create seq object

#cut

if (defined $seqInObj->annotation) {
    $annotObj = $seqInObj->annotation;    # get the annotation collection
    foreach $key ($annotObj->get_all_annotation_keys) {
        @values = $annotObj->get_Annotations($key);
        foreach $value (@values) {
            if (lc($key) eq "dblink") {
                print $outfh "Annotation: $key\n";
                print $outfh $value->as_text, "\n";
                $dbhash_ref = $value->hash_tree;
                for $dbKey (keys %{$dbhash_ref}) {
                    # none of these prints produce GO terms
                    print $outfh $dbKey, ":", $dbhash_ref->{$dbKey}, "\n";
                }
            }
        }
    }
}
.........................
My program searches an indexed database on
my machine, creates the objects, and prints out relevant annotations.

Here are some of the accessions I used for testing:

P19351  TNNT_DROME

P36188  TNNI_DROME

P11147  HSP7D_DROME

..........................................

The relevant (debugging) output looks something like this for
P19351:

......................................................................

Direct database link to X58188 in database EMBL

database: EMBL

comment:  -; Genomic_DNA.

primary_id: X58188

optional_id: CAA41171.1

Annotation: dblink

Direct database link to X59376 in database EMBL

database: EMBL

comment:  -; mRNA.

primary_id: X59376

optional_id: CAA42020.1

Annotation: dblink

Direct database link to AE003507 in database EMBL

database: EMBL

comment:  -; Genomic_DNA.

primary_id: AE003507

optional_id: AAF48802.2

Annotation: dblink

Direct database link to AE003507 in database EMBL

database: EMBL

comment:  -; Genomic_DNA.

primary_id: AE003507

optional_id: AAF48803.2

Annotation: dblink

Direct database link to AE003507 in database EMBL

database: EMBL

comment:  -; Genomic_DNA.

primary_id: AE003507

optional_id: AAF48804.2

Annotation: dblink

Direct database link to AE003507 in database EMBL

database: EMBL

comment:  -; Genomic_DNA.

primary_id: AE003507

optional_id: AAF48805.2

Annotation: dblink

Direct database link to AE003507 in database EMBL

database: EMBL

comment:  -; Genomic_DNA.

primary_id: AE003507

optional_id: AAN09458.1

Annotation: dblink

Direct database link to AY122145 in database EMBL

database: EMBL

comment:  -; mRNA.

primary_id: AY122145

optional_id: AAM52657.1

Annotation: dblink

Direct database link to A40547 in database PIR

database: PIR

primary_id: A40547

optional_id: A40547

Annotation: dblink

Direct database link to B38594 in database PIR

database: PIR

primary_id: B38594

optional_id: B38594

Annotation: dblink

Direct database link to Dm.1717 in database UniGene

database: UniGene

primary_id: Dm.1717

optional_id: -

Annotation: dblink

Direct database link to P45379 in database HSSP

database: HSSP

primary_id: P45379

optional_id: 1J1E

Annotation: dblink

Direct database link to P36188 in database IntAct

database: IntAct

primary_id: P36188

optional_id: -

Annotation: dblink

Direct database link to dme:CG7178-PA in database KEGG

database: KEGG

primary_id: dme:CG7178-PA

optional_id: -

Annotation: dblink

Direct database link to dme:CG7178-PB in database KEGG

database: KEGG

primary_id: dme:CG7178-PB

optional_id: -

Annotation: dblink

Direct database link to dme:CG7178-PC in database KEGG

database: KEGG

primary_id: dme:CG7178-PC

optional_id: -

Annotation: dblink

Direct database link to dme:CG7178-PD in database KEGG

database: KEGG

primary_id: dme:CG7178-PD

optional_id: -

Annotation: dblink

Direct database link to dme:CG7178-PG in database KEGG

database: KEGG

primary_id: dme:CG7178-PG

optional_id: -

Annotation: dblink

Direct database link to FBgn0004028 in database FlyBase

database: FlyBase

primary_id: FBgn0004028

optional_id: wupA

Annotation: dblink

Direct database link to IPR001978 in database InterPro

database: InterPro

primary_id: IPR001978

optional_id: Troponin

Annotation: dblink

Direct database link to PF00992 in database Pfam

database: Pfam

comment: 1

primary_id: PF00992

optional_id: Troponin

..............................................

As you can see, there are no GO terms above.

......................................................

Compare that with the actual dblink content of the flat file for
P19351:

DR   EMBL; X54504; CAA38366.1; -; mRNA.
DR   EMBL; AY439172; AAR24583.1; -; Genomic_DNA.
DR   EMBL; AY439172; AAR24584.1; -; Genomic_DNA.
DR   EMBL; AY439172; AAR24585.1; -; Genomic_DNA.
DR   EMBL; AY439172; AAR24586.1; -; Genomic_DNA.
DR   EMBL; AY439172; AAR24587.1; -; Genomic_DNA.
DR   EMBL; AY665838; AAU09446.1; -; mRNA.
DR   EMBL; AE014298; AAF48288.2; -; Genomic_DNA.
DR   EMBL; AE014298; AAF48289.2; -; Genomic_DNA.
DR   EMBL; AE014298; AAF48290.1; -; Genomic_DNA.
DR   EMBL; AE014298; AAX52491.1; -; Genomic_DNA.
DR   EMBL; AE014298; AAX52492.1; -; Genomic_DNA.
DR   EMBL; AE014298; AAX52493.1; -; Genomic_DNA.
DR   EMBL; AY051989; AAK93413.1; -; mRNA.
DR   EMBL; AY070875; AAL48497.1; ALT_SEQ; mRNA.
DR   PIR; S13251; S13251.
DR   UniGene; Dm.20472; -.
DR   HSSP; P45379; 1J1E.
DR   Ensembl; CG7107; Drosophila melanogaster.
DR   KEGG; dme:CG7107-PE; -.
DR   KEGG; dme:CG7107-PF; -.
DR   KEGG; dme:CG7107-PG; -.
DR   FlyBase; FBgn0004169; up.
DR   GO; GO:0007498; P:mesoderm development; IEP:FlyBase.
......

where the GO term is the last entry in the dblink section above.
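
As a sanity check that does not depend on the BioPerl parser at all, the
GO lines can be pulled straight out of the raw flat-file text.  This is
only a minimal sketch: extract_go_terms is a hypothetical helper name
(not a BioPerl function), and the regular expression assumes the
"DR   GO; id; term; evidence." layout of the GO line shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect the GO
# cross-references from the raw text of one Swiss-Prot record.
sub extract_go_terms {
    my ($record_text) = @_;
    my @go_terms;
    for my $line (split /\n/, $record_text) {
        # Assumed layout: DR   GO; GO:0007498; P:mesoderm development; IEP:FlyBase.
        next unless $line =~ /^DR\s+GO;\s*(GO:\d+);\s*([^;]+);\s*(.+?)\.?\s*$/;
        push @go_terms, { id => $1, term => $2, evidence => $3 };
    }
    return @go_terms;
}

# Two DR lines copied from the P19351 entry quoted in this message.
my $record = <<'END_RECORD';
DR   FlyBase; FBgn0004169; up.
DR   GO; GO:0007498; P:mesoderm development; IEP:FlyBase.
END_RECORD

# Print each GO cross-reference as id, term, evidence (tab-separated).
for my $go (extract_go_terms($record)) {
    print join("\t", $go->{id}, $go->{term}, $go->{evidence}), "\n";
}
```

If this finds GO lines for an accession while the dblink annotations do
not contain them, the terms are being lost somewhere between the flat
file and the annotation object rather than being absent from the data.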

Any help you could provide would be most welcome.  Let me know if this is
insufficient information or if you need a working script.





Juan Cristobal Vera

Graduate Student

Department of Biology

Penn State University

208 Mueller Laboratory

University Park, PA 16802

(814)863-2957



