[Bioperl-l] bp_genbank2gff3.pl vs. EMBL2GFF ?

Don Gilbert gilbertd at cricket.bio.indiana.edu
Wed Jan 21 20:35:41 UTC 2009


Dan Bolser <dan.bolser at gmail.com> spotted a problem in bp_genbank2gff3.pl,
and asked whether it is worth the effort to fix and use it, rather than making
simpler calls to Bio::SeqIO methods.

Here is a patch that should fix the problem you found with the
species->binomial call in bp_genbank2gff3, and that also updates the script
for changes in BioPerl's Annotation use.
As to the question of value: bp_genbank2gff3 does more parsing of
GenBank/EMBL/Swiss-Prot annotations, and tries to put more of them into
GFF v3 hierarchical gene model structures.  If you don't need that level of
detail, the simpler Bio::SeqIO processing is good enough, and is less fragile
to changes in your data source and/or BioPerl updates.
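
For readers who want to see the species guard in isolation, here is a minimal
sketch of the check the patch adds. It assumes nothing about BioPerl itself:
MockSeq, MockSpecies and species_name are hypothetical stand-ins invented for
this illustration, not BioPerl classes or part of the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins for a sequence object and its species object,
# so the sketch runs without BioPerl installed.
package MockSpecies;
sub new      { my ($class, $name) = @_; return bless { name => $name }, $class }
sub binomial { return $_[0]{name} }

package MockSeq;
sub new     { my ($class, $species) = @_; return bless { species => $species }, $class }
sub species { return $_[0]{species} }

package main;

# Return the binomial name, or undef when the record has no usable
# species object. Mirrors the guard in the patch: only call binomial()
# once species() is known to return a defined object that supports it.
sub species_name {
    my ($seq) = @_;
    return unless $seq->can('species');
    my $sp = $seq->species();
    return unless defined $sp && $sp->can('binomial');
    return $sp->binomial();
}

my $with    = MockSeq->new( MockSpecies->new('Homo sapiens') );
my $without = MockSeq->new(undef);

for my $seq ($with, $without) {
    my $name = species_name($seq);
    print defined $name ? "species: $name\n" : "species: (none)\n";
}
```

The unguarded form, $seq->species()->binomial(), dies with a "Can't call
method on an undefined value" error whenever species() returns undef, which is
the case the second mock record exercises.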

- Don Gilbert

BioPerl-1.5.9/scripts/Bio-DB-GFF/genbank2gff3.PLS
#$Id: genbank2gff3.PLS 15088 2008-12-04 02:49:09Z bosborne $;


diff -bwrc scripts/Bio-DB-GFF/genbank2gff3.PLS scripts/Bio-DB-GFF/genbank2gff3.fixed.pl
*** scripts/Bio-DB-GFF/genbank2gff3.PLS	Fri Jan 16 13:33:47 2009
--- scripts/Bio-DB-GFF/genbank2gff3.fixed.pl	Wed Jan 21 15:23:08 2009
***************
*** 671,678 ****
        'product' => 'product',
        'Reference' => 'reference',
        'OntologyTerm' => 'Ontology_term',
!       'comment'  => 'Note',
!       'comment1' => 'Note',
        # various map-type locations
        # gene accession tag is named per source db !??
        # 'Index terms' => keywords ??
--- 671,678 ----
        'product' => 'product',
        'Reference' => 'reference',
        'OntologyTerm' => 'Ontology_term',
!       #? 'comment'  => 'Note',
!       #? 'comment1' => 'Note',
        # various map-type locations
        # gene accession tag is named per source db !??
        # 'Index terms' => keywords ??
***************
*** 684,691 ****
        || $seq->annotation->get_Annotations("update-date")
        || $is_rich ? $seq->get_dates() : ();
    my ($comment)= $seq->annotation->get_Annotations("comment");
!   my ($species)= $seq->annotation->get_Annotations("species") 
!                || ( $seq->can('species') ? $seq->species()->binomial() : undef );
                 
    # update source feature with main GB fields
    $sf->add_tag_value( ID => $seq_name ) unless $sf->has_tag('ID');
--- 684,694 ----
        || $seq->annotation->get_Annotations("update-date")
        || $is_rich ? $seq->get_dates() : ();
    my ($comment)= $seq->annotation->get_Annotations("comment");
!   my ($species)= $seq->annotation->get_Annotations("species");
!   if( ! $species && $seq->can('species') && defined $seq->species() && $seq->species()->can('binomial') )
!     {
!     $species= $seq->species()->binomial();
!     }
                 
    # update source feature with main GB fields
    $sf->add_tag_value( ID => $seq_name ) unless $sf->has_tag('ID');
***************
*** 699,707 ****
    foreach my $atag (sort keys %AnnotTagMap) {
      my $gtag= $AnnotTagMap{$atag}; next unless($gtag);
      my @anno = map{ 
!           ref $_
!          ? split( /[,;] */, $_->value) 
!          : split( /[,;] */, "$_") if($_);
           } $seq->annotation->get_Annotations($atag);  
      foreach(@anno) { $sf->add_tag_value( $gtag => $_ ); }
      }
--- 702,713 ----
    foreach my $atag (sort keys %AnnotTagMap) {
      my $gtag= $AnnotTagMap{$atag}; next unless($gtag);
      my @anno = map{ 
!             # dgg; handle Bio::Annotation::TagTree as get_all_values
! 	    if(ref $_ && $_->can('get_all_values')) { split( /[,;] */, join ";", $_->get_all_values) }
! 	    elsif(ref $_ && $_->can('display_text')) { split( /[,;] */, $_->display_text) }
! 	    elsif(ref $_ && $_->can('value')) { split( /[,;] */, $_->value) }
!             #bad.gets hashes# elsif($_) { split( /[,;] */, "$_") }
!             else { (); }
           } $seq->annotation->get_Annotations($atag);  
      foreach(@anno) { $sf->add_tag_value( $gtag => $_ ); }
      }

...........

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


