[Bioperl-l] bp_genbank2gff3.pl vs. EMBL2GFF ?

Scott Cain scott at scottcain.net
Thu Jan 22 22:17:54 UTC 2009


Hi Don,

Thanks for this--I committed it today.

Scott


On Wed, Jan 21, 2009 at 3:35 PM, Don Gilbert
<gilbertd at cricket.bio.indiana.edu> wrote:
>
> Dan Bolser <dan.bolser at gmail.com> spotted a problem in bp_genbank2gff3.pl,
> and asked whether it was worth the effort to fix/use rather than a simpler
> call to Bio::SeqIO methods.
>
> Here is a patch that should fix the problem you found with bp_genbank2gff3
> species->binomial, as well as an update for changes in BioPerl/Annotation use.
> As to the question of value: bp_genbank2gff3 parses more of the
> GenBank/EMBL/Swiss-Prot annotations, and tries to put more of them into
> GFF v3 hierarchical gene model structures.  If you don't need that level of detail,
> the simpler Bio::SeqIO processing is good enough, and it is less sensitive to
> changes in your data source and/or BioPerl updates.
>
> - Don Gilbert
>
> BioPerl-1.5.9/scripts/Bio-DB-GFF/genbank2gff3.PLS
> #$Id: genbank2gff3.PLS 15088 2008-12-04 02:49:09Z bosborne $;
>
>
> diff -bwrc scripts/Bio-DB-GFF/genbank2gff3.PLS scripts/Bio-DB-GFF/genbank2gff3.fixed.pl
> *** scripts/Bio-DB-GFF/genbank2gff3.PLS Fri Jan 16 13:33:47 2009
> --- scripts/Bio-DB-GFF/genbank2gff3.fixed.pl    Wed Jan 21 15:23:08 2009
> ***************
> *** 671,678 ****
>        'product' => 'product',
>        'Reference' => 'reference',
>        'OntologyTerm' => 'Ontology_term',
> !       'comment'  => 'Note',
> !       'comment1' => 'Note',
>        # various map-type locations
>        # gene accession tag is named per source db !??
>        # 'Index terms' => keywords ??
> --- 671,678 ----
>        'product' => 'product',
>        'Reference' => 'reference',
>        'OntologyTerm' => 'Ontology_term',
> !       #? 'comment'  => 'Note',
> !       #? 'comment1' => 'Note',
>        # various map-type locations
>        # gene accession tag is named per source db !??
>        # 'Index terms' => keywords ??
> ***************
> *** 684,691 ****
>        || $seq->annotation->get_Annotations("update-date")
>        || $is_rich ? $seq->get_dates() : ();
>    my ($comment)= $seq->annotation->get_Annotations("comment");
> !   my ($species)= $seq->annotation->get_Annotations("species")
> !                || ( $seq->can('species') ? $seq->species()->binomial() : undef );
>
>    # update source feature with main GB fields
>    $sf->add_tag_value( ID => $seq_name ) unless $sf->has_tag('ID');
> --- 684,694 ----
>        || $seq->annotation->get_Annotations("update-date")
>        || $is_rich ? $seq->get_dates() : ();
>    my ($comment)= $seq->annotation->get_Annotations("comment");
> !   my ($species)= $seq->annotation->get_Annotations("species");
> !   if( ! $species && $seq->can('species') && defined $seq->species() && $seq->species()->can('binomial') )
> !     {
> !     $species= $seq->species()->binomial();
> !     }
>
>    # update source feature with main GB fields
>    $sf->add_tag_value( ID => $seq_name ) unless $sf->has_tag('ID');
> ***************
> *** 699,707 ****
>    foreach my $atag (sort keys %AnnotTagMap) {
>      my $gtag= $AnnotTagMap{$atag}; next unless($gtag);
>      my @anno = map{
> !           ref $_
> !          ? split( /[,;] */, $_->value)
> !          : split( /[,;] */, "$_") if($_);
>           } $seq->annotation->get_Annotations($atag);
>      foreach(@anno) { $sf->add_tag_value( $gtag => $_ ); }
>      }
> --- 702,713 ----
>    foreach my $atag (sort keys %AnnotTagMap) {
>      my $gtag= $AnnotTagMap{$atag}; next unless($gtag);
>      my @anno = map{
> !             # dgg; handle Bio::Annotation::TagTree as get_all_values
> !           if(ref $_ && $_->can('get_all_values')) { split( /[,;] */, join ";", $_->get_all_values) }
> !           elsif(ref $_ && $_->can('display_text')) { split( /[,;] */, $_->display_text) }
> !           elsif(ref $_ && $_->can('value')) { split( /[,;] */, $_->value) }
> !             #bad.gets hashes# elsif($_) { split( /[,;] */, "$_") }
> !             else { (); }
>           } $seq->annotation->get_Annotations($atag);
>      foreach(@anno) { $sf->add_tag_value( $gtag => $_ ); }
>      }
>
> ...........
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
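
For anyone reading the patch above without a BioPerl checkout at hand, here is
a minimal, self-contained sketch of the two defensive patterns it relies on:
checking that species() actually returned an object before calling binomial(),
and choosing an accessor per annotation object with can(). The Mock::* classes
and the sub names annotation_values and species_name are hypothetical, made up
for illustration; they are not part of BioPerl or of the committed script. One
deliberate difference from the patch: the sketch tests with Scalar::Util's
blessed() rather than ref, so a plain hash ref is skipped before can() is called.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-ins for annotation/sequence objects (NOT BioPerl classes).
{
    package Mock::TagTree;
    sub new { my ($class, @v) = @_; return bless { values => [@v] }, $class }
    sub get_all_values { return @{ $_[0]{values} } }
}
{
    package Mock::Simple;
    sub new { my ($class, $v) = @_; return bless { value => $v }, $class }
    sub value { return $_[0]{value} }
}
{
    package Mock::Species;
    sub new { my ($class, $n) = @_; return bless { name => $n }, $class }
    sub binomial { return $_[0]{name} }
}
{
    package Mock::Seq;
    sub new { my ($class, $sp) = @_; return bless { species => $sp }, $class }
    sub species { return $_[0]{species} }
}

# Split one annotation into its comma/semicolon-separated values, trying
# accessors in the same order as the patch. Anything that is not an object
# with a known accessor yields an empty list.
sub annotation_values {
    my ($anno) = @_;
    return () unless blessed($anno);
    my $text;
    if    ($anno->can('get_all_values')) { $text = join ';', $anno->get_all_values }
    elsif ($anno->can('display_text'))   { $text = $anno->display_text }
    elsif ($anno->can('value'))          { $text = $anno->value }
    return () unless defined $text;
    return split /[,;] */, $text;
}

# Return the binomial name, or undef when species() is missing or undefined.
sub species_name {
    my ($seq) = @_;
    return undef unless blessed($seq) && $seq->can('species');
    my $sp = $seq->species;
    return undef unless blessed($sp) && $sp->can('binomial');
    return $sp->binomial;
}

my @values = annotation_values(Mock::TagTree->new('kinase', 'ATP binding, membrane'));
print join('|', @values), "\n";

my $name = species_name(Mock::Seq->new(undef));
print defined $name ? $name : '(no species)', "\n";
```

The empty-list fallback mirrors the patch's final else branch: an annotation
that offers none of the known accessors is dropped rather than stringified.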



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


