[Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated

Marc Logghe Marc.Logghe at devgen.com
Tue Jan 25 04:17:29 EST 2005


Hi Allen,
Thanks for the fixes !
As you suggested, I get the tag values when I use the stringification overload, so that is solved (I don't want to commit that myself though, it seems too tricky to me ;-).
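For readers following along, here is a minimal sketch of what a stringification overload does. My::SimpleValue is a hypothetical stand-in class, not the actual Bio::Annotation code, and the '-value' argument name is only an assumption for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation class; this is NOT BioPerl code.
package My::SimpleValue;

use overload
    '""'     => sub { $_[0]->{value} },   # called whenever the object is used as a string
    fallback => 1;                        # let eq, ., etc. fall back to the string form

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

package main;

my $note = My::SimpleValue->new(-value => 'frame=.');

# With the overload in place this prints the stored value;
# without it, the same line would print something like My::SimpleValue=HASH(0x...).
print qq{/note="$note"\n};
```

This is the same effect that turns the HASH(0x...) qualifiers in the GenBank output into readable tag values.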
What is not so nice is that I lose my split features:
     gene            join(8311..8422,8852..8887,8940..9090,9142..9233,
                     9721..9848,10296..10714,10835..10934,11584..11706)
                     /gene="R12H7.1"
     CDS             join(8311..8422,8852..8887,8940..9090,9142..9233,
                     9721..9848,10296..10714,10835..10934,11584..11706)


now becomes:

     gene            8311..8422
                     /note="frame=."
                     /gene="R12H7.1"
     CDS             8311..8422

I tried to solve this issue by using the unflattener, but that did not work out well either :-(
My actual question now is: is there a way, using whatever system, to preserve the split feature structure? That was what I was trying to do in the first place: reconstruct the original feature object starting from gff. Any ideas on that?
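One possible direction, as a sketch only: GFF3 represents a discontinuous feature as several rows that share the same ID attribute, so the segments can be regrouped by ID before any feature object is built. The code below is plain Perl with no BioPerl calls. The helper name locations_by_id, the source column 'EMBL', the ID 'cds1' and the phase values are all hypothetical, and I am assuming (not claiming) that the GFF writer emits one row per segment. It also ignores strand, so complement() locations are not handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect GFF3 rows that share an ID and return one location string per
# feature, e.g. "join(8311..8422,8852..8887)". Hypothetical helper.
sub locations_by_id {
    my (@lines) = @_;
    my (%segments, @order);
    for my $line (@lines) {
        next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blank lines
        chomp $line;
        my ($seqid, $source, $type, $start, $end, $score, $strand, $phase, $attr)
            = split /\t/, $line;
        next unless defined $attr && $attr =~ /(?:^|;)ID=([^;]+)/;
        my $id = $1;
        push @order, $id unless $segments{$id};   # remember first-seen order
        push @{ $segments{$id} }, [ $start, $end ];
    }
    my %location;
    for my $id (@order) {
        my @parts = map  { "$_->[0]..$_->[1]" }
                    sort { $a->[0] <=> $b->[0] } @{ $segments{$id} };
        $location{$id} = @parts > 1 ? 'join(' . join(',', @parts) . ')' : $parts[0];
    }
    return \%location;
}

# Hypothetical input: the first two segments of the CDS shown above.
my @gff = (
    join("\t", 'Z50755', 'EMBL', 'CDS', 8311, 8422, '.', '+', 0,   'ID=cds1'),
    join("\t", 'Z50755', 'EMBL', 'CDS', 8852, 8887, '.', '+', '.', 'ID=cds1'),
);
my $loc = locations_by_id(@gff);
print "$_\t$loc->{$_}\n" for sort keys %$loc;   # cds1  join(8311..8422,8852..8887)
```

Whether the resulting segments are then turned into a single split location object or kept as strings is a separate choice; the point here is only the grouping step.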


Also, do you think it will be possible to convert the Bio::SeqFeature::Annotated features into persistent ones so that they can be stored in BioSQL? I'll try to test that out today.
Cheers,
Marc




> -----Original Message-----
> From: Allen Day [mailto:allenday at ucla.edu]
> Sent: Tuesday, January 25, 2005 12:55 AM
> To: Marc Logghe
> Cc: Bioperl (E-mail)
> Subject: Re: [Bioperl-l] struggling with Bio::FeatureIO and
> Bio::SeqFeature::Annotated
> 
> 
> Marc,
> 
> The problem was that Bio::SeqIO::FTHelper was making calls assuming it had
> a Bio::SeqFeature::Generic instance.  I've updated it to make calls
> compliant with the Bio::SeqFeatureI interface, and the script below now
> at least runs using "option 1".
> 
> "option 2" will not work, at least for now, because Bio::DB::GenBank is
> creating a SeqIO that holds Bio::SeqFeature::Generic objects, and these
> are difficult to deal with because their internal data structures are
> different from those of a Bio::SeqFeature::Annotated.  I like the technique
> used below to bridge to Bio::FeatureIO via a Bio::Tools::GFF intermediary --
> very clever.
> 
> You'll also notice that the GenBank-formatted file output by the script
> doesn't look quite right; the FEATURES section looks something like:
> 
> FEATURES             Location/Qualifiers
>      Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)1..20975
>                      /source="Bio::Annotation::SimpleValue=HASH(0x9bcdbe0)"
>                      /mol_type="Bio::Annotation::SimpleValue=HASH(0xa3dab1c)"
>                      /seq_id="Bio::Annotation::SimpleValue=HASH(0xa214de0)"
>                      /score="Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
>                      /frame="Bio::Annotation::SimpleValue=HASH(0xa439b04)"
>                      /chad="Bio::Annotation::Comment=HASH(0xa3da9b4)"
>                      /note="score=Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
>                      /note="frame=Bio::Annotation::SimpleValue=HASH(0xa439b04)"
>                      /db_xref="Bio::Annotation::SimpleValue=HASH(0xa3daaf8)"
>                      /clone="Bio::Annotation::SimpleValue=HASH(0xa3dab28)"
>                      /strain="Bio::Annotation::SimpleValue=HASH(0xa3dabb8)"
>                      /phase="Bio::Annotation::SimpleValue=HASH(0xa3d935c)"
>                      /chromosome="Bio::Annotation::SimpleValue=HASH(0xa3dac00)"
>                      /type="Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)"
>                      /organism="Bio::Annotation::SimpleValue=HASH(0xa3dac48)"
> 
> because Bio::SeqFeature::Annotated holds annotations as object references
> rather than strings.  We can fix this with a stringification overload, but
> I noticed that the code to do this exists in the Bio::Annotation::*
> classes but is commented out, and I'm not sure why.  Maybe Hilmar can shed
> some light on this.
> 
> -Allen
> 
> 
> 
> On Mon, 24 Jan 2005, Marc Logghe wrote:
> 
> > Hi all,
> > I have some problems with Bio::FeatureIO and Bio::SeqFeature::Annotated. But maybe these modules are not designed for the things I had in mind.
> > My initial goal seemed pretty straightforward. It turned out differently.
> > I have a gff file containing features of a bunch of bioentries sitting in BioSQL.
> > I wanted to turn the gff into feature objects, add them to the bioentries, and save them back into the database.
> > As a test I fetch a genbank record, strip the features and convert them to gff. The gff is then converted back to features and added to the stripped seq object.
> > The test script looks like this:
> > ========================================================
> > #!/usr/bin/perl
> > use strict;
> > use Bio::SeqIO;
> > use Bio::Tools::GFF;
> > use Bio::FeatureIO;
> > use IO::String;
> > use Bio::DB::GenBank;
> > 
> > use Data::Dumper;
> > 
> > *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
> > 
> > my $gff;
> > my $gffio = IO::String->new($gff);
> > 
> > my $db = Bio::DB::GenBank->new;
> > my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
> > my $seq = $db->get_Seq_by_acc('Z50755');
> > 
> > my @feat = $seq->remove_SeqFeatures;
> > 
> > # writing option 1
> > my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3);
> > # writing option 2 (commented out: it throws the exception shown below)
> > # my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
> > 
> > $fout->write_feature(@feat);
> > 
> > $gffio = IO::String->new($gff);
> > 
> > my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
> > 
> > while (my $feat = $fin->next_feature)
> > {
> >  $seq->add_SeqFeature($feat);
> > }
> > print Data::Dumper->Dump([$seq],['seq']);
> > 
> > $sout->write_seq($seq);
> > ========================================================
> > 
> > First, I had an issue when writing the features to gff using Bio::FeatureIO (writing option 2):
> > 
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: only Bio::SeqFeature::Annotated objects are writeable
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328
> > STACK: Bio::FeatureIO::gff::write_feature /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259
> > STACK: ./test.pl:25
> > -----------------------------------------------------------
> > 
> > Therefore, I used Bio::Tools::GFF to write (writing option 1). But then I ran into trouble when dumping the sequence in genbank format:
> > Can't locate object method "all_tags" via package "Bio::SeqFeature::Annotated" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm line 212, <GEN1> line 52.
> > 
> > I tried to fix this by adding the line
> > *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
> >  
> > But in vain:
> > Can't locate object method "get_all_tags" via package "Bio::Annotation::Collection" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm line 547, <GEN1> line 52.
> > 
> > Regards,
> > Marc
> > 
> > 
> > 
> 


