[Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated
Allen Day
allenday at ucla.edu
Tue Jan 25 04:45:28 EST 2005
On Tue, 25 Jan 2005, Marc Logghe wrote:
> Hi Allen,
> Thanks for the fixes !
no problem. let me know if you find more stuff like this, i'm trying to
clean up all the calls to SeqFeatureI inheritors to use the interface
methods rather than subclass-specific methods.
> As you suggested, I got the tag values when using the stringification overload, so that is solved (I don't want to commit that myself though; it seems too tricky to me ;-).
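For reference, the general shape of a stringification overload is sketched below. This is a minimal illustration in core Perl, not the commented-out code in the Bio::Annotation::* classes; My::SimpleValue and its -value argument are hypothetical stand-ins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation value class; this is NOT the
# real Bio::Annotation::SimpleValue, only an illustration of the idea.
package My::SimpleValue;

use overload
    '""'     => sub { $_[0]->{value} },  # called whenever the object is used as a string
    fallback => 1;                       # let eq, ., etc. fall back to the string form

sub new {
    my ($class, %args) = @_;
    return bless { value => $args{-value} }, $class;
}

package main;

my $gene = My::SimpleValue->new(-value => 'R12H7.1');

# Without the overload this would interpolate as My::SimpleValue=HASH(0x...)
my $qualifier = qq{/gene="$gene"};
print "$qualifier\n";
```

With the overload in place, any writer that simply interpolates the tag value gets the string rather than the object address.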
> What is not so nice is that I lose my split features:
>      gene            join(8311..8422,8852..8887,8940..9090,9142..9233,
>                      9721..9848,10296..10714,10835..10934,11584..11706)
>                      /gene="R12H7.1"
>      CDS             join(8311..8422,8852..8887,8940..9090,9142..9233,
>                      9721..9848,10296..10714,10835..10934,11584..11706)
>
>
> becomes now:
>
>      gene            8311..8422
>                      /note="frame=."
>                      /gene="R12H7.1"
>      CDS             8311..8422
>
> I tried to solve this issue by using the unflattener, but that did not work out very well either :-(
> My actual question now is: is there a way, using whatever system, to preserve the split feature structure? That was actually what I was trying to do in the first place: reconstruct the original feature object starting from GFF. Any ideas on that?
oh. i don't know anything about this. never had to deal with split
locations before. is this concept equivalent to a GFF3 Target attribute?
maybe Scott Cain or Chris Mungall have something to say here. i think
Scott is back from vacation tomorrow.
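
As far as the GFF3 spec reads, a split location is normally not expressed through Target (that attribute describes alignments); a discontinuous feature is written as several lines that share one ID attribute. A rough sketch in core Perl of how the join(...) above would map onto such lines, assuming a hypothetical parse_join helper and made-up seqid, source, strand and ID values:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: split a GenBank-style
# join(...) location string into [start, end] pairs.
sub parse_join {
    my ($loc) = @_;
    $loc =~ s/\s+//g;                    # locations may wrap across lines
    my @segments;
    while ($loc =~ /(\d+)\.\.(\d+)/g) {
        push @segments, [$1, $2];
    }
    return @segments;
}

my $loc = 'join(8311..8422,8852..8887,8940..9090,9142..9233,
           9721..9848,10296..10714,10835..10934,11584..11706)';

my @segments = parse_join($loc);

# One GFF3 line per segment; the shared ID ties them into one feature.
# Seqid, source, strand and ID are illustrative assumptions. A real CDS
# line needs a phase of 0, 1 or 2; '.' is only a placeholder here.
my @lines = map {
    join("\t", 'Z50755', 'EMBL', 'CDS', $_->[0], $_->[1],
         '.', '+', '.', 'ID=cds_R12H7.1')
} @segments;

print "$_\n" for @lines;
```

Reading such a file back would then mean grouping lines by ID and rebuilding one split location from the group, which is the part that appears to be missing in the round trip described above.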
>
> Also, do you think it will be possible to convert the Bio::SeqFeature::Annotated features into persistent ones so that they can be stored in BioSQL? I'll try to test that out today.
no idea. my guess is not without substantial effort.
-allen
> Cheers,
> Marc
>
>
>
>
> > -----Original Message-----
> > From: Allen Day [mailto:allenday at ucla.edu]
> > Sent: Tuesday, January 25, 2005 12:55 AM
> > To: Marc Logghe
> > Cc: Bioperl (E-mail)
> > Subject: Re: [Bioperl-l] struggling with Bio::FeatureIO and Bio::SeqFeature::Annotated
> >
> >
> > Marc,
> >
> > The problem was that Bio::SeqIO::FTHelper was making calls assuming it had
> > a Bio::SeqFeature::Generic instance. I've updated it to make calls
> > compliant with the Bio::SeqFeatureI interface, and the script below now
> > at least runs using "option 1".
> >
> > "option 2" will not work, at least for now, because Bio::DB::GenBank is
> > creating a SeqIO that holds Bio::SeqFeature::Generic objects, and these
> > are difficult to deal with because the internal data structures are
> > different than a Bio::SeqFeature::Annotated. I like the technique used
> > below to bridge to Bio::FeatureIO via a Bio::Tools::GFF intermediary --
> > very clever.
> >
> > You'll also notice that the GenBank-formatted file output by the script
> > doesn't look quite right, the FEATURES section looks kind of like:
> >
> > FEATURES             Location/Qualifiers
> >      Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)1..20975
> >                      /source="Bio::Annotation::SimpleValue=HASH(0x9bcdbe0)"
> >                      /mol_type="Bio::Annotation::SimpleValue=HASH(0xa3dab1c)"
> >                      /seq_id="Bio::Annotation::SimpleValue=HASH(0xa214de0)"
> >                      /score="Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
> >                      /frame="Bio::Annotation::SimpleValue=HASH(0xa439b04)"
> >                      /chad="Bio::Annotation::Comment=HASH(0xa3da9b4)"
> >                      /note="score=Bio::Annotation::SimpleValue=HASH(0xa3d92cc)"
> >                      /note="frame=Bio::Annotation::SimpleValue=HASH(0xa439b04)"
> >                      /db_xref="Bio::Annotation::SimpleValue=HASH(0xa3daaf8)"
> >                      /clone="Bio::Annotation::SimpleValue=HASH(0xa3dab28)"
> >                      /strain="Bio::Annotation::SimpleValue=HASH(0xa3dabb8)"
> >                      /phase="Bio::Annotation::SimpleValue=HASH(0xa3d935c)"
> >                      /chromosome="Bio::Annotation::SimpleValue=HASH(0xa3dac00)"
> >                      /type="Bio::Annotation::OntologyTerm=HASH(0xa3d93f8)"
> >                      /organism="Bio::Annotation::SimpleValue=HASH(0xa3dac48)"
> >
> > because Bio::SeqFeature::Annotated holds annotations as object references
> > rather than strings. We can fix this with a stringification overload, but
> > I noticed that the code to do this exists in the Bio::Annotation::*
> > classes but is commented out, and I'm not sure why. Maybe Hilmar can shed
> > some light on this.
> >
> > -Allen
> >
> >
> >
> > On Mon, 24 Jan 2005, Marc Logghe wrote:
> >
> > > Hi all,
> > > I have some problems with Bio::FeatureIO and Bio::SeqFeature::Annotated. But maybe these modules are not designed for the things I had in mind.
> > > My initial goal seemed pretty straightforward. It turned out differently.
> > > I have a GFF file containing features of a bunch of bioentries sitting in BioSQL.
> > > I wanted to turn the GFF into feature objects, add them to the bioentries, and save them back into the database.
> > > As a test I fetch a GenBank record, strip the features and convert them to GFF. The GFF is again converted to features and added to the stripped seq object.
> > > The test script looks like this:
> > > ========================================================
> > > #!/usr/bin/perl
> > > use strict;
> > > use Bio::SeqIO;
> > > use Bio::Tools::GFF;
> > > use Bio::FeatureIO;
> > > use IO::String;
> > > use Bio::DB::GenBank;
> > >
> > > use Data::Dumper;
> > >
> > > *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
> > >
> > > my $gff;
> > > my $gffio = IO::String->new($gff);
> > >
> > > my $db = Bio::DB::GenBank->new;
> > > my $sout = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
> > > my $seq = $db->get_Seq_by_acc('Z50755');
> > >
> > > my @feat = $seq->remove_SeqFeatures;
> > >
> > > # writing option 1: Bio::Tools::GFF
> > > my $fout = Bio::Tools::GFF->new(-fh => $gffio, -gff_version => 3);
> > > # writing option 2: Bio::FeatureIO (use instead of option 1, not together with it)
> > > # my $fout = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
> > >
> > > $fout->write_feature(@feat);
> > >
> > > $gffio = IO::String->new($gff);
> > >
> > > my $fin = Bio::FeatureIO->new(-fh => $gffio, -format => 'gff', -version => 3);
> > >
> > > while (my $feat = $fin->next_feature)
> > > {
> > >     $seq->add_SeqFeature($feat);
> > > }
> > > print Data::Dumper->Dump([$seq],['seq']);
> > >
> > > $sout->write_seq($seq);
> > > ========================================================
> > >
> > > First, I had an issue when writing the features to GFF using Bio::FeatureIO (writing option 2):
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: only Bio::SeqFeature::Annotated objects are writeable
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw /home/marcl/src/bioperl/bioperl-live/Bio/Root/Root.pm:328
> > > STACK: Bio::FeatureIO::gff::write_feature /home/marcl/src/bioperl/bioperl-live/Bio/FeatureIO/gff.pm:259
> > > STACK: ./test.pl:25
> > > -----------------------------------------------------------
> > >
> > > Therefore, I used Bio::Tools::GFF to write (writing option 1). But then I ran into trouble when it came to dumping the sequence into GenBank format:
> > > Can't locate object method "all_tags" via package "Bio::SeqFeature::Annotated" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqIO/FTHelper.pm line 212, <GEN1> line 52.
> > >
> > > I tried to fix this by adding the line
> > > *Bio::SeqFeature::Annotated::all_tags = \*Bio::SeqFeature::Annotated::get_all_tags;
> > >
> > > But in vain:
> > > Can't locate object method "get_all_tags" via package "Bio::Annotation::Collection" at /home/marcl/src/bioperl/bioperl-live/Bio/SeqFeature/Annotated.pm line 547, <GEN1> line 52.
> > >
> > > Regards,
> > > Marc
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at portal.open-bio.org
> > > http://portal.open-bio.org/mailman/listinfo/bioperl-l
> > >
> >
>