[Bioperl-l] SeqFeature design
Jason Stajich
jason@cgt.mc.duke.edu
Fri, 4 Oct 2002 10:27:44 -0400 (EDT)
On Fri, 4 Oct 2002, Seth Purcell wrote:
> Hi -
>
> I am using SeqIO::genbank to parse in annotated sequences, and it
> appears that each SeqFeature object the parser creates contains its own
> copy of the entire sequence as a PrimarySeq. Obviously, this can't work
No, it has a reference to the original sequence object; it does not create
a separate instance for each feature.
> for any non-trivial annotated sequence - I've been testing with a 40kb
> seq and the memory requirements for the features are almost 100 times
> the sequence length. I read in the Seq documentation that circular
That obviously depends on how many features annotate this 40kb sequence.
We've been working on streamlining the system, but a number of container
objects are also instantiated for each sequence and feature set. Have you
checked the memory requirements on a 100kb sequence, and again after the
Bio::SeqIO parser has been destroyed?
> references are avoided, which is quite understandable in Perl, but I
> thought it said that each feature had a reference to its sequence, not a
> copy of its PrimarySeq:
>
> > By having this split we avoid a lot of nasty circular references
> > (sequence features can hold a reference to a sequence without the
> > sequence holding a reference to the sequence feature).
>
I'm unclear on where you think the feature is creating a new copy of the
Bio::PrimarySeq object. If you print out the memory location of each
feature's seq object, isn't it the same location for all of them?
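
A minimal sketch of that check, assuming a BioPerl release that provides
get_SeqFeatures (older releases used top_SeqFeatures instead). The toy
sequence and the three misc_feature entries are made up for illustration
and stand in for a parsed GenBank record; refaddr comes from the core
Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Toy 40-base sequence standing in for a parsed record (illustrative only).
my $seq = Bio::Seq->new(-id => 'toy', -alphabet => 'dna', -seq => 'ACGT' x 10);

# Attach three made-up features to it.
for my $start (1, 11, 21) {
    my $feat = Bio::SeqFeature::Generic->new(
        -start       => $start,
        -end         => $start + 9,
        -strand      => 1,
        -primary_tag => 'misc_feature',
    );
    $seq->add_SeqFeature($feat);
}

# Record the address of the sequence object each feature points at.
my %seen;
for my $feat ($seq->get_SeqFeatures) {
    my $addr = refaddr($feat->entire_seq);
    printf "%s %d..%d -> 0x%x\n",
        $feat->primary_tag, $feat->start, $feat->end, $addr;
    $seen{$addr}++;
}

# One distinct address means every feature shares the same sequence object.
printf "distinct sequence objects: %d\n", scalar keys %seen;
```

If the count printed at the end is greater than one for your parsed
record, that would point to features holding separate sequence objects.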
> I have had little luck so far in finding out whether this is how
> SeqFeature objects are supposed to be constructed, or if this is rogue
> behavior on the part of the parser. Could someone please tell me what's
> going on?
Features are created and then added to the Bio::Seq object, which updates
each feature's reference to the sequence.
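
Roughly, and only as a sketch with a made-up sequence and feature: a
freshly built Bio::SeqFeature::Generic has no sequence attached, and
add_SeqFeature is what points it at the Bio::Seq object's primary
sequence. The variable names here are illustrative.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);
use Bio::Seq;
use Bio::SeqFeature::Generic;

# Made-up sequence and feature, for illustration only.
my $seq  = Bio::Seq->new(-id => 'toy', -alphabet => 'dna', -seq => 'ACGT' x 10);
my $feat = Bio::SeqFeature::Generic->new(
    -start => 5, -end => 12, -strand => 1, -primary_tag => 'exon',
);

# Before the feature is added, it holds no sequence reference.
my $attached_before = defined $feat->entire_seq ? 1 : 0;

# Adding the feature attaches the Bio::Seq object's primary sequence to it.
$seq->add_SeqFeature($feat);

# After adding, the feature and the Bio::Seq point at the same object.
my $shared = refaddr($feat->entire_seq) == refaddr($seq->primary_seq) ? 1 : 0;

# The feature's own subsequence is cut from that shared object on request.
my $subseq = $feat->seq->seq;

print "attached before add: $attached_before\n";
print "shares sequence after add: $shared\n";
print "feature subsequence: $subseq\n";
```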
>
> Thank you very much,
> Seth Purcell
> Scientific Programmer
> Whitehead/MIT Center for Genome Research
> Cambridge, MA
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu