[Bioperl-l] SeqFeature design

Jason Stajich jason@cgt.mc.duke.edu
Fri, 4 Oct 2002 10:27:44 -0400 (EDT)


On Fri, 4 Oct 2002, Seth Purcell wrote:

> Hi -
>
> I am using SeqIO::genbank to parse in annotated sequences, and it
> appears that each SeqFeature object the parser creates contains its own
> copy of the entire sequence as a PrimarySeq. Obviously, this can't work

No, each feature holds a reference to the original sequence object; it does
not create a separate instance for each feature.

> for any non-trivial annotated sequence - I've been testing with a 40kb
> seq and the memory requirements for the features are almost 100 times
> the sequence length. I read in the Seq documentation that circular

Surely this depends on how many features annotate this 40kb sequence?
We've been working on streamlining the system somewhat, but a number of
container objects are also instantiated for each sequence and feature
set. Have you checked the memory requirements on a 100kb sequence, and
after the Bio::SeqIO parser has been destroyed?

> references are avoided, which is quite understandable in Perl, but I
> thought it said that each feature had a reference to its sequence, not a
> copy of its PrimarySeq:
>
>  > By having this split we avoid a lot of nasty circular references
>  > (sequence features can hold a reference to a sequence without the
>  > sequence holding a reference to the sequence feature).
>
I'm unclear why you think the feature creates a new copy of the
Bio::PrimarySeq object. If you print out the memory location of each
feature's sequence object, isn't it the same location for all of them?
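A minimal sketch of that check, written in Python rather than Perl and using hypothetical stand-in records (plain `SimpleNamespace` objects, not the real Bio::Seq or Bio::SeqFeature classes). It counts the distinct sequence objects a set of features points at: features that share one reference collapse to a single object, while features that each got their own copy would give one object apiece. It compares object identity, not bytes of memory; in Perl the analogous comparison would be the address of each feature's sequence object, for example via `Scalar::Util::refaddr`.

```python
from types import SimpleNamespace


def distinct_seq_objects(features):
    """Count the distinct sequence objects that a list of features points at."""
    return len({id(f.seq) for f in features})


# Hypothetical stand-in for one 40 kb sequence record.
seq = SimpleNamespace(display_id="demo", residues="ACGT" * 10_000)
starts = range(0, 40_000, 400)  # 100 hypothetical feature positions

# Features that all hold a reference to the same sequence object.
shared = [SimpleNamespace(start=i, end=i + 100, seq=seq) for i in starts]

# Features that each get their own sequence object (the suspected behaviour).
copied = [SimpleNamespace(start=i, end=i + 100, seq=SimpleNamespace(**vars(seq)))
          for i in starts]

print(distinct_seq_objects(shared))  # 1
print(distinct_seq_objects(copied))  # 100
```

If the parser were copying, the first count would match the number of features instead of being 1.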

> I have had little luck so far in finding out whether this is how
> SeqFeature objects are supposed to be constructed, or if this is rogue
> behavior on the part of the parser. Could someone please tell me what's
> going on?

Features are created first and then added to the Bio::Seq object, which
updates each feature's reference to the sequence.
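A minimal sketch of that attach step, in Python with hypothetical classes (`PrimarySeq`, `SeqFeature`, `Seq` and the method `add_seq_feature` are illustrative names, not BioPerl's API). The container gives each incoming feature a reference to its one shared primary sequence, and the primary sequence never points back at its features, which is the split the quoted documentation describes for avoiding circular references.

```python
class PrimarySeq:
    """Hypothetical stand-in: holds the residues only, never its features."""
    def __init__(self, display_id, residues):
        self.display_id = display_id
        self.residues = residues


class SeqFeature:
    """Hypothetical stand-in: holds a reference to the sequence it annotates."""
    def __init__(self, start, end):
        self.start = start
        self.end = end
        self.seq = None

    def attach_seq(self, seq):
        # Store a reference to the caller's object; nothing is duplicated.
        self.seq = seq


class Seq:
    """Hypothetical container pairing one PrimarySeq with its features."""
    def __init__(self, primary_seq):
        self.primary_seq = primary_seq
        self.features = []

    def add_seq_feature(self, *features):
        for feat in features:
            # Point the feature at the shared PrimarySeq. The PrimarySeq gets
            # no back-reference, so no feature <-> sequence cycle is created.
            feat.attach_seq(self.primary_seq)
            self.features.append(feat)


# A parser would build the features first, then hand them to the container.
container = Seq(PrimarySeq("demo", "ACGT" * 10_000))
container.add_seq_feature(SeqFeature(1, 500), SeqFeature(800, 1200))

for feat in container.features:
    print(feat.start, feat.end, feat.seq is container.primary_seq)
# 1 500 True
# 800 1200 True
```

Only the container holds both sides, so dropping it leaves no reference cycle for reference-counted cleanup to trip over.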

>
> Thank you very much,
> Seth Purcell
> Scientific Programmer
> Whitehead/MIT Center for Genome Research
> Cambridge, MA
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu