[Bioperl-l] SeqFeature design
Seth Purcell
purcell@genome.wi.mit.edu
Fri, 04 Oct 2002 10:26:28 -0400
Hi -
I am using SeqIO::genbank to parse in annotated sequences, and it
appears that each SeqFeature object the parser creates contains its own
copy of the entire sequence as a PrimarySeq. Obviously, this can't work
for any non-trivial annotated sequence - I've been testing with a 40 kb
sequence, and the features take up almost 100 times as much memory as
the sequence itself. I read in the Seq documentation that circular
references are avoided, which is quite understandable in Perl, but I
thought it said that each feature had a reference to its sequence, not a
copy of its PrimarySeq:
> By having this split we avoid a lot of nasty circular references
> (sequence features can hold a reference to a sequence without the
> sequence holding a reference to the sequence feature).
I have had little luck so far in finding out whether this is how
SeqFeature objects are supposed to be constructed, or if this is rogue
behavior on the part of the parser. Could someone please tell me what's
going on?
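
To make the distinction concrete, here is a minimal sketch in plain Perl
of the two layouts in question: features that all point at one shared
sequence (the design the quoted documentation describes) versus features
that each carry their own copy (what the parser appears to do). The
hashes and the distinct_seqs helper are hypothetical stand-ins for
illustration only, not BioPerl classes or BioPerl's actual internals.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-in for a sequence object (not a BioPerl class).
my $sequence = { id => 'demo', residues => 'ACGT' x 10_000 };    # 40 kb

# Layout from the quoted documentation: every feature stores a reference
# to the one shared sequence, and the sequence stores no features.
my @sharing = map { +{ start => $_, end => $_ + 99, seq => $sequence } } 1 .. 100;

# Layout suspected of the parser: every feature stores its own copy.
my @copying = map { +{ start => $_, end => $_ + 99, seq => { %$sequence } } } 1 .. 100;

# Count the distinct sequence objects reachable from a list of features
# by comparing the addresses of the referenced objects.
sub distinct_seqs {
    my %seen;
    $seen{ refaddr( $_->{seq} ) } = 1 for @_;
    return scalar keys %seen;
}

printf "sharing: %d distinct sequence object(s)\n", distinct_seqs(@sharing);
printf "copying: %d distinct sequence object(s)\n", distinct_seqs(@copying);
```

The first count is 1 and the second is 100. The same address comparison
could presumably be applied to whatever each parsed feature returns from
entire_seq, to see whether the features really hold separate sequence
objects or share one.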
Thank you very much,
Seth Purcell
Scientific Programmer
Whitehead/MIT Center for Genome Research
Cambridge, MA