[Bioperl-l] SeqFeature design

Seth Purcell purcell@genome.wi.mit.edu
Fri, 04 Oct 2002 10:26:28 -0400


Hi -

I am using SeqIO::genbank to parse annotated sequences, and it
appears that each SeqFeature object the parser creates contains its own
copy of the entire sequence as a PrimarySeq. This can't scale to any
non-trivial annotated sequence: I've been testing with a 40 kb
sequence, and the memory required for the features is almost 100 times
the sequence length. I read in the Seq documentation that circular
references are avoided, which is quite understandable in Perl, but I
understood it to say that each feature holds a reference to its
sequence, not a copy of its PrimarySeq:

 > By having this split we avoid a lot of nasty circular references
 > (sequence features can hold a reference to a sequence without the
 > sequence holding a reference to the sequence feature).

I have had little luck so far in finding out whether this is how 
SeqFeature objects are supposed to be constructed, or if this is rogue 
behavior on the part of the parser. Could someone please tell me what's 
going on?
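
To make the reference-versus-copy distinction concrete, here is a minimal
sketch in plain Perl. It does not use BioPerl: the hashes below are
hypothetical stand-ins for a PrimarySeq and its features, and the helper
distinct_seqs is made up for illustration. It only shows how comparing
addresses with Scalar::Util's refaddr can tell whether a set of features
shares one sequence object or each carries its own copy.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-ins, not BioPerl classes: a plain hash plays the
# role of the PrimarySeq, and each feature is a hash with a 'seq' slot.
my $primary = { id => 'demo', seq => 'ACGT' x 10_000 };   # 40 kb string

# Case 1: every feature stores a reference to the same sequence object.
my @shared = map { +{ start => $_, seq => $primary } } 1 .. 100;

# Case 2: every feature stores its own shallow copy of the sequence hash,
# so the 40 kb string is duplicated once per feature.
my @copied = map { +{ start => $_, seq => { %$primary } } } 1 .. 100;

# Count the distinct sequence objects behind a list of features.
sub distinct_seqs {
    my %seen;
    $seen{ refaddr( $_->{seq} ) }++ for @_;
    return scalar keys %seen;
}

printf "shared: %d distinct sequence object(s)\n", distinct_seqs(@shared);
printf "copied: %d distinct sequence object(s)\n", distinct_seqs(@copied);
```

In the shared case the count is 1 and memory grows only by the size of
the feature hashes; in the copied case the count is 100 and the sequence
string is stored 100 times. The same address comparison could in
principle be applied to whatever sequence object each parsed feature
returns, assuming the feature exposes one.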

Thank you very much,
Seth Purcell
Scientific Programmer
Whitehead/MIT Center for Genome Research
Cambridge, MA