[Bioperl-l] SeqFeature design

Seth Purcell purcell@genome.wi.mit.edu
Fri, 04 Oct 2002 11:32:52 -0400


Jason,

As soon as I read your email I knew what the problem was: my script was
inadvertently calling Dumper on each feature individually rather than
on an array of features. Each dump therefore expanded the same shared
hashref in full instead of printing the '_gsf_seq' =>
$VAR1->{'_gsf_seq'} back-reference that I expected to see. Thanks very
much for your help, and I'm glad my original understanding of the
design was correct.
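
For anyone who runs into the same confusion, here is a minimal sketch of
the difference. It uses plain hashes as hypothetical stand-ins for the
sequence and the features (they are not the real BioPerl classes, and
the start/end values are made up); only the '_gsf_seq' key name comes
from the dump discussed above. It also compares the reference addresses
directly, which is probably the quickest way to check for sharing.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(refaddr);

# Hypothetical stand-ins: one "sequence" hash shared by two "feature" hashes.
my $seq = { seq => 'ACGT' x 5 };
my @features = (
    { start => 1, end => 4,  _gsf_seq => $seq },
    { start => 5, end => 12, _gsf_seq => $seq },
);

# Stable key order so the dumps are easy to compare.
$Data::Dumper::Sortkeys = 1;

# Dumped one at a time, each dump is its own $VAR1, so the shared
# sequence hash is written out in full every time and looks like a copy.
my @separate = map { Dumper($_) } @features;

# Dumped as one structure, Data::Dumper notices the repeated reference
# and writes the second occurrence as a back-reference into $VAR1.
my $together = Dumper(\@features);

# Comparing addresses answers the question without reading any dump.
my $shared = refaddr($features[0]{_gsf_seq}) == refaddr($features[1]{_gsf_seq});

print $together;
print "same sequence object: ", ($shared ? "yes" : "no"), "\n";
```

In the combined dump the sequence string should appear only once, with
the second feature pointing back at the first.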

On an unrelated note, is there an easy way to parse ASN.1 format with
BioPerl, or is it assumed that all the NCBI resources that are in ASN.1
are also available in GenBank format?

Thanks again,
Seth Purcell
Scientific Programmer
Whitehead/MIT Center for Genome Research
Cambridge, MA

Jason Stajich wrote:

> On Fri, 4 Oct 2002, Seth Purcell wrote:
> 
> 
>>Hi -
>>
>>I am using SeqIO::genbank to parse in annotated sequences, and it
>>appears that each SeqFeature object the parser creates contains its own
>>copy of the entire sequence as a PrimarySeq. Obviously, this can't work
>>
> 
> No, it has a reference to the original sequence object; it does not create
> a separate instance for each feature.
> 
> 
>>for any non-trivial annotated sequence - I've been testing with a 40kb
>>seq and the memory requirements for the features are almost 100 times
>>the sequence length. I read in the Seq documentation that circular
>>
> 
> Obviously this depends on how many features are annotating this 40kb
> sequence. We've been working on streamlining the system some, but there
> are a number of container objects which get instantiated as well for each
> sequence and feature set. Have you checked the memory requirements on a
> 100kb sequence, and after the Bio::SeqIO parser has been destroyed?
> 
> 
>>references are avoided, which is quite understandable in Perl, but I
>>thought it said that each feature had a reference to its sequence, not a
>>copy of its PrimarySeq:
>>
>> > By having this split we avoid a lot of nasty circular references
>> > (sequence features can hold a reference to a sequence without the
>> > sequence holding a reference to the sequence feature).
>>
>>
> I'm unclear where you think the feature is creating a new copy of the
> Bio::PrimarySeq object. If you print out the memory location of each
> feature's seq object, isn't it the same location?
> 
> 
>>I have had little luck so far in finding out whether this is how
>>SeqFeature objects are supposed to be constructed, or if this is rogue
>>behavior on the part of the parser. Could someone please tell me what's
>>going on?
>>
> 
> Features are created and then added to the Bio::Seq object, which updates
> the feature's reference to the sequence.
> 
> 
>>Thank you very much,
>>Seth Purcell
>>Scientific Programmer
>>Whitehead/MIT Center for Genome Research
>>Cambridge, MA
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l@bioperl.org
>>http://bioperl.org/mailman/listinfo/bioperl-l
>>
>>
>