[Bioperl-l] Re: [Bioperl-guts-l] bioperl commit

Tue Jul 13 08:17:20 EDT 2004

On Jul 12, 2004, at 11:30 PM, Chris Mungall wrote:

> Added ability to parse sequence data in GFF3 - see NOTES section &
> email to bioperl list for details

Great!

> +If you call
> +
> +  $gffio->ignore_sequence_data_toggle(1)
> +
> +prior to parsing the sequence data is ignored; this is useful if you
> +just want the features. It avoids the memory overhead in building and
> +caching sequences

Maybe just $gffio->ignore_sequence(1) would be sufficient?  We tend not
to add "_toggle" to attribute names; besides, "toggle" implies that the
value flips every time you call it.
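
To illustrate the naming point, here is a minimal sketch contrasting a
plain getter/setter with a true toggle. My::GFFReader and both method
bodies are hypothetical stand-ins for illustration, not the code in
Bio::Tools::GFF.

```perl
use strict;
use warnings;

package My::GFFReader;   # hypothetical stand-in, not Bio::Tools::GFF

sub new {
    my ($class) = @_;
    return bless { ignore_sequence => 0 }, $class;
}

# Combined getter/setter: stores the flag when an argument is passed,
# and always returns the current value.
sub ignore_sequence {
    my ($self, @args) = @_;
    $self->{ignore_sequence} = ($args[0] ? 1 : 0) if @args;
    return $self->{ignore_sequence};
}

# A true toggle, for contrast: every call flips the stored value.
sub toggle_ignore_sequence {
    my ($self) = @_;
    $self->{ignore_sequence} = $self->{ignore_sequence} ? 0 : 1;
    return $self->{ignore_sequence};
}

package main;

my $reader = My::GFFReader->new;
$reader->ignore_sequence(1);
$reader->ignore_sequence(1);            # still 1: setting is idempotent
print $reader->ignore_sequence, "\n";   # prints 1
$reader->toggle_ignore_sequence;        # now 0: result depends on prior state
print $reader->ignore_sequence, "\n";   # prints 0
```

With the setter, calling the method twice with the same argument leaves
the flag unchanged, which is what a caller of ignore_sequence(1) would
expect.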

> +Alternatively, you can call either
> +
> +  $gffio->get_all_seqs()

Again, would $gffio->get_seqs() suffice?

> +  $gffio->seq_id_by_h()

Why have two separate APIs to get the same data?  If you want to
provide a hashref instead of an array of seqs, let get_seqs() check its
calling context (wantarray) and return whichever form fits.
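
As a rough sketch of the calling-context idea, a single get_seqs() could
return a list in list context and a hashref keyed by ID in scalar
context. The class name, constructor, and plain-string "sequences" below
are assumptions made up for the example; real code would hold Bio::Seq
objects.

```perl
use strict;
use warnings;

package My::GFFReader;   # hypothetical stand-in, not Bio::Tools::GFF

sub new {
    my ($class, %seqs) = @_;
    return bless { seqs => { %seqs } }, $class;   # id => sequence
}

# List context: the sequences, ordered by ID.
# Scalar context: a hashref (shallow copy) keyed by sequence ID.
sub get_seqs {
    my ($self) = @_;
    my $by_id = $self->{seqs};
    return +{ %$by_id } unless wantarray;
    return map { $by_id->{$_} } sort keys %$by_id;
}

package main;

my $reader = My::GFFReader->new(ctg1 => 'ACGT', ctg2 => 'GGCC');
my @seqs   = $reader->get_seqs;   # list context
my $by_id  = $reader->get_seqs;   # scalar context
print scalar(@seqs), " sequences; ctg2 = $by_id->{ctg2}\n";
# prints "2 sequences; ctg2 = GGCC"
```

One caveat with this design: a call placed inside an argument list is in
list context, so callers who want the hashref there need to write
scalar($reader->get_seqs).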

> +Note that these objects will not have the features attached - you have
> +to do this yourself, OR call
> +
> +  $gffio->features_attached_to_seqs_toggle(1)

Again, $gffio->attach_features(1) seems sufficient ...

> +Note that auto-attaching the features to seqs will incur a higher
> +memory overhead as the features must be cached until the sequence data
> +is found

The cost would be the same if you "had to do this yourself".  I think
it's fair that if a sequence is to have 100 features attached to it,
those 100 features will require memory.  There's no *extra* memory
overhead here, is there?

> +=head1 TODO
> +
> +Make a Bio::SeqIO class specifically for GFF3 with sequence data

This would lead to a much cleaner API, and could now easily be done via
your improvements to Bio::Tools::GFF.

As an aside, instead of reimplementing your own simple FASTA parser, is 
it possible to pass along the Bio::Root::IO object to Bio::SeqIO::fasta 
directly, and let it do the work?

Thanks,

-Aaron