[Bioperl-l] Bio::Seq, search for specific features

Chris Fields cjfields at illinois.edu
Wed Sep 8 23:20:09 UTC 2010


Well, no concrete move has been made yet. It would be nice to abstract the backend, so one could possibly use any db or in-memory adaptor. This is essentially the direction I would like to take the alignment data as well (part of the GSoC project for BioPerl this year was to tackle this very thing).
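
To make the idea of a swappable backend concrete, here is a minimal sketch of what an in-memory adaptor might look like. Everything in it is an assumption for illustration: the package name FeatureStore::Memory, the method names, and the use of plain hash refs in place of real Bio::SeqFeatureI objects are hypothetical and not part of BioPerl. A db-backed adaptor (sqlite, for instance) would presumably expose the same three methods, so calling code would not need to know which one it was given.

```perl
use strict;
use warnings;

# Hypothetical in-memory adaptor; the package and method names are
# illustrative only and do not exist in BioPerl.
package FeatureStore::Memory;

sub new {
    my ($class) = @_;
    return bless { features => [], by_tag => {} }, $class;
}

# Store a feature (a plain hash ref in this sketch) and index every
# tag value as it is added, so lookups do not need a grep later.
sub add_feature {
    my ($self, $feat) = @_;
    push @{ $self->{features} }, $feat;
    for my $tag (keys %{ $feat->{tags} }) {
        for my $value (@{ $feat->{tags}{$tag} }) {
            push @{ $self->{by_tag}{$tag}{$value} }, $feat;
        }
    }
    return $self;
}

# Hash lookup by tag name and value; returns an empty list on a miss.
sub features_by_tag {
    my ($self, $tag, $value) = @_;
    return @{ $self->{by_tag}{$tag}{$value} || [] };
}

# Linear scan for overlapping features; a bin index could replace
# this without changing the interface.
sub features_by_region {
    my ($self, $start, $end) = @_;
    return grep { $_->{start} <= $end && $_->{end} >= $start }
           @{ $self->{features} };
}

package main;

my $store = FeatureStore::Memory->new;
$store->add_feature({ start => 100, end => 500,  tags => { id => ['my_gene'] } });
$store->add_feature({ start => 900, end => 1200, tags => { id => ['other_gene'] } });

my @hits    = $store->features_by_tag(id => 'my_gene');
my @overlap = $store->features_by_region(400, 950);
printf "%d by tag, %d by region\n", scalar @hits, scalar @overlap;
```

The point of the design is only that the index lives behind the adaptor: the tag hash here trades some memory for constant-time lookups, while a db-backed adaptor could make the opposite trade behind the same method names.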

chris

On Sep 8, 2010, at 3:42 AM, Frank Schwach wrote:

> Hi Jason,
> 
> Yes, I guess that would be the simplest way of doing it - basically just
> doing it the way the docs suggest for getting at a specific feature but
> hiding the grep behind a Bio::Seq method with search parameters. But we
> could also build a hash of feature tags as the Bio::Seq is built so that
> retrieval is more efficient. This could also be used to implement a bin
> indexing scheme for range queries, similar to what Bio::DB::GFF does.
> Is a move to an sqlite backend planned for the near future?
> 
> Frank
> 
> 
> 
> On Tue, 2010-09-07 at 10:36 -0700, Jason Stajich wrote:
>> And the implementation would just be something like this?
>> 
>> my @features = grep {
>>     $_->has_tag('id') && ($_->get_tag_values('id'))[0] eq 'my_gene'
>> } $seq->get_SeqFeatures();
>> 
>> I think a real implementation would only come if we moved from the
>> in-memory arrays & hash-based system to a sqlite db on the back-end
>> for how Sequence and Feature objects are stored.
>> This would be somewhat slower but wouldn't have the performance/memory
>> problems we get for sequences with many annotations.
>> 
>> -jason
>> Frank Schwach wrote, On 9/7/10 5:09 AM:
>>> I am working a lot with feature-rich Bio::Seq objects these days and
>>> thought that it would be really nice if I could do something like:
>>> 
>>> my @features = $bio_seq_obj->get_SeqFeatures(-by_id =>  'my_gene');
>>> 
>>> instead of having to grep for the feature every time.
>>> There could then be 'by_tag' and 'by_region' options as well.
>>> 
>>> According to the Bio::Seq docs, something like this seems to be planned
>>> at some stage. I would be willing to contribute to this feature if I can
>>> and if this isn't already being implemented by somebody else.
>>> Does anybody know the state of this feature?
>>> 
>>> Frank