[Bioperl-l] Bio::DB::GenBank and complexity

Chris Fields cjfields at uiuc.edu
Tue May 2 18:01:58 UTC 2006


I hate responding to my own post!  Just wanted to add that I'm adding a
warning to the get_Seq* methods, telling callers to use the appropriate
get_Stream* method when complexity == 0, before they return the Bio::SeqIO
object.
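
Roughly, the idea looks like the self-contained sketch below.  Sketch::DB and
its _fetch_* stubs are hypothetical stand-ins (this is not the committed
Bio::DB::WebDBSeqI code), and the default complexity of 1 is assumed only for
illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a database class; NOT the real BioPerl code.
package Sketch::DB;

sub new {
    my ($class, %args) = @_;
    # Assumption for this sketch: complexity defaults to 1.
    my $complexity = defined $args{-complexity} ? $args{-complexity} : 1;
    return bless { complexity => $complexity }, $class;
}

sub complexity { return $_[0]->{complexity} }

# Stub fetchers so the sketch runs without network access.
sub _fetch_single { return "Seq($_[1])" }
sub _fetch_stream { return "SeqIO($_[1])" }

sub get_Stream_by_acc {
    my ($self, $acc) = @_;
    return $self->_fetch_stream($acc);
}

sub get_Seq_by_acc {
    my ($self, $acc) = @_;
    if ($self->complexity == 0) {
        # Warn first, then hand back the stream rather than a single sequence.
        warn "complexity is 0: get_Seq_by_acc returns a stream; "
           . "use get_Stream_by_acc instead\n";
        return $self->_fetch_stream($acc);
    }
    return $self->_fetch_single($acc);
}

package main;

my $db = Sketch::DB->new(-complexity => 0);
print $db->get_Seq_by_acc('EXAMPLE_ACC'), "\n";   # warning goes to STDERR
```

In the real modules the warning would presumably go through the usual BioPerl
warn() method rather than Perl's built-in warn.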

CJF

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Tuesday, May 02, 2006 11:20 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::DB::GenBank and complexity
> 
> I ran into some wonkiness with using extra parameters ('seq_start',
> 'seq_stop', 'strand', and 'complexity') with Bio::DB::GenBank that I have
> gone through, fixed, and committed.  I also have added a few tests to DB.t
> for everything (all changes were in Bio::DB::WebDBSeqI and
> Bio::DB::NCBIHelper).  The 'complexity' tag is the strangest, though I did
> manage to get it added as well (with tests).  This is how NCBI defines
> complexity:
> 
> complexity regulates the display:
> 0 - get the whole blob
> 1 - get the bioseq for gi of interest (default in Entrez)
> 2 - get the minimal bioseq-set containing the gi of interest
> 3 - get the minimal nuc-prot containing the gi of interest
> 4 - get the minimal pub-set containing the gi of interest
> 
> Here's my quandary: when complexity is set to '0', you get a blob back (the
> main sequence as well as any subsequences, such as CDS); this is in essence
> a sequence stream with multiple alphabet types.  So, I now have it set up to
> do this:
> 
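> use Bio::DB::GenBank;
> use Bio::SeqIO;
> 
> my $factory = Bio::DB::GenBank->new(-format => 'fasta',
>                                     -complexity => 0
>                                    );
> 
> # $acc holds the accession to fetch.  With complexity 0 this returns a
> # Bio::SeqIO stream rather than a single Bio::Seq object.
> my $seqin = $factory->get_Seq_by_acc($acc);
> 
> # Assumption: $seqout is an output stream, e.g. FASTA written to STDOUT.
> my $seqout = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);
> 
> while (my $seq = $seqin->next_seq) {
>     $seqout->write_seq($seq);
> }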
> 
> since I thought returning an array would be horrendously expensive on
> memory, esp. with larger sequences.  Currently this is only set up for
> sequences which are retrieved when complexity is set to '0' so it's a pretty
> unique case.  Regardless, I'm worried that, since users expect a Bio::Seq
> object instead of a Bio::SeqIO object here, it will cause a lot of confusion
> with the API.  Any suggestions/gripes?
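
One way a caller could cope with either return type is to check for a
next_seq method before looping.  This is only a sketch: Sketch::Seq,
Sketch::SeqStream and for_each_seq are hypothetical names standing in for
Bio::Seq, Bio::SeqIO and user code, so it runs without BioPerl or network
access:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical stand-in for a single sequence object.
package Sketch::Seq;
sub new { my ($class, $id) = @_; return bless { id => $id }, $class }
sub display_id { return $_[0]->{id} }

# Hypothetical stand-in for a sequence stream offering next_seq().
package Sketch::SeqStream;
sub new { my ($class, @seqs) = @_; return bless { queue => [@seqs] }, $class }
sub next_seq { return shift @{ $_[0]->{queue} } }

package main;

# Call $handler on every sequence, whether $result is a single sequence
# object or a stream-like object.  Returns the number of sequences handled.
sub for_each_seq {
    my ($result, $handler) = @_;
    return 0 unless blessed($result);
    my $count = 0;
    if ($result->can('next_seq')) {
        # Stream: pull sequences one at a time instead of building an array.
        while (my $seq = $result->next_seq) {
            $handler->($seq);
            $count++;
        }
    }
    else {
        # Single sequence object.
        $handler->($result);
        $count = 1;
    }
    return $count;
}

my $single = Sketch::Seq->new('one');
my $stream = Sketch::SeqStream->new(
    map { Sketch::Seq->new($_) } qw(main cds1 cds2)
);
for_each_seq($single, sub { print $_[0]->display_id, "\n" });
for_each_seq($stream, sub { print $_[0]->display_id, "\n" });
```

The stream branch keeps only one sequence in memory at a time, which is the
reason given above for not returning an array.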
> 
> Chris
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 