[Bioperl-l] 'virtual' seqs

Hilmar Lapp hlapp@gnf.org
Fri, 28 Jun 2002 10:06:08 -0700


IMHO the connotation of UnknownSequence is overloaded too (referring to a sequence for which there is no functional annotation, or which hasn't been characterized experimentally yet). VirtualSeq appears to have some consensus, even though I recall someone (Elia?) saying that name is overloaded as well.

I'm not sure the NNNs are the best idea. They make it hard for subsequent scripts or modules that pick up the file (e.g., for upload into a database) to decide whether the sequence is meant to be meaningful (and, e.g., stored) or not. For example, how would you distinguish a dummy sequence that should be treated as absent from a sequence that has been entirely repeat-masked?

This may sound far-fetched. I just generally don't like having to interpret input (e.g., what does it mean if the entire sequence consists of Ns?).
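The ambiguity can be made concrete with a small sketch. It is written in Python purely for illustration and is not BioPerl code: `PlaceholderSeq`, its `residues` field, `is_virtual`, `seq_as_ns()`, and `should_store()` are hypothetical names, not part of Bio::Seq. It contrasts the all-N placeholder (Lincoln's suggestion, quoted below), which a loader cannot tell apart from a fully repeat-masked sequence, with an explicit "no sequence" marker, while `seq()` follows Jason's original proposal to warn and return undef (`None` here).

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlaceholderSeq:
    """Hypothetical sequence record whose residues may be absent."""
    display_id: str
    length: int
    residues: Optional[str] = None  # None means "no sequence data available"

    @property
    def is_virtual(self) -> bool:
        # Explicit marker: absence is recorded, not inferred from the content.
        return self.residues is None

    def seq(self) -> Optional[str]:
        # Warn and return None (Perl's undef) when only the length is known.
        if self.residues is None:
            warnings.warn(f"{self.display_id}: no sequence data, only length")
        return self.residues

    def seq_as_ns(self) -> str:
        # The alternative: pad with N's so callers always get a string.
        return self.residues if self.residues is not None else "N" * self.length


def should_store(record: PlaceholderSeq) -> bool:
    # A loader decides from the flag instead of interpreting the residues.
    return not record.is_virtual


# A record with a length only, and a real but fully repeat-masked 8 bp sequence.
virtual = PlaceholderSeq("contig1", 8)
masked = PlaceholderSeq("contig2", 8, residues="NNNNNNNN")

# Padded with N's, the two are indistinguishable by content alone...
print(virtual.seq_as_ns() == masked.seq_as_ns())  # True
# ...but the explicit marker keeps them apart.
print(should_store(virtual), should_store(masked))  # False True
```

The point of the sketch is only that the decision "store or skip" is made from a flag the producer sets, rather than guessed by the consumer from the sequence content.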

	-hilmar

> -----Original Message-----
> From: Lincoln Stein [mailto:lstein@cshl.org]
> Sent: Friday, June 28, 2002 8:41 AM
> To: Hilmar Lapp; Jason Stajich; Bioperl
> Subject: Re: [Bioperl-l] 'virtual' seqs
> 
> 
> I think it's important that we be able to perform manipulations on feature
> tables and annotations even when the underlying sequence is completely
> unavailable (not even a guarantee that you can fetch the sequence if you wait
> long enough). Laziness is a great feature, but it's more of an
> implementation issue than something that should be exposed to the API.
> 
> As Ewan suggests, it's probably better to return a string of N's rather than
> an undef sequence; otherwise lots of programs will break. However, I think
> that EmptySequence has the wrong connotation. I prefer VirtualSequence, or
> possibly UnknownSequence.
> 
> Lincoln
> 
> On Wednesday 26 June 2002 06:59 pm, Hilmar Lapp wrote:
> > I like LazySeq best: it means the absence of the sequence is not written
> > in stone; fetching is just expensive and can take a while.
> >
> > Also, one should be able to write these sequences to transport the
> > annotation and feature table (without also incurring a potentially
> > expensive sequence transfer). In this case, a parser's write_seq() method
> > asking the object for the sequence should get an empty string instead of
> > triggering the actual sequence fetch, at least as an option.
> >
> > I'm wondering how this should be implemented; I'm not sure what the right
> > thing to do is.
> >
> > 	-hilmar
> >
> > > -----Original Message-----
> > > From: Jason Stajich [mailto:jason@cgt.mc.duke.edu]
> > > Sent: Thursday, June 20, 2002 11:57 AM
> > > To: Bioperl
> > > Subject: [Bioperl-l] 'virtual' seqs
> > >
> > >
> > > We are processing data files (bsml, game, possibly agave documents)
> > > where it is possible to know just the length of the sequence without
> > > having any actual sequence data associated with it. I think we should
> > > have sequence objects that can handle this: they would have a length,
> > > but seq() would warn and return undef. We need one that implements the
> > > Bio::Seq::RichSeqI interface; call it VirtualRichSeq? Perhaps we'll also
> > > need the equivalent PrimaryVirtualSeq and VirtualSeq?
> > >
> > > Can someone think of a better name? I don't want to confuse these with
> > > the Ensembl VirtualXX objects. This would be implemented in the
> > > Bio::Seq:: namespace.
> > >
> > > -jason
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> 
> -- 
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein@cshl.org                                  Cold Spring Harbor, NY
> ========================================================================
>
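Hilmar's June 26 suggestion, quoted above, that a writer asking a lazily fetched object for its sequence should optionally get an empty string rather than trigger the expensive fetch, can be sketched as follows. This is an illustrative Python sketch, not BioPerl's implementation: `LazySeq`, `fetcher`, `fetch_on_demand`, `fetch_count`, `write_record()`, and `slow_fetch()` are hypothetical names, and the one-line header output is a toy, not a real format writer.

```python
from typing import Callable, Optional


class LazySeq:
    """Hypothetical sequence whose residues are fetched only on first use."""

    def __init__(self, display_id: str, length: int,
                 fetcher: Callable[[str], str]):
        self.display_id = display_id
        self.length = length
        self._fetcher = fetcher        # stands in for an expensive lookup
        self._residues: Optional[str] = None
        self.fetch_count = 0           # lets us observe whether a fetch happened

    def seq(self, fetch_on_demand: bool = True) -> str:
        # With fetch_on_demand=False, return "" instead of fetching, so a
        # writer can transport annotation and features without the sequence.
        if self._residues is None:
            if not fetch_on_demand:
                return ""
            self._residues = self._fetcher(self.display_id)
            self.fetch_count += 1
        return self._residues


def write_record(record: LazySeq, include_sequence: bool) -> str:
    # Toy writer: one header line, then the sequence (possibly empty).
    residues = record.seq(fetch_on_demand=include_sequence)
    return f">{record.display_id} length={record.length}\n{residues}\n"


def slow_fetch(display_id: str) -> str:
    return "ACGTACGT"  # placeholder result for an expensive fetch


rec = LazySeq("contig1", 8, slow_fetch)
print(write_record(rec, include_sequence=False), end="")  # no fetch triggered
print(rec.fetch_count)  # 0
print(write_record(rec, include_sequence=True), end="")   # fetches once
print(rec.fetch_count)  # 1
```

The default (fetch when asked) is unchanged and skipping the fetch is an explicit opt-in; that is one possible compromise with Lincoln's point that laziness should remain an implementation detail rather than something the API exposes.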