[Bioperl-l] 'virtual' seqs
Hilmar Lapp
hlapp@gnf.org
Fri, 28 Jun 2002 10:06:08 -0700
IMHO the connotation of UnknownSequence is overloaded too (referring to a sequence for which there is no functional annotation, or which hasn't been characterized experimentally yet). VirtualSeq appears to have some consensus, even though I recall someone (Elia?) saying that it is overloaded as well.
I'm not sure the NNNs are the best idea. They make it hard for subsequent scripts/modules that pick up the file, e.g. for upload into a database, to decide whether the sequence is meant to be meaningful (and, e.g., stored) or not. For example, how would you distinguish a dummy sequence that should be treated as absent from a sequence that has been entirely repeat-masked?
This may sound far-fetched, but I generally don't much like having to interpret input (e.g., what does it mean if the entire sequence consists of Ns?).
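The ambiguity can be shown with a small sketch. It is written in Python purely for illustration and is not BioPerl code: the SeqRecord class, its fields, and both helper names are hypothetical. It assumes that a missing sequence is marked explicitly (residues left unset) while the length is still known.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeqRecord:
    """Hypothetical record type for illustration; not a BioPerl class."""
    display_id: str
    length: int
    residues: Optional[str] = None  # None means "no sequence data available"

    def has_sequence(self) -> bool:
        # Absence is stated explicitly instead of inferred from the content.
        return self.residues is not None


def looks_absent_by_content(residues: str) -> bool:
    """Guess a downstream loader would have to make if absence were encoded as Ns."""
    return len(residues) > 0 and set(residues.upper()) == {"N"}


# A placeholder written as Ns and a fully repeat-masked sequence are the same string,
# so the content-based guess cannot tell them apart.
placeholder = "N" * 12
fully_masked = "N" * 12
ambiguous = looks_absent_by_content(placeholder) and looks_absent_by_content(fully_masked)

# With an explicit marker the two cases stay distinct, and the length is kept either way.
virtual = SeqRecord("clone1", length=12)
masked = SeqRecord("clone2", length=12, residues=fully_masked)
```

With the explicit marker, a loader can check has_sequence() and never has to interpret a run of Ns.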
-hilmar
> -----Original Message-----
> From: Lincoln Stein [mailto:lstein@cshl.org]
> Sent: Friday, June 28, 2002 8:41 AM
> To: Hilmar Lapp; Jason Stajich; Bioperl
> Subject: Re: [Bioperl-l] 'virtual' seqs
>
>
> I think it's important that we be able to perform manipulations on
> feature tables and annotations even when the underlying sequence is
> completely unavailable (not even a guarantee that you can fetch the
> sequence if you wait long enough). Laziness is a great feature, but
> it's more of an implementation issue than something that should be
> exposed to the API.
>
> As Ewan suggests, it's probably better to return a string of N's
> rather than an undef sequence; otherwise lots of programs will break.
> However I think that EmptySequence has the wrong connotation. I prefer
> VirtualSequence, or possibly UnknownSequence.
>
> Lincoln
>
> On Wednesday 26 June 2002 06:59 pm, Hilmar Lapp wrote:
> > I like LazySeq best -- it means the absence of the sequence is not
> > written in stone, fetching is just expensive and can take a while.
> >
> > Also, one should be able to write these sequences to transport the
> > annotation and feature table (without potentially expensive sequence
> > transport, too). In this case a parser's write_seq() method asking
> > the object for the sequence should get an empty string instead of
> > triggering the actual sequence fetch. At least as an option.
> >
> > I'm wondering how this should be implemented ... not sure what's the
> > right thing to do.
> >
> > -hilmar
> >
> > > -----Original Message-----
> > > From: Jason Stajich [mailto:jason@cgt.mc.duke.edu]
> > > Sent: Thursday, June 20, 2002 11:57 AM
> > > To: Bioperl
> > > Subject: [Bioperl-l] 'virtual' seqs
> > >
> > >
> > > We are processing datafiles - bsml, game, (agave?) documents where
> > > it is possible to just know the length of the sequence but not have
> > > any actual sequence data associated. I think we should have
> > > sequence objects which can handle this - they would have a length,
> > > but seq() would warn and return undef. We need one that would
> > > implement Bio::Seq::RichSeqI interface - call it VirtualRichSeq ?
> > > Perhaps we'll need the equivalent PrimaryVirtualSeq and VirtualSeq?
> > >
> > > Can someone think of a better name, I don't want to confuse with
> > > Ensembl VirtualXX objects? This would be implemented in Bio::Seq::
> > > namespace.
> > >
> > > -jason
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> >
>
> --
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein@cshl.org                                   Cold Spring Harbor, NY
> ========================================================================
>