[Bioperl-l] 'virtual' seqs

Charles Tilford charles.tilford@bms.com
Mon, 01 Jul 2002 15:40:14 -0400


I'll weigh in on this, since this is an issue that comes up with BSML
documents (which can happily have annotations with no sequence).

For the issue of returning undef vs. a string of characters, why not
parameterize the behavior during creation of the object? That is:

my $vs = Bio::Seq::VirtualSequence->new( -len => 5000, -pad => "X");

...would generate a seq string corresponding to ("X" x 5000). If -pad is
not defined, then seq() would be left as undef. This has the advantage of
allowing the user to specify another character (such as "N", or even
"." or "-") as the placeholder character. The disadvantage comes when the
user sets something REALLY odd, like "7" or "fruit fly", and that causes
complaints later. At some point I guess you have to trust the user to
be moderately kind to the API.
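
To make the idea concrete, here is a minimal sketch of how such an object
might behave. It assumes nothing about the real BioPerl internals: the
package name My::VirtualSequence and its methods are hypothetical stand-ins
for the proposed class, and the -len / -pad arguments simply mirror the
constructor call above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed class; NOT an existing BioPerl module.
package My::VirtualSequence;

sub new {
    my ($class, %args) = @_;
    my $len = $args{-len};
    die "-len must be a non-negative integer\n"
        unless defined $len && $len =~ /^\d+$/;
    # -pad is optional; if omitted, seq() reports "no sequence" as undef.
    return bless { len => $len, pad => $args{-pad} }, $class;
}

# The declared length is always known, even without sequence data.
sub length { return $_[0]->{len} }

sub seq {
    my ($self) = @_;
    my $pad = $self->{pad};
    return unless defined $pad;      # no placeholder requested: undef
    return $pad x $self->{len};      # placeholder repeated -len times
}

package main;

my $padded = My::VirtualSequence->new(-len => 5000, -pad => "X");
my $bare   = My::VirtualSequence->new(-len => 5000);

my $bare_seq = $bare->seq;
print "padded seq length: ", length($padded->seq), "\n";
print "bare seq is ", (defined $bare_seq ? "defined" : "undef"), "\n";

# The "REALLY odd" case: a multi-character pad makes the string
# longer than the declared length (9 characters x 3 = 27).
my $odd = My::VirtualSequence->new(-len => 3, -pad => "fruit fly");
print "declared length: ", $odd->length,
      ", actual string length: ", length($odd->seq), "\n";
```

The last case shows why an odd pad value would cause complaints later: the
string returned by seq() no longer agrees with the declared length. Whether
the constructor should reject such values or just trust the caller is the
open design question.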

I have no feelings about the name of such an object...

-Charles

Lincoln Stein wrote:
> 
> I think it's important that we be able to perform manipulations on feature
> tables and annotations even when the underlying sequence is completely
> unavailable (not even a guarantee that you can fetch the sequence if you wait
> long enough).  Laziness is a great feature, but it's more of an
> implementation issue than something that should be exposed to the API.
> 
> As Ewan suggests, it's probably better to return a string of N's rather than
> an undef sequence; otherwise lots of programs will break.  However I think
> that EmptySequence has the wrong connotation.  I prefer VirtualSequence, or
> possibly UnknownSequence.
> 
> Lincoln
> 
> On Wednesday 26 June 2002 06:59 pm, Hilmar Lapp wrote:
> > I like LazySeq best -- it means the absence of the sequence is not written
> > in stone, fetching is just expensive and can take a while.
> >
> > Also, one should be able to write these sequences to transport the
> > annotation and feature table (without potentially expensive sequence
> > transport, too). In this case a parser's write_seq() method asking the
> > object for the sequence should get an empty string instead of triggering
> > the actual sequence fetch. At least as an option.
> >
> > I'm wondering how this should be implemented ... not sure what's the right
> > thing to do.
> >
> >       -hilmar
> >
> > > -----Original Message-----
> > > From: Jason Stajich [mailto:jason@cgt.mc.duke.edu]
> > > Sent: Thursday, June 20, 2002 11:57 AM
> > > To: Bioperl
> > > Subject: [Bioperl-l] 'virtual' seqs
> > >
> > >
> > > We are processing datafiles - bsml,game, (agave?) documents
> > > where it is
> > > possible to just know the length of the sequence but not have
> > > any actual
> > > sequence data associated.  I think we should have sequence
> > > objects which
> > > can handle this - they would have a length, but seq() would warn and
> > > return undef.  We need one that would implement Bio::Seq::RichSeqI
> > > interface - call it VirtualRichSeq ? Perhaps we'll need the equivalent
> > > PrimaryVirtualSeq and VirtualSeq?
> > >
> > > Can someone think of a better name, I don't want to confuse
> > > with Ensembl
> > > > VirtualXX objects?  This would be implemented in Bio::Seq:: namespace.
> > >
> > > -jason
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > jason at cgt.mc.duke.edu
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l@bioperl.org
> > > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> 
> --
> ========================================================================
> Lincoln D. Stein                           Cold Spring Harbor Laboratory
> lstein@cshl.org                                   Cold Spring Harbor, NY
> ========================================================================

-- 
Charles Tilford, Bioinformatics-Applied Genomics
Bristol-Myers Squibb PRI, Hopewell 3A039
P.O. Box 5400, Princeton, NJ 08543-5400, (609) 818-3213
charles.tilford@bms.com