[Bioperl-l] sequence filtering

Hilmar Lapp lapp@gnf.org
Tue, 8 Oct 2002 17:54:41 -0700 (PDT)


I have to admit that what still worries me about the much-desired switch to 
event-based parsing is how easy it will be for the user to decide what 
(s)he's interested in and what (s)he's not. If you have some thoughts on 
this, I'd be glad to hear them.

Also, the reason I'm spending a couple of hours thinking about this is that 
maybe I or we can come up with something that won't have to be trashed 
entirely once we go event-based. Again, given your experience with 
SearchIO, please weigh in wherever you see unresolvable incompatibilities 
with an event-based framework.

What I'm envisioning is something as easy as the following:

# we know this
my $seqin = Bio::SeqIO->new(-fh=>\*STDIN, -format=>'genbank');
my $seqout = Bio::SeqIO->new(-fh=>\*STDOUT, -format=>'fasta');

# now configure the builder for parse optimization
my $seqbuilder = $seqin->object_builder();
$seqbuilder->want_none(); # want_all() is the default
$seqbuilder->want_slot('display_id','description','seq');

# this is again as usual
while(my $seq = $seqin->next_seq()) {
	$seqout->write_seq($seq);
}
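
On the parser side, the idea is that the builder gets asked before any
expensive work is done for a slot. Below is a minimal, self-contained sketch
of that interaction. Everything in it is an assumption for illustration:
ToyBuilder is a made-up stand-in (not Bio::Seq::SeqBuilder and not part of
bioperl), add_wanted_slot() is a hypothetical helper for registering
interest, and the "parser" just walks a list of made-up [slot, value] pairs.

```perl
use strict;
use warnings;

# ToyBuilder: hypothetical stand-in for the proposed object builder.
package ToyBuilder;

sub new {
    my ($class) = @_;
    # want_all is the default: every slot is wanted
    return bless { want_all => 1, wanted => {}, slots => {}, ok => 1 }, $class;
}

sub want_all  { my $self = shift; $self->{want_all} = 1; $self->{wanted} = {}; return; }
sub want_none { my $self = shift; $self->{want_all} = 0; $self->{wanted} = {}; return; }

# Assumed helper (not in the proposed interface): register wanted slots.
sub add_wanted_slot {
    my ($self, @names) = @_;
    $self->{wanted}{$_} = 1 for @names;
    return;
}

# TRUE if the builder wants to populate the named slot.
sub want_slot {
    my ($self, $name) = @_;
    return ($self->{want_all} || $self->{wanted}{$name}) ? 1 : 0;
}

# Adds values to a slot; unwanted slots are silently ignored.
sub add_slot_value {
    my ($self, $name, @values) = @_;
    return 0 unless $self->{ok};                 # object was abandoned
    return 1 unless $self->want_slot($name);     # ignore the request
    push @{ $self->{slots}{$name} }, @values;
    return 1;
}

sub want_object { return $_[0]->{ok} ? 1 : 0 }

# Returns the built object (here just a hash ref), or undef if nothing
# was added or the object was abandoned. Resets state for the next one.
sub make_object {
    my ($self) = @_;
    my ($slots, $ok) = ($self->{slots}, $self->{ok});
    $self->{slots} = {};
    $self->{ok}    = 1;
    return unless $ok && %$slots;
    return $slots;
}

package main;

# Toy "parser": asks the builder first, so unwanted slots cost nothing.
sub parse_record {
    my ($builder, @events) = @_;
    for my $event (@events) {
        my ($slot, $value) = @$event;
        next unless $builder->want_slot($slot);
        last unless $builder->add_slot_value($slot, $value);
    }
    return $builder->make_object();
}

my $builder = ToyBuilder->new();
$builder->want_none();
$builder->add_wanted_slot('display_id', 'seq');

# Made-up record, for illustration only.
my $obj = parse_record($builder,
    [ display_id  => 'toy1' ],
    [ description => 'a made-up record' ],
    [ seq         => 'ACGT' ],
);
print join(',', sort keys %$obj), "\n";
```

The point of checking want_slot() before add_slot_value() is that a real
parser could skip reading and converting the data for a slot nobody asked
for, which is where the speedup would come from.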

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------

On Tue, 8 Oct 2002, Jason Stajich wrote:

> I guess this makes sense - would be very cool if we could add some
> speedups so that if we wanted to just parse the feature table in a genome
> record we could do this in bioperl w/o having to manage the sequence data
> as well.  Again, this would be fixed by event-based parsing.  Will have
> to get that started at some point.
> 
> But I guess this is a reasonable stopgap measure.
> 
> -jason
> 
> On Tue, 8 Oct 2002, Hilmar Lapp wrote:
> 
> > Apparently (unfortunately) it didn't ring a lot of bells for many people.
> > I'm still looking forward to learning exactly how Biojava does this, even
> > though I've now looked through some of their interfaces.
> >
> > It seems to me what they do is not terribly different from the following
> > interface Bio::Factory::ObjectBuilderI that I propose as a solution. There
> > would be an implementation Bio::Seq::SeqBuilder.
> >
> > =head2 want_slot
> >
> >  Title   : want_slot
> >  Usage   :
> >  Function: Whether or not the object builder wants to populate the
> >            specified slot of the object to be built.
> >
> >            The slot can be specified either as the name of the
> >            respective method, or the initialization parameter that
> >            would be otherwise passed to new() of the object to be
> >            built.
> >
> >  Example :
> >  Returns : TRUE if the object builder wants to populate the slot, and
> >            FALSE otherwise.
> >  Args    : the name of the slot (a string)
> >
> >
> > =cut
> >
> > =head2 add_slot_value
> >
> >  Title   : add_slot_value
> >  Usage   :
> >  Function: Adds one or more values to the specified slot of the object
> >            to be built.
> >
> >            Naming the slot is the same as for want_slot().
> >
> >            The object builder may further filter the content to be
> >            set, or even completely ignore the request.
> >
> >            If this method reports failure, the caller should not add
> >            more values to the same slot. In addition, the caller may
> >            find it appropriate to abandon the object being built
> >            altogether.
> >
> >  Example :
> >  Returns : TRUE on success, and FALSE otherwise
> >  Args    : the name of the slot (a string)
> >            parameters determining the value to be set
> >
> >
> > =cut
> >
> > =head2 want_object
> >
> >  Title   : want_object
> >  Usage   :
> >  Function: Whether or not the object builder is still interested in
> >            continuing with the object being built.
> >
> >            If this method returns FALSE, the caller should not add any
> >            more values to slots; otherwise it risks the builder
> >            throwing an exception. In addition, make_object() is likely
> >            to return undef after this method has returned FALSE.
> >
> >  Example :
> >  Returns : TRUE if the object builder wants to continue building
> >            the present object, and FALSE otherwise.
> >  Args    : none
> >
> >
> > =cut
> >
> > =head2 make_object
> >
> >  Title   : make_object
> >  Usage   :
> >  Function: Get the built object.
> >
> >            This method is allowed to return undef if no value has ever
> >            been added since the last call to make_object(), or if
> >            want_object() returned FALSE (or would have returned FALSE)
> >            before calling this method.
> >
> >  Example :
> >  Returns : the object that was built
> >  Args    : none
> >
> >
> > =cut
> >
> > What do people think?
> >
> > 	-hilmar
> >
> 
>