[Bioperl-l] sequence filtering
Hilmar Lapp
lapp@gnf.org
Tue, 8 Oct 2002 17:13:05 -0700 (PDT)
Apparently (and unfortunately) it didn't ring a lot of bells for many people.
I'm still looking forward to learning how exactly Biojava does this, even
though I've now looked through some of their interfaces.
It seems to me that what they do is not terribly different from the following
interface, Bio::Factory::ObjectBuilderI, which I propose as a solution. There
would be an implementation, Bio::Seq::SeqBuilder.
=head2 want_slot

 Title   : want_slot
 Usage   :
 Function: Whether or not the object builder wants to populate the
           specified slot of the object to be built.

           The slot can be specified either as the name of the
           respective method, or the initialization parameter that
           would be otherwise passed to new() of the object to be
           built.
 Example :
 Returns : TRUE if the object builder wants to populate the slot, and
           FALSE otherwise.
 Args    : the name of the slot (a string)

=cut
=head2 add_slot_value

 Title   : add_slot_value
 Usage   :
 Function: Adds one or more values to the specified slot of the object
           to be built.

           Naming the slot is the same as for want_slot().

           The object builder may further filter the content to be
           set, or even completely ignore the request.

           If this method reports failure, the caller should not add
           more values to the same slot. In addition, the caller may
           find it appropriate to abandon the object being built
           altogether.
 Example :
 Returns : TRUE on success, and FALSE otherwise
 Args    : the name of the slot (a string)
           parameters determining the value to be set

=cut
=head2 want_object

 Title   : want_object
 Usage   :
 Function: Whether or not the object builder is still interested in
           continuing with the object being built.

           If this method returns FALSE, the caller should not add any
           more values to slots; otherwise it risks the builder
           throwing an exception. In addition, make_object() is likely
           to return undef after this method has returned FALSE.
 Example :
 Returns : TRUE if the object builder wants to continue building
           the present object, and FALSE otherwise.
 Args    : none

=cut
=head2 make_object

 Title   : make_object
 Usage   :
 Function: Get the built object.

           This method is allowed to return undef if no value has ever
           been added since the last call to make_object(), or if
           want_object() returned FALSE (or would have returned FALSE)
           before calling this method.
 Example :
 Returns : the object that was built
 Args    : none

=cut
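To make the intended call sequence more concrete, here is a minimal sketch of
how a builder and a parser might interact through these four methods.
Everything in it is an assumption for illustration: the package name
My::SketchBuilder, the -slots and -max_length parameters, the 'length' slot
used as the filter criterion, and the build_all() stand-in for a parser are
all hypothetical. The builder produces plain hash references rather than
Bio::Seq objects so that the example runs without bioperl.

```perl
use strict;
use warnings;

# Hypothetical builder following the four proposed methods.
package My::SketchBuilder;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        wanted     => { map { $_ => 1 } @{ $args{-slots} || [] } },
        max_length => $args{-max_length},    # assumed filter criterion
    }, $class;
    $self->_reset;
    return $self;
}

# Forget everything about the object currently being built.
sub _reset {
    my $self = shift;
    $self->{values}   = {};
    $self->{rejected} = 0;
}

sub want_slot {
    my ($self, $slot) = @_;
    return $self->{wanted}{$slot} ? 1 : 0;
}

sub add_slot_value {
    my ($self, $slot, @values) = @_;
    return 0 if $self->{rejected};
    # Example filter: abandon entries whose declared length is too large.
    if ($slot eq 'length' && @values && defined $self->{max_length}
        && $values[0] > $self->{max_length}) {
        $self->{rejected} = 1;
        return 0;
    }
    # Requests for unwanted slots are silently ignored.
    return 1 unless $self->want_slot($slot);
    push @{ $self->{values}{$slot} }, @values;
    return 1;
}

sub want_object { return $_[0]->{rejected} ? 0 : 1 }

sub make_object {
    my $self = shift;
    my $obj;
    # undef if the object was rejected or no value was ever added.
    $obj = $self->{values} if !$self->{rejected} && %{ $self->{values} };
    $self->_reset;
    return $obj;
}

package main;

# Stand-in for a parser: each entry is a list of [slot, value] pairs in the
# order a flat-file parser would encounter them (sequence last).
sub build_all {
    my ($builder, @entries) = @_;
    my @built;
    for my $entry (@entries) {
        for my $pair (@$entry) {
            my ($slot, $value) = @$pair;
            last unless $builder->want_object;      # builder gave up
            next unless $builder->want_slot($slot); # skip unwanted slot
            $builder->add_slot_value($slot, $value);
        }
        my $obj = $builder->make_object;
        push @built, $obj if defined $obj;
    }
    return @built;
}

my $builder = My::SketchBuilder->new(
    -slots      => [qw(display_id length seq)],
    -max_length => 1000,
);
my @seqs = build_all(
    $builder,
    [ [ display_id => 'NM_1' ], [ length => 500 ],       [ seq => 'ACGT' ] ],
    [ [ display_id => 'NC_1' ], [ length => 5_000_000 ], [ seq => 'TTTT' ] ],
    [ [ display_id => 'NM_2' ], [ desc => 'unwanted' ],  [ length => 10 ],
      [ seq => 'GG' ] ],
);
print "$_->{display_id}[0]\n" for @seqs;    # prints NM_1 and NM_2, one per line
```

The point of the sketch is that a rejected entry never has its sequence handed
to the builder: once add_slot_value() fails on 'length', want_object() returns
FALSE and the parser skips the remaining slots of that entry.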
What do people think?
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
On Tue, 8 Oct 2002, Hilmar Lapp wrote:
> I'm trying to pull the daily full RefSeq cumulative update through bioperl. Before even getting my hands dirty, I realized that this can't work because there are full chromosomes in there, and their sequences will choke perl. OTOH, I'm not interested in those anyway, and ideally I could just skip over any sequence for which some property matches some pattern.
>
> Like always, there is more than one way to make this work, and I'm wondering what could be the (subjectively :) 'best' way in the absence of event-based parsing. Some options that crossed my mind:
>
> a) pass an optional additional parameter to next_seq() which is a closure returning TRUE if the entry is to be parsed and returned and FALSE otherwise. For this option the questions would be, when to call this function (every line, every 'item', before feature table, before sequence, any combination of those?), and what to pass to the closure as argument (a hash map with properties? an instantiated Bio::SeqI object? the current line? the current slot that was parsed and its value? something else?).
>
> b) create a SeqFilterI interface and pass an object implementing it. This is really just a more OO-form of a) and the same kind of questions need to be answered.
>
> c) send events to an event listener, and skip over the sequence if any of the listeners returns FALSE (i.e., join by AND). This is again very similar to a), and more flexible, but also more heavyweight (more method calls). Again, similar kinds of questions would need to be answered in order to define SeqParseEventI or a similar interface.
>
> I'd be glad to hear anyone's thoughts on this. Also, I'm sure there are better ways. If you know one, I'd be glad to learn.
>
> My preference is for simplicity, and so far I don't think a) is that bad, although it does lack some flexibility.
>
> -hilmar
>