[Bioperl-l] post-processing of seqs

Hilmar Lapp hlapp@gnf.org
Wed, 23 Oct 2002 11:41:42 -0700


I have a use case here in which we need to subject seq objects 
coming off a SeqIO stream to some sort of post-processing, which 
essentially results in another stream of seq objects, where the seq 
objects have been altered or re-created, and the number of seqs may 
or may not be the same as in the original stream.

I know I can easily hard-code this into a script. What I want is 
for this to integrate seamlessly into pure SeqIO streams, with the 
post-processing 'algorithm' configurable through the command line 
(yes, I'm talking about load_seqdatabase.pl).

My proposed solution would be something that resembles the Biojava 
SeqIO event generator chain:

	Bio::Factory::SeqProcessorI is-a Bio::Factory::SequenceStreamI

	# gets (and possibly sets) the source stream;
	# returns a Bio::Factory::SequenceStreamI compliant object
	# (no empty prototype, since the setter takes an argument)
	sub source_stream { }

	# since it is-a SequenceStreamI, the requirement to implement
	# next_seq() is implicit
	
This way I could put entire processing 'algorithms' into modules 
implementing Bio::Factory::SeqProcessorI and chain them arbitrarily; 
the chain would be easily configurable by just enumerating the 
modules that I want to apply.
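
A minimal sketch of the chaining idea, in plain Perl so that it runs 
without BioPerl installed. Everything named here is an assumption made 
for illustration: the My::* packages, the process_seq() hook, and 
build_chain() are hypothetical and not part of any existing module, 
and plain strings stand in for seq objects. Each processor is itself 
a stream (it has next_seq()) and pulls from its source stream, so a 
processor may emit fewer or more seqs than it reads.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SeqIO stream: hands out plain strings
# instead of real seq objects.
package My::ArrayStream;
sub new {
    my ($class, @seqs) = @_;
    return bless { seqs => [@seqs] }, $class;
}
sub next_seq {
    my ($self) = @_;
    return shift @{ $self->{seqs} };    # undef once exhausted
}

# Hypothetical base processor: behaves like a stream and wraps a source.
package My::SeqProcessor;
sub new {
    my ($class, %args) = @_;
    my $self = bless { queue => [] }, $class;
    $self->source_stream($args{source}) if $args{source};
    return $self;
}
# Gets (and possibly sets) the source stream.
sub source_stream {
    my ($self, $stream) = @_;
    $self->{source} = $stream if defined $stream;
    return $self->{source};
}
# Subclasses override this: one seq in, zero or more seqs out.
sub process_seq {
    my ($self, $seq) = @_;
    return ($seq);
}
sub next_seq {
    my ($self) = @_;
    # Refill the queue until there is output or the source runs dry.
    while (!@{ $self->{queue} }) {
        my $seq = $self->source_stream->next_seq;
        return unless defined $seq;
        push @{ $self->{queue} }, $self->process_seq($seq);
    }
    return shift @{ $self->{queue} };
}

# Filter: may emit fewer seqs than it reads.
package My::DropShort;
use parent -norequire, 'My::SeqProcessor';
sub process_seq {
    my ($self, $seq) = @_;
    return length($seq) >= 4 ? ($seq) : ();
}

# Splitter: emits more seqs than it reads.
package My::SplitInHalf;
use parent -norequire, 'My::SeqProcessor';
sub process_seq {
    my ($self, $seq) = @_;
    my $mid = int(length($seq) / 2);
    return (substr($seq, 0, $mid), substr($seq, $mid));
}

package main;

# Build a chain by enumerating processor class names; in a real
# script the list could come from a command-line option.
sub build_chain {
    my ($stream, @classes) = @_;
    for my $class (@classes) {
        $stream = $class->new(source => $stream);
    }
    return $stream;
}

my $chain = build_chain(My::ArrayStream->new(qw(ACGT AC GGGTTT)),
                        qw(My::DropShort My::SplitInHalf));
while (defined(my $seq = $chain->next_seq)) {
    print "$seq\n";
}
```

The caller only ever sees next_seq(), so it cannot tell whether it 
is reading from the original stream or from the end of a chain. The 
internal queue is one possible way to handle processors that return 
several seqs per input; other designs are equally plausible.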

Elia/Shawn, does biopipe do something similar already?

BTW the namespace Bio::Factory for this and SequenceStreamI is not 
the luckiest choice, I think, but that's a separate story and can be 
sorted out later if others don't like it either.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------