[Bioperl-pipeline] some transformer stuff

Shawn Hoon shawnh at fugu-sg.org
Thu Feb 27 09:52:58 EST 2003


Hi all,
	my head is cloudy with jet lag, but I have been playing around with transformers
on the plane and have some small proposals. I am doing some multiple alignments
for phylogenetic tree building, and as part of that process I want to do some editing
of the alignments on the fly (say, removing gaps that would otherwise screw up
the distance calculations). This is all file-based at the moment, so it will not affect
the genome annotation stuff.

So what I do is this:

Clustalw -> alignment file -> Alignment Editor Transformer -> Phylo stuff.
	    |<--          IOHandler                       ->|
So to read an alignment file I'm using AlignIO. We all know that IOHandlers do not
handle iterator-type calls, so what I now have is another transformer that sits
between the IOHandler and the alignment editor transformer.


AlignIO IOHandler -> Iterator Transformer -> Alignment Editor Transformer -> ...

So the Iterator module basically takes in an IO-type object, figures out whether it's SeqIO
or AlignIO etc., and returns the array of objects. Not the most efficient, but
it works for a small number of inputs per file.
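
To make the iterator idea concrete, here is a rough sketch in Python rather than Perl. It is not the actual transformer code. The two Fake* classes are hypothetical stand-ins that only mimic the next_aln / next_seq calls of AlignIO and SeqIO, and drain() is an illustrative name for the "call the iterator until it is exhausted and return everything" step.

```python
class FakeAlignIO:
    """Hypothetical stand-in for an AlignIO-style stream."""
    def __init__(self, alignments):
        self._items = list(alignments)

    def next_aln(self):
        # Return the next alignment, or None when the stream is exhausted.
        return self._items.pop(0) if self._items else None


class FakeSeqIO:
    """Hypothetical stand-in for a SeqIO-style stream."""
    def __init__(self, seqs):
        self._items = list(seqs)

    def next_seq(self):
        return self._items.pop(0) if self._items else None


# Iterator method names to probe for, in order (assumption: one per IO type).
ITERATOR_METHODS = ("next_aln", "next_seq")


def drain(io_obj):
    """Work out which iterator call the object supports and collect
    every object it yields into a list."""
    for name in ITERATOR_METHODS:
        method = getattr(io_obj, name, None)
        if callable(method):
            items = []
            while True:
                item = method()
                if item is None:
                    break
                items.append(item)
            return items
    raise TypeError("object has none of: " + ", ".join(ITERATOR_METHODS))


alignments = drain(FakeAlignIO(["aln1", "aln2"]))
```

Everything is read into memory at once, which is why this only suits files with a small number of inputs.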

So that works. The issue is whether we want to store full file paths in the input table
for each input name, which is messy if you are copying input files and want to figure out
the base file name. Alternatively, what I think might be better is to add a file_dir column to the stream
adaptor table. If it's present and the inputs are files, it will be prepended to the input name.
The current way of handling files is at the analysis level, where we assume that files
are simply passed to the analysis and file_dir is an analysis parameter. In
this case, however, we need to read the file into objects for the transforming.
In any case, interested to hear your views.
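
A small sketch of the file_dir proposal, again in Python just for illustration. The function name resolve_input and the example paths are made up; the only thing taken from the proposal is that the input table keeps the bare file name and the directory is joined on when file_dir is set.

```python
import os


def resolve_input(input_name, file_dir=None):
    """Return the path to open for an input.

    input_name is the bare name stored in the input table; file_dir is the
    optional directory from the proposed file_dir column (None if unset).
    """
    if file_dir:
        return os.path.join(file_dir, input_name)
    return input_name


# Hypothetical values, only to show the behaviour.
path = resolve_input("family1.aln", "/data/alignments")
bare = resolve_input("family1.aln")
```

With this arrangement the base file name is always just the input name, so nothing needs to be parsed out of a full path when copying files around.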

cheers,

shawn





-- 
********************************
* Shawn Hoon
* http://www.fugu-sg.org/~shawnh
********************************


