Bioperl: A greedy consensus builder

Mike Cariaso mcariaso@genelogic.com
Wed, 04 Aug 1999 17:39:40 -0700



To scratch an itch I've built a small object which ISA Bio::UnivAln.
It has a single method, 'greedy', which constructs a consensus sequence using a
somewhat different method. The approach ignores alignment gaps that come
before the first non-gap character of a row, or after the last one. An example
may explain it more clearly.


Sample alignment:

row1  :ATCTTCGCTCGCTCGCTTATATA-ATAAGATAAGATATCGCTCCGCTCGCCTCGCTCCTCAAAGCTCGCTC
row2  :--CTTCGCTCGCTCGCTTATATA-ATAAGATAAGATATCGCTCCGCTCGCCTCGCTCCTC---CCTCGCTC
row3  :--------------------ATAAATAAGATAAGATATCGCTCTGCTCGCCTCGCTCCTC---CCTCGCTC
row4  :---------------------------AGATAAGATATCGCTCCGCTCGCCTCGCTCCTC---CCTCGCTC
row5  :---------------------------AGATAAGATATCGCTCCGCTCGCCTCGCTCCTC---CCTCGCTC
row6  :-------------------------------------------CGCTCGCCTCGCTCCTC---CCTCGCTC

cons  :---------------------------AGATAAGATATCGCTCCGCTCGCCTCGCTCCTC---CCTCGCTC
greedy:ATCTTCGCTCGCTCGCTTATATA-ATAAGATAAGATATCGCTCCGCTCGCCTCGCTCCTC---CCTCGCTC


So for the above alignment the current technique will call the first 20 or so
bases as gaps, since a gap is the most common character in those columns. The
greedy approach assumes that these columns lie outside the known region of the
rows that are gapped there, and ignores those gaps. This seems useful when
working with small partial fragments.
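
To make the difference concrete, here is a minimal sketch of the idea in Python. It is not the Perl object described above and does not use the Bio::UnivAln interface; the function names and the toy alignment are made up for illustration, and ties are simply broken in favour of whichever character is seen first.

```python
from collections import Counter

GAP = "-"

def covered_span(row):
    """Half-open (start, end) span from the first to the last non-gap character."""
    start = len(row) - len(row.lstrip(GAP))
    end = len(row.rstrip(GAP))
    return start, end  # an all-gap row gives start >= end, i.e. it covers nothing

def majority_consensus(rows):
    """Plain per-column majority vote; every gap counts as a vote."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*rows)
    )

def greedy_consensus(rows):
    """Per-column majority vote that skips leading and trailing gaps of each row.

    Gaps inside a row's covered span still vote, so internal gaps can win.
    Assumes all rows have the same length.
    """
    spans = [covered_span(row) for row in rows]
    result = []
    for col in range(len(rows[0])):
        votes = Counter(
            row[col] for row, (start, end) in zip(rows, spans) if start <= col < end
        )
        # If no row covers this column, fall back to a gap.
        result.append(votes.most_common(1)[0][0] if votes else GAP)
    return "".join(result)

# Toy alignment (illustrative only, not the sample above).
rows = [
    "ACGT-AC",
    "--GT-AC",
    "----TAC",
]
print(majority_consensus(rows))
print(greedy_consensus(rows))
```

In the toy alignment the first two columns are covered only by the first row, so the greedy version calls them from that row alone, while the internal gap in column five still wins its vote.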



If there is interest I'll be happy (honored, actually) to contribute it to
bio.perl. 
The interface has the threshold param as well as another optional one to specify
the minimum number of rows necessary to do a base call. And if you notice any
errors in the alignment, it's totally bogus data, so the mistake is mine.
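
One way the two parameters could interact when calling a single column is sketched below, again in Python rather than Perl. The semantics are assumptions, not the actual interface: `threshold` is taken to be the minimum fraction of covering rows that must agree, `min_rows` the minimum number of rows covering the column, and `NO_CALL` a hypothetical placeholder for a column where no base is called.

```python
from collections import Counter

NO_CALL = "?"  # hypothetical placeholder for "no base call"

def call_column(chars, threshold=0.5, min_rows=1):
    """Call one column from the characters of the rows that cover it.

    chars     -- characters from rows whose known region includes this column
    threshold -- assumed meaning: minimum fraction of those rows that must agree
    min_rows  -- assumed meaning: minimum number of covering rows for a call
    """
    if not chars or len(chars) < min_rows:
        return NO_CALL
    char, count = Counter(chars).most_common(1)[0]
    return char if count / len(chars) >= threshold else NO_CALL

print(call_column(["A", "A", "C"], threshold=0.6))
print(call_column(["A"], min_rows=2))
```

With these assumptions, a column covered by a single row can be excluded by setting `min_rows` to 2, which guards against calling long stretches from one fragment alone.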


-- 
mike cariaso      --------------------     mcariaso@genelogic.com
ph:510-981-3156 -------------------------- fax:510-649-3449
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================