<div>Hello all,<br></div><div><br></div><div>I am new to Bioinformatics, so excuse me if I have got this all wrong.<br></div><div><br></div><div>I
am aligning multiple sequences (ESTs) to a genome (a FASTA file of
scaffolds) using Biopython's NcbiblastnCommandline wrapper, and for my
project I need to cluster the overlapping alignments in order to locate
highly expressed genes. I was surprised not to find any articles online
describing a standard (formalised) methodology for this step.<br></div><div><br></div><div>Well,
one can easily locate the scaffolds that appear in multiple alignments
using Biopython's parsers and then carry on processing the data. <br></div><div>What I am wondering is whether it would be <b>meaningful</b> to add this process to Biopython, for example as a method inside the BlastIO package.<br></div><div><br></div><div>If
so, then we should decide on the output format of the new file/info
produced by this process. For example, one idea would be to gather all the alignments in one place, discarding the source
sequences (queries), and to highlight the most highly expressed scaffolds in some way, e.g. by introducing a
new score index.<br></div><div><br></div><div>Any thoughts on this?<br></div><div><br></div><div>Thank you for your time,<br></div><div>Stelios<br></div>
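<div><br></div><div>P.S. To make the idea a bit more concrete, here is a minimal sketch of the clustering step, not a proposed implementation. It assumes each alignment has already been reduced to a (scaffold_id, start, end) tuple, for example from the hit coordinates of the parsed BLAST output. The function name cluster_alignments and the use of the alignment count as the "score index" are just my assumptions for illustration; the count is only a rough proxy for expression.<br></div>

```python
from collections import defaultdict


def cluster_alignments(alignments):
    """Merge overlapping alignments on each scaffold (hypothetical helper).

    `alignments` is an iterable of (scaffold_id, start, end) tuples.
    Returns (scaffold_id, cluster_start, cluster_end, n_alignments) tuples,
    with the clusters supported by the most alignments first.
    """
    by_scaffold = defaultdict(list)
    for scaffold_id, start, end in alignments:
        # Minus-strand hits may be reported with start greater than end.
        lo, hi = sorted((start, end))
        by_scaffold[scaffold_id].append((lo, hi))

    clusters = []
    for scaffold_id, intervals in by_scaffold.items():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        count = 1
        for lo, hi in intervals[1:]:
            if cur_end > lo:
                # Overlaps the current cluster: extend it.
                cur_end = max(cur_end, hi)
                count += 1
            else:
                # Gap found: close the current cluster and start a new one.
                clusters.append((scaffold_id, cur_start, cur_end, count))
                cur_start, cur_end, count = lo, hi, 1
        clusters.append((scaffold_id, cur_start, cur_end, count))

    # Assumed "score index": the number of alignments in each cluster.
    clusters.sort(key=lambda c: (-c[3], c[0], c[1]))
    return clusters


example = [
    ("scaf1", 100, 200),
    ("scaf1", 150, 300),
    ("scaf1", 500, 600),
    ("scaf2", 50, 10),
    ("scaf1", 250, 280),
]
for cluster in cluster_alignments(example):
    print(cluster)
```

<div>Here the queries are discarded and only the merged regions per scaffold are kept, ranked by how many alignments support them. Whether two alignments that merely touch should count as overlapping, and whether a minimum overlap should be required, are open choices.<br></div>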