<div>Hello all,<br></div><div><br></div><div>I am new to Bioinformatics, so excuse me if I have got this all wrong.<br></div><div><br></div><div>I

 am aligning multiple sequences (ESTs) to a genome (scaffolds fasta 

file) using Biopython's NcbiblastnCommandline wrapper, and for the purposes of my 

project I need to cluster the overlapping alignments in order to locate 

highly expressed genes. I was surprised not to find any articles online 

about a standard (formalised) methodology for this step.<br></div><div><br></div><div>Well,

 one can easily locate the scaffolds that appear in multiple alignments 

using Biopython's parsers and just go on processing their data. <br></div><div>What I am wondering is whether it would be <b>meaningful</b> to add this process to Biopython, for example as a method inside the BlastIO package.<br></div><div><br></div><div>If

 so, then we should decide on the output format of the new file/info 

produced as the result of this process. For example, one possible idea would be to gather all the alignments in one place, discarding the source 

sequences (queries), and to highlight the most expressed scaffolds in some way, e.g. by introducing a 

new score index.<br></div><div><br></div><div>Any thoughts on this?<br></div><div><br></div><div>Thank you for your time,<br></div><div>Stelios<br></div>
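<div><br></div><div>P.S. To make the idea more concrete, here is a rough sketch of the kind of clustering I have in mind. It is not an existing Biopython function: the name cluster_hits and the (scaffold, start, end) tuples are made up for illustration, and I am assuming the coordinates would come from the subject start/end of each BLAST hit. Hits on the same scaffold are merged when their ranges overlap, and the hit count per cluster is a crude stand-in for the "score index" mentioned above.<br></div>

```python
from collections import defaultdict


def cluster_hits(hits):
    """Group hits per scaffold and merge overlapping intervals.

    hits: iterable of (scaffold_id, start, end) tuples (hypothetical layout).
    Returns {scaffold_id: [(cluster_start, cluster_end, n_hits), ...]}.
    """
    by_scaffold = defaultdict(list)
    for scaffold_id, start, end in hits:
        # Minus-strand hits may be reported with start > end; normalise them.
        by_scaffold[scaffold_id].append((min(start, end), max(start, end)))

    clusters = {}
    for scaffold_id, intervals in by_scaffold.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and merged[-1][1] >= start:
                # Overlaps the previous cluster: extend it and count the hit.
                prev_start, prev_end, count = merged[-1]
                merged[-1] = (prev_start, max(prev_end, end), count + 1)
            else:
                # No overlap: start a new cluster with a single hit.
                merged.append((start, end, 1))
        clusters[scaffold_id] = merged
    return clusters


# Made-up example data, only to show the shape of the input and output.
example_hits = [
    ("scaffold_1", 100, 250),
    ("scaffold_1", 200, 400),
    ("scaffold_1", 900, 700),
    ("scaffold_2", 10, 60),
]
clusters = cluster_hits(example_hits)
```

<div>Clusters with a high hit count would then be the candidates for highly expressed regions, although a real implementation would probably also need to filter hits by e-value or identity first.<br></div>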