[Bioperl-l] Is it possible to do contig alignments?

Fri Aug 24 04:43:23 UTC 2007

Dear list members,

I would like to "produce" an alignment of a contig, or more exactly 
visualize it in such a fashion, based on the aligned sequences provided 
to me by a sequence assembler:

Consensus: ACGTACGTTG
Sequence1: ACG-AC
Sequence2:  CGTACGT
Sequence3:     AC-TTG

It sounds like a trivial task, but after a long search it seems 
impossible with the methods BioPerl provides.

Using the Bio::Align classes, it seems the only way is for all the 
sequences to have the same aligned length, i.e. like this:

Consensus: ACGTACGTTG
Sequence1: ACG-AC----
Sequence2: -CGTACGT--
Sequence3: ----AC-TTG

It's not very satisfactory if I have to pad the sequences with gaps 
manually. In the context of a phylogenetic alignment, it might make 
sense, but not for contigs.

In an assembly, whole sequences are mapped onto contigs. Bio::LocatableSeq 
does not help here because it defines locations _within_ the sequence 
(the name LocatableSeq was pretty misleading to me).

I find it strange that contigs carry the coordinates of the aligned 
sequences composing them, yet there is no straightforward way to 
exploit this information.
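
To make concrete what I mean by exploiting the coordinates, here is a rough sketch in plain Python (not Perl, and not using any BioPerl class). The function name pad_reads is hypothetical. I assume each read comes with a 1-based start column in the gapped consensus, and the start values below are simply read off my example above.

```python
def pad_reads(consensus_length, reads, fill="-"):
    """Pad each aligned read to the full contig width.

    reads: list of (name, start, aligned_seq) tuples, where start is the
    1-based column of the read's first character in the gapped consensus
    (an assumption about what the assembler reports).
    """
    padded = []
    for name, start, seq in reads:
        left = start - 1                               # columns before the read
        right = consensus_length - left - len(seq)     # columns after the read
        if left < 0 or right < 0:
            raise ValueError(f"{name} does not fit inside the contig")
        padded.append((name, fill * left + seq + fill * right))
    return padded


consensus = "ACGTACGTTG"
reads = [
    ("Sequence1", 1, "ACG-AC"),
    ("Sequence2", 2, "CGTACGT"),
    ("Sequence3", 5, "AC-TTG"),
]

# Print the consensus followed by the padded reads, one row per line.
for name, row in [("Consensus", consensus)] + pad_reads(len(consensus), reads):
    print(f"{name}: {row}")
```

With fill="-" this gives the equal-length rows that the Bio::Align classes seem to want, and with fill=" " it gives the indented view I am after. This is only the arithmetic I would hope a library could do for me from the contig's own coordinates.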

So what's the bottom line? Am I missing something obvious, an 
out-of-the-box solution? Is it a "missing feature" of BioPerl that is 
planned to be implemented in the future or that should be requested? 
Should I pad my sequences with dashes or spaces after assembly? Or is it 
expected that my aligned reads coming from my assembly be padded with 
lots of gaps at their beginning and end? What's the BioPerl philosophy here?

Thanks for giving me pointers,

Florent