[Bioperl-l] Is it possible to do contig alignments?
Florent Angly
florent.angly at gmail.com
Fri Aug 24 04:43:23 UTC 2007
Dear list members,
I would like to "produce" an alignment of a contig, or more exactly
visualize it in such a fashion, based on the aligned sequences provided
to me by a sequence assembler:
Consensus: ACGTACGTTG
Sequence1: ACG-AC
Sequence2:  CGTACGT
Sequence3:     AC-TTG
It sounds like a very trivial task but after searching for a long time,
it seems impossible using the methods BioPerl provides.
Using the Bio::Align classes, it seems like the only way is if the
sequences have the same aligned length, i.e. like this:
Consensus: ACGTACGTTG
Sequence1: ACG-AC----
Sequence2: -CGTACGT--
Sequence3: ----AC-TTG
It's not very satisfactory if I have to pad the sequences with gaps
manually. In the context of a phylogenetic alignment, it might make
sense, but not for contigs.
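To make the manual padding concrete, here is a rough sketch of what I
mean. It is plain Python rather than BioPerl and uses no Bio::Align
calls; the pad_reads helper and the 1-based start columns are my own
assumptions, read off the example above, not anything the assembler or
BioPerl provides.

```python
def pad_reads(consensus_length, reads):
    """Pad each (name, start, aligned_seq) read with '-' to the contig length.

    start is the assumed 1-based column of the read's first base on the
    consensus; aligned_seq may already contain internal gaps.
    """
    padded = {}
    for name, start, seq in reads:
        left = start - 1
        right = consensus_length - left - len(seq)
        if left < 0 or right < 0:
            raise ValueError(f"{name} does not fit on the contig")
        padded[name] = "-" * left + seq + "-" * right
    return padded


consensus = "ACGTACGTTG"
# Hypothetical start columns taken from the example alignment
reads = [
    ("Sequence1", 1, "ACG-AC"),
    ("Sequence2", 2, "CGTACGT"),
    ("Sequence3", 5, "AC-TTG"),
]

for name, row in pad_reads(len(consensus), reads).items():
    print(f"{name}: {row}")
```

Every row ends up with the same length as the consensus, which is the
form the Bio::Align classes seem to expect, but the read coordinates
have to be carried around and applied by hand.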
For assemblies, whole sequences are mapped onto contigs. Bio::LocatableSeq
does not help here because it defines locations _within_ the sequence
(the name LocatableSeq was pretty misleading to me).
I think it's all very strange that contigs have the coordinates of the
aligned sequences composing them but there is no straightforward way to
exploit this information.
So what's the bottom line? Am I missing something obvious, an
out-of-the-box solution? Is it a "missing feature" of BioPerl that is
planned to be implemented in the future or that should be requested?
Should I pad my sequences with dashes or spaces after assembly? Or is it
expected that my aligned reads coming from my assembly be padded with
lots of gaps at their beginning and end? What's the BioPerl philosophy here?
Thanks for giving me pointers,
Florent