[Biopython] getting alignment out of Align.PairwiseAligner
Michiel de Hoon
mjldehoon at yahoo.com
Sun Oct 7 06:53:05 UTC 2018
The pairwise aligner makes use of a trace matrix, as described in detail in e.g. Biological Sequence Analysis by Richard Durbin et al. Each alignment corresponds to a path through this trace matrix, consisting of horizontal, vertical, and diagonal segments. Horizontal and vertical segments are gaps; diagonal segments are aligned residues. The path you got consists of the vertices in the trace matrix that connect the segments. In your example:
((0, 0), (182, 0), (188, 6), (193, 6)) means
(0, 0) - (182, 0): a gap of 182 amino acids
(182, 0) - (188, 6): an alignment of 6 amino acids (amino acids 182-188 in one sequence against amino acids 0-6 in the other sequence, as 0-based half-open ranges, i.e. positions 182-187 and 0-5)
(188, 6) - (193, 6): a gap of 5 amino acids.
Try a few simple examples and compare to Richard Durbin's book; that should make things clear.
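A few lines of Python make this reading of the path concrete. This is a minimal sketch, not part of Biopython: describe_path is a hypothetical helper, and it assumes each vertex is (target position, query position) with 0-based, Python-style half-open coordinates. A segment in which only one coordinate changes is a gap; a segment in which both change by the same amount is a run of aligned residues.

```python
def describe_path(path):
    """Label each segment between consecutive path vertices.

    Hypothetical helper: assumes each vertex is
    (target_position, query_position), 0-based and half-open.
    """
    segments = []
    for (t1, q1), (t2, q2) in zip(path, path[1:]):
        if q1 == q2:
            kind = "gap in query"   # only the target advances
        elif t1 == t2:
            kind = "gap in target"  # only the query advances
        else:
            kind = "aligned"        # both sequences advance together
        length = max(t2 - t1, q2 - q1)
        segments.append((kind, length, (t1, t2), (q1, q2)))
    return segments


path = ((0, 0), (182, 0), (188, 6), (193, 6))
for segment in describe_path(path):
    print(segment)
```

For the path in this thread it reports a 182-residue gap, 6 aligned residues, and a final gap of 193 - 188 = 5 residues.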
Best,
-Michiel
On Saturday, October 6, 2018, 1:01:06 AM GMT+9, Peter Cock <p.j.a.cock at googlemail.com> wrote:
Yes, if you look at the code that builds that string,
it does so via the path structure:
https://github.com/biopython/biopython/blob/biopython-172/Bio/Align/__init__.py#L991
What do you want out of the alignment object? Two strings
with gap characters inserted? Something else?
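If two strings with gap characters inserted are what is wanted, they can be rebuilt from the path and the two input sequences. This is a minimal sketch, assuming each path vertex is (target position, query position) with 0-based coordinates; gapped_strings is a hypothetical helper, not a Biopython function, and the sequences below are made up.

```python
def gapped_strings(target, query, path, gap="-"):
    """Return target and query with gap characters inserted.

    Hypothetical helper: assumes each path vertex is
    (target_position, query_position) with 0-based coordinates.
    """
    target_row, query_row = [], []
    for (t1, q1), (t2, q2) in zip(path, path[1:]):
        if q1 == q2:
            # Target advances alone: its residues face gaps in the query.
            target_row.append(target[t1:t2])
            query_row.append(gap * (t2 - t1))
        elif t1 == t2:
            # Query advances alone: its residues face gaps in the target.
            target_row.append(gap * (q2 - q1))
            query_row.append(query[q1:q2])
        else:
            # Diagonal segment: residues are aligned one to one.
            target_row.append(target[t1:t2])
            query_row.append(query[q1:q2])
    return "".join(target_row), "".join(query_row)


aligned_target, aligned_query = gapped_strings(
    "ACGTTTGA", "GTTT", ((0, 0), (2, 0), (6, 4), (8, 4))
)
print(aligned_target)  # ACGTTTGA
print(aligned_query)   # --GTTT--
```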
Peter
On Thu, Oct 4, 2018 at 9:48 PM John Berrisford <jmb at ebi.ac.uk> wrote:
>
> Hi
>
> How do I get the alignment out of Align.PairwiseAligner?
>
> I have the following code:
>
> import logging
> from Bio import Align
>
> # Inside a method: self.sequence1 and self.sequence2 hold the two sequences.
> aligner = Align.PairwiseAligner()
> alignments = aligner.align(self.sequence1, self.sequence2)
> for alignment in sorted(alignments):
>     logging.debug(alignment)
>     logging.debug(alignment.score)
>     logging.debug(alignment.target)
>     logging.debug(alignment.query)
>     logging.debug(alignment.path)
>     logging.debug(dir(alignment))
>
> My example:
>
> Query 193 residues long.
> Target 6 residues long.
>
> Out of this I can get the following:
>
> alignment – which appears to be a newline-separated string of query, alignment, target.
>
> In my example:
>
> MEKLEVGIYTRAREGEIACGDACLVKRVEGVIFLAVGDGIGHGPEAARAAEIAIASMESSMNTGLVNIFQLCHRELRGTRGAVAALCRVDRRQGLWQAAIVGNIHVKILSAKGIITPLATPGILGYNYPHQLLIAKGSYQEGDLFLIHSDGIQEGAVPLALLANYRLTAEELVRLIGEKYGRRDDDVAVIVAR
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|XX|XX-----
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------RANDOM-----
>
> score – the alignment score (I can also get this with aligner.score)
>
> target – self.sequence2
>
> query – self.sequence1
>
> path – I think this is what I want, but I don’t know how to interpret it – in the above example it is something like the following: ((0, 0), (182, 0), (188, 6), (193, 6))
>
> is this documented somewhere?
>
> It looks like 0-181 no alignment, 182 to 187 adds a score of 6. 188 to 193 keeps the score at 6.
>
> When I call dir(alignment), I only see the above options:
>
> ['__class__', '__cmp__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_format_psl', 'path', 'query', 'score', 'target']
>
> what I’m after is the middle row of the alignment (above). Is the only option to split alignment on carriage return?
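Splitting the printed string should work for the three-line layout shown above (str(alignment).splitlines()[1] picks out the middle row), but the row can also be rebuilt from the path. The sketch below uses a hypothetical helper, not a Biopython function; it assumes each vertex is (target position, query position) and reuses the characters seen in the printed output above: '|' for identical residues, 'X' for mismatches and '-' for gaps. The sequences are made up.

```python
def pattern_row(target, query, path):
    """Rebuild the middle row of a printed pairwise alignment.

    Hypothetical helper: assumes each path vertex is
    (target_position, query_position) with 0-based coordinates.
    """
    row = []
    for (t1, q1), (t2, q2) in zip(path, path[1:]):
        if t1 == t2 or q1 == q2:
            # Only one sequence advances, so every column here is a gap.
            row.append("-" * max(t2 - t1, q2 - q1))
        else:
            # Diagonal segment: compare the aligned residues pairwise.
            for a, b in zip(target[t1:t2], query[q1:q2]):
                row.append("|" if a == b else "X")
    return "".join(row)


# Made-up sequences: target positions 2-5 align with query positions 0-3.
print(pattern_row("ACGTATGA", "GTTT", ((0, 0), (2, 0), (6, 4), (8, 4))))
# --||X|--
```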
>
> Thanks
>
> John
>
> --
> John Berrisford
> PDBe
> European Bioinformatics Institute (EMBL-EBI)
> European Molecular Biology Laboratory
> Wellcome Trust Genome Campus
> Hinxton
> Cambridge CB10 1SD UK
> Tel: +44 1223 492529
>
> http://www.pdbe.org
> http://www.facebook.com/proteindatabank
> http://twitter.com/PDBeurope
>
> _______________________________________________
> Biopython mailing list - Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython