[Biopython-dev] [Bug 2552] New: Adding alignments

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Mon Jul 28 09:48:56 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2552

           Summary: Adding alignments
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk


This is related to the very broad alignment bug 1944.

Given two alignments, it can make sense to talk about adding them together.
However, we can add either by row or by column.

For example, consider this alignment, a:

DNAAlphabet() alignment with 3 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma

Doing a+a by column would give:

DNAAlphabet() alignment with 3 rows and 28 columns
ACGATCAGCTAGCTACGATCAGCTAGCT Alpha
CCGATCAGCTAGCTCCGATCAGCTAGCT Beta
ACGATGAGCTAGCTACGATGAGCTAGCT Gamma

This sort of operation is often done to combine alignments from multiple genes
(after first sorting the rows to ensure the species names are in the same
order).  To implement this would ideally require the ability to add SeqRecord
objects together, doing something sensible with the annotation and in
particular the identifiers.
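
As a rough illustration of addition by column, here is a minimal sketch in
plain Python.  It is not Biopython code: each alignment is assumed to be a list
of (identifier, sequence) tuples, and add_by_column is a hypothetical helper
name.  Rows are sorted by identifier first, as described above, and the
identifiers must then match.

```python
def add_by_column(a, b):
    """Join two alignments side by side, pairing rows by identifier.

    Assumption: an alignment is a list of (identifier, sequence) tuples.
    """
    if len(a) != len(b):
        raise ValueError("alignments must have the same number of rows")
    # Sort both alignments by identifier so matching rows line up.
    a_sorted = sorted(a, key=lambda row: row[0])
    b_sorted = sorted(b, key=lambda row: row[0])
    combined = []
    for (id_a, seq_a), (id_b, seq_b) in zip(a_sorted, b_sorted):
        if id_a != id_b:
            raise ValueError("identifier mismatch: %r vs %r" % (id_a, id_b))
        combined.append((id_a, seq_a + seq_b))
    return combined


a = [
    ("Alpha", "ACGATCAGCTAGCT"),
    ("Beta", "CCGATCAGCTAGCT"),
    ("Gamma", "ACGATGAGCTAGCT"),
]
# The same rows in a different order, to exercise the sorting step.
shuffled = [a[2], a[0], a[1]]
wide = add_by_column(a, shuffled)
for identifier, sequence in wide:
    print(sequence, identifier)
```

This sketch sidesteps the annotation question entirely by keeping only the
identifier; real SeqRecord objects would need a policy for merging the rest.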

Doing a+a by row would give:

DNAAlphabet() alignment with 6 rows and 14 columns
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma
ACGATCAGCTAGCT Alpha
CCGATCAGCTAGCT Beta
ACGATGAGCTAGCT Gamma

This particular example, a+a, is perhaps unrealistic due to the repeated
identifiers, but I imagine there are some real use cases for this operation.

More generally, suppose we have two alignments a and b.  Treating each
alignment as a list of SeqRecord objects, you might expect:

a.extend(b) -> addition by row
a+b -> addition by row

However, I would suggest for alignment objects:

a.extend(b) -> addition by row, requires all sequences to be the same length
(same number of columns)

a+b -> addition by column, requires same number of sequences (rows)
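
To make the suggested semantics concrete, here is a small sketch of a toy
alignment class.  SimpleAlignment and its attributes are hypothetical names
used only for illustration, not an existing Biopython class.  As simplifying
assumptions, rows are (identifier, sequence) tuples, and a+b pairs rows by
position and keeps the identifiers from the left-hand alignment.

```python
class SimpleAlignment:
    """Toy alignment: a list of (identifier, sequence) rows of equal length."""

    def __init__(self, rows):
        self.rows = list(rows)
        if len({len(seq) for _, seq in self.rows}) > 1:
            raise ValueError("all sequences must be the same length")

    def __len__(self):
        # Number of rows (sequences).
        return len(self.rows)

    @property
    def columns(self):
        # Number of columns; an empty alignment has none.
        return len(self.rows[0][1]) if self.rows else 0

    def extend(self, other):
        """Addition by row: requires the same number of columns."""
        if self.rows and other.rows and self.columns != other.columns:
            raise ValueError("alignments must have the same number of columns")
        self.rows.extend(list(other.rows))

    def __add__(self, other):
        """Addition by column: requires the same number of rows."""
        if not isinstance(other, SimpleAlignment):
            return NotImplemented
        if len(self) != len(other):
            raise ValueError("alignments must have the same number of rows")
        # Assumption: rows pair up by position; left identifiers are kept.
        return SimpleAlignment(
            (id_a, seq_a + seq_b)
            for (id_a, seq_a), (_, seq_b) in zip(self.rows, other.rows)
        )


a = SimpleAlignment([
    ("Alpha", "ACGATCAGCTAGCT"),
    ("Beta", "CCGATCAGCTAGCT"),
    ("Gamma", "ACGATGAGCTAGCT"),
])

wide = a + a                      # addition by column
tall = SimpleAlignment(a.rows)    # copy, so that a itself is left unchanged
tall.extend(a)                    # addition by row

print(len(wide), wide.columns)
print(len(tall), tall.columns)
```

With the example alignment a from above, wide has 3 rows and 28 columns, and
tall has 6 rows and 14 columns, matching the two a+a outputs shown earlier.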




