[Biopython-dev] Should MultipleSequenceAlignment support iterator slicing?

Peter Cock p.j.a.cock at googlemail.com
Tue Feb 21 22:50:39 UTC 2012


On Tue, Feb 21, 2012 at 9:44 PM, Fabio Zanini <fabio.zanini at fastmail.fm> wrote:
> Hi all!
>
> I am using the MultipleSeqAlignment class a lot these days, and
> would find it useful to get subalignments using Python iterators. I
> started a discussion on the issue tracker:
>
> https://redmine.open-bio.org/issues/3326
>
> Short version: I would like to do things like
>
> alignment[[4,5,8]]
>
> to get a subalignment with the 5th, 6th, and 9th rows. This syntax
> does not work at present, but it can be implemented very simply for
> both single and double indices. For instance, for the single index
> case,
>
> if hasattr(index, '__iter__'):
>    return MultipleSeqAlignment((self._records[i] for i in index),
>                                                         self._alphabet)
>
> Questions? Doubts?
>
> Cheers,
> Fabio

As I said on the bug, there are parallels with numpy arrays,
which allow indexing with lists (but not iterators). The problem
with iterator indices for numpy arrays is that you may have many
axes, but an iterator can only be looped over once. This
effectively means the iterator would have to be expanded
into a list inside the __getitem__ code.
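
To illustrate the "looped over once" point, here is a minimal sketch
in plain Python. It does not use Biopython or numpy; the rows are
ordinary strings and the helper name take_rows is hypothetical,
chosen only for this example.

```python
def take_rows(rows, index):
    """Return the rows at the given positions (hypothetical helper)."""
    # Expand the index once, so it no longer matters whether the
    # caller passed a list or a one-shot iterator.
    index = list(index)
    return [rows[i] for i in index]


rows = ["ACGT", "ACGA", "TCGT", "ACTT"]

# A generator expression is an iterator: it can be consumed only once.
positions = (i for i in [0, 2])

first = take_rows(rows, positions)   # consumes the generator
second = take_rows(rows, positions)  # generator is now exhausted

# A list can be reused as often as needed.
as_list = [0, 2]
third = take_rows(rows, as_list)
fourth = take_rows(rows, as_list)
```

Here first holds the two selected rows while second is empty, because
the generator had nothing left to yield the second time round. With a
list index both third and fourth hold the same two rows.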

This isn't so critical with multiple sequence alignments where
we have just two dimensions.

Supporting numpy-style list indexing should cover most
use cases, including things like producing a resampled
alignment for phylogenetic tree bootstrapping, where random
columns are selected.
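
As a rough sketch of the bootstrapping case, the following draws
columns with replacement and rebuilds each row from them. It assumes
the alignment is just a list of equal-length strings rather than a
MultipleSeqAlignment, and the function name resample_columns is
made up for this example. With list indexing on the alignment itself,
the column list could presumably be passed as the index directly.

```python
import random


def resample_columns(rows, rng):
    """Bootstrap-style resampling of alignment columns (hypothetical helper).

    rows is a list of equal-length strings; rng is a random.Random instance.
    Returns the chosen column positions and the resampled rows.
    """
    length = len(rows[0])
    # Draw as many columns as the alignment has, with replacement.
    cols = [rng.randrange(length) for _ in range(length)]
    # Apply the same column list to every row so columns stay aligned.
    resampled = ["".join(row[c] for c in cols) for row in rows]
    return cols, resampled


rows = ["ACGTAC", "ACGAAC", "TCGTAC"]
cols, replicate = resample_columns(rows, random.Random(0))
```

Note that the column list is reused once per row, which is exactly
why a list (rather than a one-shot iterator) is the natural index type.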

Does that sound useful enough to add (for rows and cols)?
i.e. support row/col index lists - but not iterators?

Peter



