[Biopython-dev] Seq object join method

Peter biopython at maubp.freeserve.co.uk
Mon Nov 23 10:44:14 UTC 2009


On Sat, Nov 21, 2009 at 2:31 PM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> On Fri, Nov 20, 2009 at 1:11 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
>> Now consider Seq("").join([unamb_dna_seq, ambig_dna_seq]),
>> should it follow the addition behaviour (giving a default alphabet)
>> or "do the sensible thing" and preserve the IUPAC alphabet?
>> ....
>> So, what do people think?
>
> From my perspective, I like consistency, so I think that if you want
> to preserve the IUPAC alphabet, you should state the alphabet also in
> the separator sequence.

If you have a list of Seq objects with an IUPAC alphabet, then yes,
you could concatenate them using:

result = Seq("",the_known_IUPAC_alphabet).join(the_list_of_seqs)

But what if you are writing a standalone function taking Seq arguments
of unknown alphabet? If you want to preserve the alphabet (and I
would), you would be forced to do something nasty like this:

result = Seq("",the_list_of_seqs[0].alphabet).join(the_list_of_seqs)

or simply (as now) avoid using the join method completely, e.g.

result = the_list_of_seqs[0]
for seq in the_list_of_seqs[1:] : result += seq

Neither of these has the clarity of:

result = Seq("").join(the_list_of_seqs)

To me, part of the issue here is that the use of "".join(list_of_strings)
in plain Python has always taken a bit of getting used to. It isn't
very intuitive - the old join function in the string module was
in some ways more natural. Maybe we need to add a Bio.Seq
module join function?

e.g.

def join(words, sep=None) :
    ...

While the Python string module join had the separator defaulting
to the empty string, here we can be explicit that by default there
is no separator sequence, and therefore no extra alphabet to worry
about.
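
A minimal sketch of how such a function might behave, for illustration
only. ToySeq is a hypothetical stand-in class, not the Biopython Seq, and
its addition rule (keep the alphabet when both sides match, otherwise fall
back to a generic one) is a simplifying assumption rather than Biopython's
actual alphabet logic. The point is only that with sep=None the result's
alphabet comes from the sequences being joined:

```python
class ToySeq:
    """Hypothetical stand-in for a sequence with an alphabet (not Bio.Seq.Seq)."""

    def __init__(self, data, alphabet="generic"):
        self.data = data
        self.alphabet = alphabet

    def __add__(self, other):
        # Assumed rule: matching alphabets are kept, anything else
        # degrades to the generic alphabet.
        if self.alphabet == other.alphabet:
            alphabet = self.alphabet
        else:
            alphabet = "generic"
        return ToySeq(self.data + other.data, alphabet)


def join(words, sep=None):
    """Concatenate sequences, optionally placing sep between them."""
    words = list(words)
    if not words:
        # With no sequences there is no alphabet to infer.
        raise ValueError("need at least one sequence to join")
    result = words[0]
    for word in words[1:]:
        if sep is not None:
            # Only an explicit separator brings its own alphabet into play.
            result = result + sep
        result = result + word
    return result
```

With two ToySeq objects sharing an alphabet, join(parts) keeps that
alphabet, while join(parts, ToySeq("N")) falls back to the generic one
because the separator's alphabet differs.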

However, while using a join function lets us avoid the separator
alphabet issue, it isn't object orientated, and does not match the
Python string object very well.

Peter


