[BioPython] Adding startswith and endswith methods to the Seq object

Peter peter at maubp.freeserve.co.uk
Mon Apr 13 13:47:04 UTC 2009


Hi all,

I've filed enhancement bug 2809 with a patch to add startswith and
endswith methods to the Seq object,
http://bugzilla.open-bio.org/show_bug.cgi?id=2809

I'm confident there are many possible use cases for this.

The example that prompted this was taking SeqRecord objects from
sequencing reads (a FASTQ file read in with Bio.SeqIO, possible with
Biopython 1.50 beta or later), some of which include a PCR primer
prefix or suffix that I want to strip off by slicing the SeqRecord.
To do this I need to know whether a given SeqRecord's sequence starts
with (or ends with) a given primer sequence (or any one of a tuple of
primer sequences).

e.g. I want to be able to do this:

primer = "TGACCTGAAAAGAC"
crop = len(primer)
# record is a SeqRecord object
if record.seq.startswith(primer):
    record = record[crop:]

Currently you'd have to turn the Seq into a string to use its
startswith method, which is not as nice:

primer = "TGACCTGAAAAGAC"
crop = len(primer)
# record is a SeqRecord object
if str(record.seq).startswith(primer):
    record = record[crop:]

or maybe use the find method instead:

primer = "TGACCTGAAAAGAC"
crop = len(primer)
# record is a SeqRecord object
if record.seq.find(primer) == 0:
    record = record[crop:]
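
For the tuple-of-primers case, here is a minimal sketch on plain Python
strings, so it runs without Biopython. The helper name strip_primers and
its arguments are hypothetical, purely for illustration. A string's
startswith method does accept a tuple, but it does not report which
entry matched, so the sketch loops to work out how much to crop:

```python
def strip_primers(sequence, forward_primers=(), reverse_primers=()):
    """Return the sequence as a string with at most one primer cropped per end.

    Hypothetical helper for illustration only. It works on anything that
    str() turns into the sequence letters.
    """
    text = str(sequence)

    # Crop the first forward primer found at the start, if any.
    for primer in forward_primers:
        if primer and text.startswith(primer):
            text = text[len(primer):]
            break

    # Crop the first reverse primer found at the end, if any.
    # The "primer and" guard avoids text[:-0], which would be empty.
    for primer in reverse_primers:
        if primer and text.endswith(primer):
            text = text[:-len(primer)]
            break

    return text


read = "TGACCTGAAAAGAC" + "GGGTTTCCC"
trimmed = strip_primers(read, forward_primers=("AAAA", "TGACCTGAAAAGAC"))
print(trimmed)
```

With a SeqRecord you would slice the record itself rather than the
string, as in the snippets above, so that the annotation is kept.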

Does this seem like a sensible addition to the Seq object?  It is
consistent with making the Seq object more like a Python string.
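
To illustrate what "more like a Python string" could mean here, below is
a rough sketch using a stand-in class. MiniSeq is hypothetical: it is
not the Biopython Seq object and not the patch attached to bug 2809. It
simply assumes the methods delegate to the underlying string's
startswith/endswith, accepting strings, other MiniSeq objects, or a
tuple of either:

```python
class MiniSeq:
    """Hypothetical stand-in for a sequence object (not Biopython's Seq)."""

    def __init__(self, data):
        self._data = str(data)

    def __str__(self):
        return self._data

    def __len__(self):
        return len(self._data)

    @staticmethod
    def _as_str(value):
        # Convert a single item, or each item of a tuple, to a plain string.
        if isinstance(value, tuple):
            return tuple(str(item) for item in value)
        return str(value)

    def startswith(self, prefix, start=0, end=None):
        """Mirror str.startswith, including tuple and start/end support."""
        if end is None:
            end = len(self._data)
        return self._data.startswith(self._as_str(prefix), start, end)

    def endswith(self, suffix, start=0, end=None):
        """Mirror str.endswith, including tuple and start/end support."""
        if end is None:
            end = len(self._data)
        return self._data.endswith(self._as_str(suffix), start, end)


seq = MiniSeq("TGACCTGAAAAGACGGG")
print(seq.startswith("TGACCT"))
print(seq.endswith(("AAA", MiniSeq("GGG"))))
```

The point of the sketch is only that the calling code reads the same as
it would for a string, with no str(...) conversion at the call site.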

Peter



