[Biopython-dev] SeqFeature and FeatureLocation objects (was Bio.GFF)

Peter Cock p.j.a.cock at googlemail.com
Wed May 6 10:32:01 UTC 2009


On Tue, May 5, 2009 at 4:26 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> On Tue, Apr 21, 2009 at 2:55 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>>> The prime use case to keep in mind is taking a feature location (even
>>> a join), and using this to extract that region of nucleotides from the
>>> parent sequence (i.e. a Seq object or a SeqRecord object, as now both
>>> can be sliced).
>
> I've written code to do this in test_SeqIO_features.py, which cross
> checks the nucleotides pulled out from GenBank files based on the
> SeqFeature, against what the NCBI provides in FASTA format.  This seems
> to work OK, but has not been tested extensively (e.g. running it on
> Drosophila or Arabidopsis would be good).

Yep - found a corner case my code can't yet cope with, from the
Arabidopsis thaliana chloroplast genome (NC_000932).  This has some
pathological mixed strand locations, like
join(complement(69611..69724),139856..140650), which is for a
trans-spliced ribosomal protein.
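
To make the mixed strand case concrete, here is a minimal sketch in plain
Python of how such a location could be resolved against the parent
sequence.  It is not the code in test_SeqIO_features.py and uses no
Biopython API; the names extract and reverse_complement, the
(start, end, strand) tuple layout and the toy sequence are all
assumptions made for illustration.  Each part is sliced using GenBank's
one-based inclusive coordinates, reverse complemented if it is on the
minus strand, and the pieces are concatenated in join order.

```python
# Illustrative sketch only: hypothetical helpers, not Biopython functions.
# Handles unambiguous DNA letters only (A, C, G, T in either case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a plain DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(parent, parts):
    """Pull out a (possibly mixed strand) joined location from parent.

    parts is a list of (start, end, strand) tuples in join order, with
    GenBank style one-based inclusive coordinates and strand +1 or -1.
    """
    pieces = []
    for start, end, strand in parts:
        # Convert one-based inclusive to Python's zero-based half-open slice.
        piece = parent[start - 1:end]
        if strand == -1:
            # complement(...) applies to this part only.
            piece = reverse_complement(piece)
        pieces.append(piece)
    return "".join(pieces)


# Toy analogue of join(complement(1..4),7..10) on a made-up sequence.
parent = "ATGCCGTAAGCT"
mixed = extract(parent, [(1, 4, -1), (7, 10, +1)])
print(mixed)
```

Note this only covers the join(complement(a..b),c..d) shape, where the
complement wraps individual parts.  A location written as
complement(join(...)) would instead need the joined parts reverse
complemented as a whole, which this sketch does not attempt.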

> It could make sense to expose this functionality directly in
> Biopython, ...

Given that this code is non-trivial to implement, exposing it directly in
Biopython seems worth doing.

Peter



