[Biopython-dev] Strange Genbank feature description: how should biopython handle this?

Brad Chapman chapmanb at arches.uga.edu
Mon Aug 12 11:56:45 EDT 2002


Hi Danny;
Sorry to be so slow in getting back to you. Evil post-conference
mounds of work piled up on me.

> Ok, I'm fiddling around with the Genbank parser.  In one of my test cases,
> there's one particular entry that's very evil.  It comes from AP000423
> (GI:5881673), as gene RPS12:
[...]
> Having a strand of 'None' doesn't appear to be right.  

Yes, actually I ran into this problem right before the conference with
Jeremy and thought I had committed the fix (ugh, forgot. Bad Brad!
Bad!). The problem is that the following code:

> -        if self._seq_type == "DNA":
> -            self._cur_feature.strand = 1

only sets the strand when _seq_type is exactly "DNA". Your _seq_type,
though, looks like:

DNA     circular

which mucks things up badly. I've changed this code so it looks like:

if self._seq_type.find("DNA") >= 0:

so that we only require DNA to be in the name. I think this will fix
this and the changes are in CVS. Please let me know if this doesn't
help.
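
To make the difference concrete, here is a small stand-alone sketch. It
is not the parser code itself; the two helper functions are made-up
names just for illustration:

```python
# Hypothetical helpers for illustration only; not part of the Biopython parser.

def is_dna_exact(seq_type):
    """Old check: passes only when the type string is exactly 'DNA'."""
    return seq_type == "DNA"


def is_dna_substring(seq_type):
    """New check: passes when 'DNA' appears anywhere in the type string."""
    return seq_type.find("DNA") >= 0


seq_type = "DNA     circular"
print(is_dna_exact(seq_type))      # old test misses this record
print(is_dna_substring(seq_type))  # new test catches it
```

Note that the looser test matches any type string containing "DNA", so
it trades strictness for tolerance of extra words like "circular".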

> +            assert(new_sub_feature.strand in (1, -1)) ## debug

Things aren't actually quite as easy to debug as this, since that assert
would also reject legitimate values. The strand in Biopython can take on
4 values:

None --> protein and RNA, which don't have any strand information
1 --> DNA on the plus strand
-1 --> DNA on the minus strand
0 --> DNA on both strands
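
If you still want a debugging check, something along these lines would
accept all four values. This is only a sketch; VALID_STRANDS and
check_strand are names I'm making up here, not anything in Biopython:

```python
# Hypothetical debugging helper; the names are illustrative, not Biopython API.
VALID_STRANDS = (None, 1, -1, 0)


def check_strand(strand):
    """Return strand unchanged if it is one of the four allowed values."""
    if strand not in VALID_STRANDS:
        raise ValueError("unexpected strand value: %r" % (strand,))
    return strand


# All four legitimate values pass through untouched.
for value in VALID_STRANDS:
    check_strand(value)
```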

Hopefully this explains things and fixes your problem. If not, feel free
to drop another e-mail!
Brad


