[Biopython] gff3 problem
Chris Fields
cjfields at illinois.edu
Fri May 20 13:24:30 UTC 2011
On May 20, 2011, at 6:27 AM, Peter Cock wrote:
> On Fri, May 20, 2011 at 12:15 PM, Brad Chapman <chapmanb at 50mail.com> wrote:
>> Peter;
>>
>> [SeqFeature support for not-stranded elements]
>>> So was the consensus that we should reword the Bio.SeqFeature
>>> docstring to say the four valid values for strand are (with GFF3
>>> equivalents in brackets):
>>>
>>> +1 = Forward (+ in GFF3)
>>> -1 = Reverse (- in GFF3)
>>> 0 = Not stranded (. in GFF3)
>>> None = Unknown (? in GFF3)
>>>
>>> And should features on a protein sequence then have strand 0?
>>
>> That sounds great. I can make the corresponding change to the GFF
>> library. Let me know if there are any other roadblocks to
>> integrating that. Thanks much,
>> Brad
>
> I've remembered a corner case: mixed strand features, e.g. the
> Arabidopsis thaliana chloroplast complete genome, AP000423
> in EMBL, NC_000932 in GenBank (one of our unit test files).
> e.g. gene with join(complement(69611..69724),139856..140650)
>
> Clearly the child features have well defined strands (+1 and -1).
> The parent feature (the join) is mixed strand. Currently our
> GenBank parser uses None for this. So maybe:
>
> +1 = Forward (+ in GFF3)
> -1 = Reverse (- in GFF3)
> 0 = Not stranded (. in GFF3)
> None = Mixed or unknown (? in GFF3)
>
> Peter
That's essentially what BioPerl does for 'split' locations (actually, I think the strand is just undef there, which would translate to '?' in GFF3).
chris
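
For illustration, the mapping proposed above could be sketched roughly as follows. This is plain Python and only a sketch: the names STRAND_TO_GFF3, gff3_strand and parent_strand are hypothetical and are not part of Biopython, BioPerl, or the GFF library. It assumes a parent feature takes the strand its children share, and falls back to None (mixed or unknown) otherwise.

```python
# Hypothetical helpers illustrating the proposed convention; these names
# do not exist in Biopython or in the GFF parser.
STRAND_TO_GFF3 = {
    1: "+",     # forward
    -1: "-",    # reverse
    0: ".",     # not stranded
    None: "?",  # mixed or unknown
}


def gff3_strand(strand):
    """Return the GFF3 strand character for a SeqFeature-style strand value."""
    try:
        return STRAND_TO_GFF3[strand]
    except KeyError:
        raise ValueError("strand must be +1, -1, 0 or None, got %r" % (strand,))


def parent_strand(child_strands):
    """Return the strand shared by all children, or None if they disagree.

    An empty list also gives None, i.e. unknown.
    """
    unique = set(child_strands)
    if len(unique) == 1:
        return unique.pop()
    return None


# The join(complement(69611..69724),139856..140650) example has one
# reverse-strand part and one forward-strand part.
mixed = parent_strand([-1, 1])
print(repr(mixed), gff3_strand(mixed))
```

Under these assumptions the mixed-strand join ends up with strand None and would be written as '?' in GFF3, while each child part keeps its own +1 or -1.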