[Biopython-dev] Strange Genbank feature description: how should biopython handle
 this?
    Danny Yoo 
    dyoo at acoma.Stanford.EDU
       
    Wed Aug  7 16:46:45 EDT 2002
    
    
  
Hi everyone,
OK, I've been fiddling around with the GenBank parser.  In one of my test cases,
there's one particular entry that's very evil.  It comes from AP000423
(GI:5881673), as gene RPS12:
     gene            join(complement(98562..98793),complement(97999..98024),
                     complement(69611..69724),139856..140087,140625..140650)
                     /gene="rps12"
Here's how Biopython is initializing this feature:
###
type: gene
location: (98561..140650)
ref: None:None
strand: None
qualifiers:
	Key: gene, Value: ['rps12']
Sub-Features
type: gene
location: (98561..98793)
ref: None:None
strand: -1
qualifiers:
type: gene
location: (97998..98024)
ref: None:None
strand: -1
qualifiers:
type: gene
location: (69610..69724)
ref: None:None
strand: -1
qualifiers:
type: gene
location: (139855..140087)
ref: None:None
strand: None
qualifiers:
type: gene
location: (140624..140650)
ref: None:None
strand: None
qualifiers:
###
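For reference, the locations in the dump above look like the GenBank coordinates with only the start shifted down by one (98562..98793 shows up as 98561..98793), which matches Python's 0-based, end-exclusive slicing. A minimal sketch of that conversion, assuming this is what the parser intends; the helper name is made up for illustration and is not part of Biopython:

```python
def genbank_to_python_span(start, end):
    """Convert a 1-based inclusive GenBank range to a 0-based half-open span.

    Hypothetical helper for illustration only.
    """
    return start - 1, end


# The five segments of the rps12 join(), as written in the GenBank entry.
segments = [
    (98562, 98793),
    (97999, 98024),
    (69611, 69724),
    (139856, 140087),
    (140625, 140650),
]

spans = [genbank_to_python_span(start, end) for start, end in segments]
for span in spans:
    print(span)
```

The resulting starts should line up with the sub-feature locations printed in the dump.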
The LocationParser itself appears to be doing its job, as I see that:
###
Function('join', [Function('complement', [AbsoluteLocation(None,
Range(Integer(98562), Integer(98793)))]), Function('complement',
[AbsoluteLocation(None, Range(Integer(97999), Integer(98024)))]),
Function('complement', [AbsoluteLocation(None, Range(Integer(69611),
Integer(69724)))]), AbsoluteLocation(None, Range(Integer(139856),
Integer(140087))), AbsoluteLocation(None, Range(Integer(140625),
Integer(140650)))])
###
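To make the expected strands concrete, here is a rough sketch of walking a tree shaped like the one above. It assumes a default strand of +1 and that each enclosing complement() flips it. The Range and Function types below are simplified stand-ins written for this example, not the real LocationParser classes, and the function name is hypothetical:

```python
from collections import namedtuple

# Simplified stand-ins for the parser's node types (assumption: the real
# classes carry more information, e.g. AbsoluteLocation wrappers).
Range = namedtuple("Range", "low high")
Function = namedtuple("Function", "name args")


def segment_strands(node, strand=1):
    """Return a list of (low, high, strand) for every Range under node.

    Assumes a default strand of +1; each enclosing complement() flips
    the sign. Segment order is left as written in the location string.
    """
    if isinstance(node, Range):
        return [(node.low, node.high, strand)]
    if node.name == "complement":
        strand = -strand
    results = []
    for arg in node.args:
        results.extend(segment_strands(arg, strand))
    return results


rps12 = Function("join", [
    Function("complement", [Range(98562, 98793)]),
    Function("complement", [Range(97999, 98024)]),
    Function("complement", [Range(69611, 69724)]),
    Range(139856, 140087),
    Range(140625, 140650),
])

strands = [strand for _, _, strand in segment_strands(rps12)]
print(strands)
```

Under those assumptions, the three complemented segments come out as -1 and the two plain segments as 1, rather than None.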
Having a strand of 'None' doesn't appear to be right.  I've been staring
at 'Bio/GenBank/__init__.py' for a while, and it appears that the default
value for the strand isn't set unless self._seq_type is equal to
"DNA".  I don't quite understand all of the code yet, but the following
change appears to fix this particular case:
Index: Bio/GenBank/__init__.py
===================================================================
RCS file: /home/repository/biopython/biopython/Bio/GenBank/__init__.py,v
retrieving revision 1.29
diff -u -r1.29 __init__.py
--- Bio/GenBank/__init__.py	2002/04/16 15:45:26	1.29
+++ Bio/GenBank/__init__.py	2002/08/07 20:43:28
@@ -636,8 +636,9 @@
         # assume positive strand to start with if we have DNA. The
         # complement in the location will change this later.
-        if self._seq_type == "DNA":
-            self._cur_feature.strand = 1
+##         if self._seq_type == "DNA":
+##             self._cur_feature.strand = 1
+        self._cur_feature.strand = 1
     def location(self, content):
         """Parse out location information from the location string.
@@ -735,7 +736,7 @@
             new_sub_feature.ref = cur_feature.ref
             new_sub_feature.ref_db = cur_feature.ref_db
             new_sub_feature.strand = cur_feature.strand
-
+            assert(new_sub_feature.strand in (1, -1)) ## debug
             # set the information for the inner element
             self._set_location_info(inner_element, new_sub_feature)
What's the right way of fixing this problem?  Thank you!
    
    