[BioPython] Uniprot Parser

Sun Feb 24 17:48:29 UTC 2008

On Sun, Feb 24, 2008 at 5:36 PM, Ruchira Datta <ruchira.datta at gmail.com> wrote:
> I just found another bug, which would be a bit trickier to fix properly.
>
>  This code:
>
>     def database_cross_reference(self, line):
>         # From CLD1_HUMAN, Release 39:
>         # DR   EMBL; [snip]; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
>         # DR   PRODOM [Domain structure / List of seq. sharing at least 1 domai
>         # DR   SWISS-2DPAGE; GET REGION ON 2D PAGE.
>         line = line[5:]
>         # Remove the comments at the end of the line
>         i = line.find('[')
>         if i >= 0:
>             line = line[:i]
>         cols = line.rstrip(_CHOMP).split(';')
>         cols = [col.lstrip() for col in cols]
>         self.data.cross_references.append(tuple(cols))
>
>  applied to this line of the TrEMBL record for A2RB21_ASPNG:
>
>  DR   GO; GO:0016277; F:[myelin basic protein]-arginine N-methyltra...; IEA:EC.
>
>  got me this tuple:
>
>  ('GO', 'GO:0016277', 'F:')
>
>  The bracketed term was interpreted as a comment, and the rest of the line
>  from the bracket onward was stripped.

That does look tricky, especially if we want to preserve backwards
compatibility.  This "F:" field looks like it holds partial text for
the GO term.  I wonder how common square brackets are in the cross
references themselves?  I can't see the use of "F" mentioned here:
http://www.expasy.org/sprot/userman.html#DR_line
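
One possible direction, as a rough sketch only and not a tested patch: treat a
"[" as the start of a trailing comment only when it follows whitespace and the
last non-blank character before it is not the ";" field separator.  The
function name split_cross_reference and the pattern _COMMENT_START are made up
for illustration, the value of _CHOMP is an assumption about the parser's
constant, and the heuristic is based only on the four example lines in this
thread, so other records may well break it.

```python
import re

# Assumed value of the parser's _CHOMP constant (trailing characters to trim).
_CHOMP = " \n\r\t.,;"

# Hypothetical pattern: whitespace followed by '[', where the character just
# before the whitespace is neither ';' nor whitespace.
_COMMENT_START = re.compile(r"(?<=[^;\s])\s+\[")


def split_cross_reference(line):
    """Split a DR line into fields, dropping only a trailing bracketed comment."""
    line = line[5:]  # drop the "DR   " prefix, as the existing code does
    match = _COMMENT_START.search(line)
    if match:
        line = line[:match.start()]
    cols = line.rstrip(_CHOMP).split(";")
    return tuple(col.lstrip() for col in cols)


go_line = ("DR   GO; GO:0016277; "
           "F:[myelin basic protein]-arginine N-methyltra...; IEA:EC.")
print(split_cross_reference(go_line))
print(split_cross_reference("DR   SWISS-2DPAGE; GET REGION ON 2D PAGE."))
```

With this heuristic the GO line keeps all four fields, because its "[" comes
directly after "F:" with no whitespace in between, while a line such as
"DR   PRODOM [Domain structure ..." still loses its bracketed comment.  Lines
without brackets give the same tuple as the current code, which should help
with backwards compatibility.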

Could you file a bug and add a few more examples if you find them?

Thanks

Peter