[BioPython] Uniprot Parser
Peter
biopython at maubp.freeserve.co.uk
Sun Feb 24 17:48:29 UTC 2008
On Sun, Feb 24, 2008 at 5:36 PM, Ruchira Datta <ruchira.datta at gmail.com> wrote:
> I just found another bug, which would be a bit trickier to fix properly.
>
> This code:
>
> def database_cross_reference(self, line):
>     # From CLD1_HUMAN, Release 39:
>     # DR   EMBL; [snip]; -. [EMBL / GenBank / DDBJ] [CoDingSequence]
>     # DR   PRODOM [Domain structure / List of seq. sharing at least 1 domai
>     # DR   SWISS-2DPAGE; GET REGION ON 2D PAGE.
>     line = line[5:]
>     # Remove the comments at the end of the line
>     i = line.find('[')
>     if i >= 0:
>         line = line[:i]
>     cols = line.rstrip(_CHOMP).split(';')
>     cols = [col.lstrip() for col in cols]
>     self.data.cross_references.append(tuple(cols))
>
> applied to this line of the TrEMBL record for A2RB21_ASPNG:
>
> DR   GO; GO:0016277; F:[myelin basic protein]-arginine N-methyltra...; IEA:EC.
>
> got me this tuple:
>
> ('GO', 'GO:0016277', 'F:')
>
> The bracketed term was interpreted as a comment, and everything from the
> bracket onward was stripped.
That does look tricky, especially if we want to preserve backwards
compatibility. This "F" cross-reference looks like the partial text
for the GO term. I wonder how common this is (square brackets in the
cross-references themselves)? I can't see the use of "F" mentioned
here: http://www.expasy.org/sprot/userman.html#DR_line
Could you file a bug, and add a few more examples if you find any?
Thanks
Peter
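One possible workaround, sketched here as an assumption and not taken from Biopython: treat only bracketed groups at the very end of a DR line as comments, so brackets inside a field (such as the GO term above) survive. The function name parse_dr_line and the CHOMP_CHARS set are hypothetical stand-ins for the parser method and the module's _CHOMP constant. Note that a comment missing its closing bracket, like the truncated PRODOM example in the quoted comment, would not be stripped by this pattern.

```python
import re

# Hypothetical stand-in for the module's _CHOMP constant: characters
# stripped from the end of a DR line before splitting (an assumption).
CHOMP_CHARS = " \n\r\t.,;"

# One or more bracketed groups anchored at the end of the line, e.g.
# " [EMBL / GenBank / DDBJ] [CoDingSequence]". Brackets elsewhere are kept.
_TRAILING_COMMENTS = re.compile(r"(?:\s*\[[^\[\]]*\])+\s*$")


def parse_dr_line(line):
    """Split a DR line into a tuple of fields (hypothetical sketch).

    Only bracketed text at the very end of the line is treated as a
    comment, so a '[' inside a field such as a GO term name is preserved.
    """
    line = line[5:]  # drop the "DR   " line code, as the original does
    line = _TRAILING_COMMENTS.sub("", line)
    cols = line.rstrip(CHOMP_CHARS).split(";")
    return tuple(col.strip() for col in cols)
```

For the A2RB21_ASPNG line this keeps the full GO term as the third field and returns "IEA:EC" as a fourth, instead of ('GO', 'GO:0016277', 'F:'). That changes the tuple returned for such lines, which is exactly the backwards-compatibility concern raised above.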