[BioPython] Uniprot Parser

Sun Feb 24 13:06:20 UTC 2008

On Sat, Feb 23, 2008 at 10:44 PM, Ruchira Datta <ruchira.datta at gmail.com> wrote:
> I've been using Bio.SwissProt.SProt to parse this file.  The only glitch
> that came up so far is that when some fields span multiple lines (e.g., OS,
> the species field), SProt puts a newline in the field.  This is not
> correct--it should be just a blank space.  However, this can easily be
> corrected within SProt itself without requiring a forked parser.

I'm guessing you are using the parser to return Record objects, which
map the raw file format fairly directly, so I can understand why the
newlines were kept.  If you instead use the parser to get SeqRecord
objects (which are generic and not tied to the SwissProt/UniProt
format), the newlines are removed.
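
To illustrate the behaviour being asked for, here is a minimal
standard-library sketch of joining a multi-line field with single spaces
rather than newlines.  It is not Biopython's code: join_field and
SAMPLE_ENTRY are hypothetical names, and the entry text is made up for the
example.  It only assumes the flat-file layout in which the two-letter line
code sits in columns 1-2 and the data starts at column 6.

```python
def join_field(entry_text, code):
    """Join all lines carrying the given two-letter line code with single spaces.

    Hypothetical helper for illustration only; not part of Bio.SwissProt.
    """
    parts = []
    for line in entry_text.splitlines():
        # The line code occupies columns 1-2; the data begins at column 6.
        if line[:2] == code:
            parts.append(line[5:].strip())
    # Continuation lines are joined with a blank space, never a newline.
    return " ".join(parts)


# Made-up entry fragment whose OS (organism species) field wraps onto two lines.
SAMPLE_ENTRY = """\
ID   EXAMPLE_ENTRY           Reviewed;         100 AA.
OS   Examplus organismus (a made-up species name long enough to
OS   wrap onto a second line).
SQ   SEQUENCE   100 AA;
//
"""

organism = join_field(SAMPLE_ENTRY, "OS")
print(organism)
```

The same join could be applied to any other field that may wrap across
lines, which is roughly the kind of change that could go into the existing
parser rather than into a fork.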

> At least two other parsers for this file have been written by people in my
> group, but I have pushed and implemented standardization on the BioPython
> one.  Part of the point of BioPython is to have one central repository for
> development and maintenance of things like this, so that hundreds of people
> don't have to spend their time reinventing the wheel.  It is much preferable
> that people contribute changes rather than creating a forked version.
>
> --Ruchira