[BioPython] Uniprot Parser
Peter
biopython at maubp.freeserve.co.uk
Sun Feb 24 13:06:20 UTC 2008
On Sat, Feb 23, 2008 at 10:44 PM, Ruchira Datta <ruchira.datta at gmail.com> wrote:
> I've been using Bio.SwissProt.SProt to parse this file. The only glitch
> that came up so far is that when some fields span multiple lines (e.g., OS,
> the species field), SProt puts a newline in the field. This is not
> correct--it should be just a blank space. However, this can easily be
> corrected within SProt itself without requiring a forked parser.
I'm guessing you are using the parser to return Record objects, which
are a fairly simple direct mapping of the raw file format - and I can
understand why the newlines were included. If you use the parser to
get SeqRecord objects (which are generic and not tied to the
SwissProt/UniProt format), then the newlines are removed.
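If you would rather keep working with Record objects, the line breaks can be collapsed after parsing. Here is a minimal sketch using only the standard library. Note that `collapse_field_whitespace` is a hypothetical helper, not part of Biopython, and the sample string is made up to resemble an OS field that wrapped onto a second line.

```python
import re


def collapse_field_whitespace(value):
    """Return value with every run of whitespace (including newlines)
    replaced by a single space, and leading/trailing whitespace removed."""
    return re.sub(r"\s+", " ", value).strip()


# Made-up example of a field value that spanned two lines in the flat file.
organism = "Homo sapiens\n(Human)."
print(collapse_field_whitespace(organism))
```

This would be applied to whichever string attributes of the record turn out to contain embedded newlines; which attributes those are is something to check against your own data.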
> At least two other parsers for this file have been written by people in my
> group, but I have pushed and implemented standardization on the BioPython
> one. Part of the point of BioPython is to have one central repository for
> development and maintenance of things like this, so that hundreds of people
> don't have to spend their time reinventing the wheel. It is much preferable
> that people contribute changes rather than creating a forked version.
>
> --Ruchira