[BioPython] Uniprot Parser
    Ruchira Datta 
    ruchira.datta at gmail.com
       
    Sun Feb 24 16:28:33 UTC 2008
    
    
  
On Sun, Feb 24, 2008 at 5:06 AM, Peter <biopython at maubp.freeserve.co.uk>
wrote:
> On Sat, Feb 23, 2008 at 10:44 PM, Ruchira Datta <ruchira.datta at gmail.com>
> wrote:
> > I've been using Bio.SwissProt.SProt to parse this file.  The only glitch
> >  that came up so far is that when some fields span multiple lines (e.g., OS,
> >  the species field), SProt puts a newline in the field.  This is not
> >  correct--it should be just a blank space.  However, this can easily be
> >  corrected within SProt itself without requiring a forked parser.
>
> I'm guessing you are using the parser to return Record objects, which
> are a fairly simple direct mapping of the raw file format - and I can
> understand why the newlines were included.  If you use the parser to
> get SeqRecord objects (which are generic and not tied to the
> SwissProt/UniProt format), then the newlines are removed.
>
Hi Peter,
I had tried SeqRecord first, but it didn't include the references, which I
absolutely need.

While the inclusion of newlines may be understandable, it is still a bug.
_RecordConsumer already strips the newline from several other fields, e.g.,
    def reference_number(self, line):
        rn = line[5:].rstrip()
        ...
and it needs to be stripped from the organism field as well, where the
current code is
    def organism_species(self, line):
        self.data.organism += line[5:]
The newlines are never significant in any field.
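
To illustrate the kind of change I mean, here is a minimal standalone sketch.
It is not the actual Bio.SwissProt.SProt code: the Record and
RecordConsumerSketch classes are hypothetical stand-ins, and the two OS lines
are made up for the example.  It assumes only that the line code occupies the
first five characters, as in the snippets above.

```python
class Record:
    """Hypothetical stand-in for the parsed record; only one field is shown."""

    def __init__(self):
        self.organism = ""


class RecordConsumerSketch:
    """Hypothetical, stripped-down stand-in for _RecordConsumer."""

    def __init__(self):
        self.data = Record()

    def organism_species(self, line):
        # Drop the five-character line code ("OS   ") and any surrounding
        # whitespace, including the trailing newline.
        text = line[5:].strip()
        # Join continuation lines with a single space instead of a newline.
        if self.data.organism:
            self.data.organism += " " + text
        else:
            self.data.organism = text


# Made-up two-line OS field, fed one line at a time as a scanner would.
consumer = RecordConsumerSketch()
for line in ["OS   Homo sapiens\n", "OS   (Human).\n"]:
    consumer.organism_species(line)

print(repr(consumer.data.organism))  # 'Homo sapiens (Human).'
```

The same pattern (strip, then join with a space) should apply to any other
field that can span several lines.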
In a couple of weeks I might be able to check out the CVS version and
provide a patch.
--Ruchira
>
> >  At least two other parsers for this file have been written by people in my
> >  group, but I have pushed and implemented standardization on the BioPython
> >  one.  Part of the point of BioPython is to have one central repository for
> >  development and maintenance of things like this, so that hundreds of people
> >  don't have to spend their time reinventing the wheel.  It is much preferable
> >  that people contribute changes rather than creating a forked version.
> >
> >  --Ruchira
>
    
    