[Bioperl-l] Bug: swiss.pm doesn't parse seq_version

Chris Fields cjfields at uiuc.edu
Thu May 25 19:44:01 UTC 2006


This is due to recent changes in the SwissProt/UniProt format (there
apparently are many other changes besides this).  

From the UniProtKB news (http://ca.expasy.org/sprot/relnotes/sp_news.html)
comes this tidbit:
----------------------------------------------------------
 UniProtKB release 7.0 of 07-Feb-2006

    Changes concerning dates and versions numbers (DT lines)

We changed from showing only the dates corresponding to full UniProtKB
releases in the DT lines to displaying the date of the biweekly release at
which an entry is integrated or updated. We dropped the information
concerning the release number and introduced entry and sequence version
numbers in the DT lines.

The new format of the three DT lines is:

DT   DD-MMM-YYYY, integrated into UniProtKB/database_name.
DT   DD-MMM-YYYY, sequence version version_number.
DT   DD-MMM-YYYY, entry version version_number.

Example for UniProtKB/Swiss-Prot:

DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.
DT   15-OCT-2001, sequence version 3.
DT   01-APR-2004, entry version 14.

Example for UniProtKB/TrEMBL:

DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.

The sequence version number of an entry is incremented by one when its amino
acid sequence is modified. The entry version number is incremented by one
whenever any data in the flat file representation of the entry is modified.

We retrofitted the entry and sequence version numbers, as well as all dates,
using archived UniProtKB releases.

----------------------------------------------------------
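
For illustration only, here is a rough Perl sketch of how the three new DT
lines quoted above could be picked apart. The sub name parse_dt_line and its
hash keys are hypothetical (nothing like this exists in swiss.pm as far as I
know), and the regexes assume the lines look exactly like the examples in the
release notes.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::SeqIO::swiss: split one post-7.0
# DT line into its date and whatever version/database info follows it.
sub parse_dt_line {
    my ($line) = @_;

    # "DT   DD-MMM-YYYY, <text>." ; the trailing period is dropped
    return unless $line =~ /^DT\s+(\d{2}-[A-Z]{3}-\d{4}),\s+(.+?)\.?\s*$/;
    my ($date, $rest) = ($1, $2);

    my %field = (date => $date);
    if ($rest =~ /^sequence version (\d+)$/i) {
        $field{seq_version} = $1;
    }
    elsif ($rest =~ /^entry version (\d+)$/i) {
        $field{entry_version} = $1;
    }
    elsif ($rest =~ /^integrated into (\S+)$/i) {
        $field{integrated_into} = $1;
    }
    return \%field;
}

# The Swiss-Prot example lines from the release notes
my @example = (
    'DT   01-JAN-1998, integrated into UniProtKB/Swiss-Prot.',
    'DT   15-OCT-2001, sequence version 3.',
    'DT   01-APR-2004, entry version 14.',
);

for my $line (@example) {
    my $f = parse_dt_line($line) or next;
    print join(', ', map { "$_=$f->{$_}" } sort keys %$f), "\n";
}
```

Keeping the date separate from the rest of the line would also mean the
stored dates stay plain dates rather than carrying the version text along.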

We should probably explain on the swissprot wiki page that the format is in
a state of flux at the moment.  I've also added this tidbit to the bug
report (#2003).

Chris


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Monday, May 22, 2006 9:04 PM
> To: Michael Rogoff
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> 
> We ask that people post patches to bugzilla as attachments so we can
> track what the bug was and why the patch fixes it.
> 
> I am not totally sure this patch works because it seems like we need
> to strip out more information now from the DT line if the $date
> actually contains more information than just the date.
> 
> If you would go ahead and create a bug in bugzilla for this
> (http://bugzilla.open-bio.org) this sort of conversation can be
> tracked to the bug.
> 
> If any of this is unclear please let us know - I thought we had put
> some pages up about this sort of thing on the wiki but maybe they
> need to be expanded.
> 
> -jason
> On May 22, 2006, at 9:51 PM, Michael Rogoff wrote:
> 
> > I have a patch that seems to work but I'm not familiar with the
> > proper method to "provide" it.  How do I go about that?
> >
> > The patch is pretty simple, it just parses the sequence version out
> > of the date line where it now hides:
> >
> >          #date
> >          elsif( /^DT\s+(.*)/ ) {
> >            my $date = $1;
> > +
> > +          if ($date =~ /sequence version (\d+)/i) {
> > +              $params{'-seq_version'} ||= $1;
> > +          }
> > +
> >            $date =~ s/\;//;
> >            $date =~ s/\s+$//;
> >            push @{$params{'-dates'}}, $date;
> >          }
> >
> > By the way, what is the difference between Bio::Seq::version and
> > Bio::Seq::RichSeq::seq_version?
> >
> >
> >> -----Original Message-----
> >> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >> Sent: Monday, May 22, 2006 6:37 PM
> >> To: Michael Rogoff
> >> Cc: bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Bug: swiss.pm doesn't parse seq_version
> >>
> >>
> >> Sounds like a "missing feature" =)
> >>
> >> AFAIK the module was only written for swissprot files.  It is
> >> possible there have been changes in the format that have not been
> >> tracked to the current code.  We'd certainly appreciate someone
> >> testing it out as versions evolve.  If you submit a bug to bugzilla
> >> with version of bioperl and example files you can track when a fix
> >> is in.  We of course appreciate anyone's efforts to provide a patch
> >> as most bugs get fixed of late when someone gets "itchy" enough to
> >> fix them.
> >>
> >> -jason
> >>
> >> On May 22, 2006, at 7:56 PM, Michael Rogoff wrote:
> >>
> >>>
> >>> As best as I can tell, using Bio::SeqIO to parse a uniprot file
> >>> ignores the sequence version, and calling seq_version() on the
> >>> resulting RichSeq object returns undef.
> >>>
> >>> It looks like swiss.pm is trying to parse the version out of the SV
> >>> line, which apparently doesn't exist any more?  The sequence
> >>> version(s) are now specified as part of the Date (DT) lines.
> >>>
> >>> Is this not a bug?  Is swiss.pm not designed to parse uniprot files?
> >>>
> >>> Thanks for any help ...
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> Jason Stajich
> >> Duke University
> >> http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 



