[Bioperl-l] swiss prot

Heikki Lehvaslaiho heikki@ebi.ac.uk
Wed, 11 Apr 2001 10:30:44 +0100


I think I have a solution, but it is not ready yet.

Rodrigo was teasing me to make the emblfetch CGI script into a general
dbfetch, and I took him seriously. 8-) The script is in its testing
phase here at EBI. It offers an easy way to access any local SRS
database. The database-specific parameters are kept in an
easy-to-modify hash (it has to be modified within the script for
speed). I debugged it using EMBL, Medline (serves XML!), and Ensembl.
It took me exactly one minute to add SWALL to it. SWALL is a weekly
updated combination of SWISS-PROT + SP-TrEMBL + TrEMBLnew.
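
To give an idea of what "parameters kept in a hash" can look like, here
is a minimal sketch. It is not the dbfetch source: the hash name
%DATABASES, the keys "label" and "default_format", and the helper
db_settings() are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical per-database settings; keys and values are illustrative only
# and are not taken from the actual dbfetch script.
my %DATABASES = (
    embl    => { label => 'EMBL',    default_format => 'embl'  },
    medline => { label => 'Medline', default_format => 'xml'   },
    swall   => { label => 'SWALL',   default_format => 'swiss' },
);

# Look up the settings for a database name, case-insensitively.
# Dies with a readable message when the database is not configured.
sub db_settings {
    my ($db) = @_;
    my $settings = $DATABASES{ lc $db }
        or die "Unknown database '$db'\n";
    return $settings;
}

print db_settings('SWALL')->{default_format}, "\n";
```

With a layout like this, supporting another database means adding one
entry to the hash and nothing else.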

In a short while (a week or so, depending on how many bug fixes and
feature changes others want before the release) we should be able to
point Bio::DB::SwissProt to this script. I am going to distribute the
dbfetch script so that, hopefully, most SRS maintainers will install it
and people can use the SRS server closest to them.

	-Heikki 

Jason Stajich wrote:
> 
> This is a TrEMBL entry, not a SWISS-PROT one. <sigh> The swiss format
> expects ID_DIVISION in the ID line. There is no really good way to
> determine this on the fly in Bio::DB::EMBL since we pass the stream to
> a SeqIO object.
> 
> [sprot]  http://www.expasy.org/cgi-bin/get-sprot-raw.pl?P00916
> [TrEMBL] http://www.expasy.org/cgi-bin/get-sprot-raw.pl?O39869
> 
> Bioperl: here is my fix - please let me know if you think this is
> acceptable and I'll submit the fix.
> 
> I am assigning division to UNK for the TrEMBL entry even though we could
> probably deduce it from OC lines - I don't want to deal with that right
> now... (also changed ^\s to \S since they are equivalent).
> 
> RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/swiss.pm,v
> retrieving revision 1.36
> diff -r1.36 swiss.pm
> 153c153
> <    $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
> ---
> >    $line =~ /^ID\s+([\S_]+)(_[\S_]+)?\s+([\S;]+);\s+([\S;]+);/
> 155c155,161
> <    $name = $1."_".$2;
> ---
> >    if( $2 ) {
> >        $name = $1."_".$2;
> >        $seq->division($2);
> >    } else {
> >        $name = $1;
> >        $seq->division('UNK');
> >    }
> 157d162
> <    $seq->division($2);
> 
> On Tue, 10 Apr 2001, Xiangyun Wang wrote:
> 
> > Hi,
> >
> > I am using the Bio::DB::SwissProt module to retrieve protein sequences
> > by their IDs.
> >
> > But some proteins (such as Q9EPU5) can't be retrieved.
> >
> > What's the problem here?
> >
> > Thanks
> > Sean
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> 
> Jason Stajich
> jason@chg.mc.duke.edu
> Center for Human Genetics
> Duke University Medical Center
> http://www.chg.duke.edu/

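A note on the ID-line regex in the patch quoted above: [^\s_] and [\S_]
are not equivalent. [^\s_] excludes both whitespace and underscore,
while [\S_] matches any non-whitespace character, underscore included.
With [\S_]+ the first group is greedy and swallows the whole
NAME_DIVISION token, so the optional second group never matches and
every entry would get division 'UNK'. If the second group did match, it
would also carry the leading underscore. Below is a minimal sketch of
one way to accept both forms. The helper name parse_id_line and the two
sample ID lines are made up for illustration; the sample lines were
written by hand, not fetched from a database.

```perl
use strict;
use warnings;

# Parse a swiss-format ID line into (name, division).
# Returns an empty list if the line does not look like an ID line.
sub parse_id_line {
    my ($line) = @_;
    # [^\s_] stops at the underscore, so the optional non-capturing group
    # can pick up the division without the leading underscore.
    if ( $line =~ /^ID\s+([^\s_]+)(?:_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/ ) {
        my ( $entry, $division ) = ( $1, $2 );
        return defined $division
            ? ( "${entry}_${division}", $division )   # NAME_DIVISION form
            : ( $entry, 'UNK' );                      # bare identifier form
    }
    return;
}

# Illustrative ID lines (hand-written for this example).
for my $line (
    'ID   ABCD_HUMAN     STANDARD;      PRT;   100 AA.',
    'ID   O39869         PRELIMINARY;   PRT;   100 AA.',
) {
    my ( $name, $division ) = parse_id_line($line);
    print "$name => $division\n";
}
```

The only changes relative to the original line 153 are making the
"_DIVISION" part optional and testing whether it was captured.
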
-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________