[Bioperl-l] swiss prot

Jason Stajich jason@chg.mc.duke.edu
Tue, 10 Apr 2001 16:49:27 -0400 (EDT)


This is a TrEMBL entry, not a Swiss-Prot one. <sigh> The swiss format parser
expects ID_DIVISION in the ID line, and TrEMBL IDs have no division part.
There is no good way to determine this on the fly in Bio::DB::EMBL, since
we pass the stream straight to a SeqIO object.

[sprot]  http://www.expasy.org/cgi-bin/get-sprot-raw.pl?P00916
[TrEMBL] http://www.expasy.org/cgi-bin/get-sprot-raw.pl?O39869

Bioperl folks: here is my fix. Please let me know if you think it is
acceptable and I'll submit it.

I am assigning the division UNK to TrEMBL entries, even though we could
probably deduce it from the OC lines; I don't want to deal with that right
now. (The _DIVISION part is now optional and non-capturing, so $1..$4 keep
their meaning. The character classes have to stay negated: [\S_] is not
equivalent to [^\s_], it is just \S and would swallow the underscore.)

RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/swiss.pm,v
retrieving revision 1.36
diff -r1.36 swiss.pm
153c153
<    $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
---
>    $line =~ /^ID\s+([^\s_]+)(?:_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/
155c155,161
<    $name = $1."_".$2;
---
>    if( $2 ) {
>        $name = $1."_".$2;
>        $seq->division($2);
>    } else {
>        $name = $1;
>        $seq->division('UNK');
>    }
157d162
<    $seq->division($2);
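To illustrate the parsing outside of Bioperl, here is a minimal standalone sketch. `parse_id_line` is a hypothetical helper, not part of Bio::SeqIO::swiss, and the two ID lines are made-up examples of the Swiss-Prot (NAME_DIVISION) and TrEMBL (accession only) layouts, not copied from the real entries. It splits an optional division off the entry name, falls back to UNK, and also shows why `[\S_]` cannot stand in for `[^\s_]`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not Bioperl API): parse an ID line of the assumed
# form "ID   NAME[_DIVISION]   CLASS;   MOLTYPE; ..." into four fields.
sub parse_id_line {
    my ($line) = @_;
    # [^\s_] matches any character that is neither whitespace nor '_'.
    # The "_DIVISION" group is optional and non-capturing, so the
    # captures stay $1..$4 whether or not a division is present.
    return unless $line =~
        /^ID\s+([^\s_]+)(?:_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/;
    my ($id, $div, $class, $moltype) = ($1, $2, $3, $4);
    my $name = defined $div ? "${id}_$div" : $id;
    $div = 'UNK' unless defined $div;    # no division, e.g. TrEMBL
    return ($name, $div, $class, $moltype);
}

# Illustrative ID lines (hypothetical, not taken from real entries).
my @lines = (
    'ID   ABCD_HUMAN     STANDARD;      PRT;   100 AA.',
    'ID   A00001     PRELIMINARY;      PRT;   100 AA.',
);

for my $line (@lines) {
    my ($name, $div, $class, $moltype) = parse_id_line($line);
    print "$name\t$div\t$class\t$moltype\n";
}

# [\S_] is just \S, so it swallows the underscore and the
# division is never split off the name.
my ($negated) = 'ABCD_HUMAN' =~ /^([^\s_]+)/;    # captures 'ABCD'
my ($plain)   = 'ABCD_HUMAN' =~ /^([\S_]+)/;     # captures 'ABCD_HUMAN'
print "[^\\s_]+ captured: $negated\n";
print "[\\S_]+  captured: $plain\n";
```

Keeping the division group non-capturing means the rest of the parser can keep reading `$3` and `$4` unchanged.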


On Tue, 10 Apr 2001, Xiangyun Wang wrote:

> Hi,
>
> I am using the Bio::DB::SwissProt module to retrieve protein sequences
> by their IDs.
>
> But some proteins (such as Q9EPU5) can't be retrieved.
>
> What's the problem here?
>
> Thanks
> Sean
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/