[Bioperl-l] swiss prot
Jason Stajich
jason@chg.mc.duke.edu
Tue, 10 Apr 2001 16:49:27 -0400 (EDT)
This is a TrEMBL entry, not a Swiss-Prot one. <sigh>. The swiss format
expects ID_DIVISION in the ID line. There is no really good way to determine
this on the fly in Bio::DB::EMBL since we pass the stream to a SeqIO object.
[sprot] http://www.expasy.org/cgi-bin/get-sprot-raw.pl?P00916
[TrEMBL] http://www.expasy.org/cgi-bin/get-sprot-raw.pl?O39869
Bioperl: here is my fix - please let me know if you think this is
acceptable and I'll submit the fix.
I am assigning division to UNK for the TrEMBL entry even though we could
probably deduce it from OC lines - I don't want to deal with that right
now... (I kept the [^\s_] character classes: [\S_] is not equivalent, since
it also matches underscores, so the optional division would never be captured.)
RCS file: /home/repository/bioperl/bioperl-live/Bio/SeqIO/swiss.pm,v
retrieving revision 1.36
diff -r1.36 swiss.pm
153c153
< $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
---
> $line =~ /^ID\s+([^\s_]+)(?:_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/
155c155,161
< $name = $1."_".$2;
---
> if( $2 ) {
> $name = $1."_".$2;
> $seq->division($2);
> } else {
> $name = $1;
> $seq->division('UNK');
> }
157d162
< $seq->division($2);
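
For anyone who wants to try the idea outside of swiss.pm, here is a minimal
standalone sketch of the same ID-line handling. The sub name parse_id_line
and the two sample ID lines are made up for illustration (they are not the
real entries linked above), and the character classes exclude underscores so
that the optional group can capture the division on its own.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SeqIO::swiss: split an ID line into
# an entry name and a division, using 'UNK' when there is no _DIVISION part.
sub parse_id_line {
    my ($line) = @_;

    # [^\s_] excludes whitespace and underscores, so (?:_(...))? can
    # capture the division separately from the entry name.
    if ( $line =~ /^ID\s+([^\s_]+)(?:_([^\s_]+))?\s+([^\s;]+);\s+([^\s;]+);/ ) {
        my ( $id, $division ) = ( $1, $2 );
        return defined $division
            ? ( "${id}_$division", $division )    # Swiss-Prot style name
            : ( $id, 'UNK' );                     # TrEMBL style name
    }
    return;    # not a recognizable ID line
}

# Illustrative ID lines only; the names and lengths are invented.
for my $line (
    'ID   ABCD_HUMAN     STANDARD;      PRT;   259 AA.',
    'ID   O39869         PRELIMINARY;   PRT;   100 AA.',
) {
    my ( $name, $division ) = parse_id_line($line);
    print "$name => $division\n";
}
```

The sub returns an empty list for anything that does not look like an ID
line, so a caller can tell "no match" apart from "matched, division unknown".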
On Tue, 10 Apr 2001, Xiangyun Wang wrote:
> Hi,
>
> I am using the Bio::DB::SwissProt module to retrieve protein sequences
> by their ID.
>
> But some proteins (such as Q9EPU5) can't be retrieved.
>
> What's the problem here?
>
> Thanks
> Sean
Jason Stajich
jason@chg.mc.duke.edu
Center for Human Genetics
Duke University Medical Center
http://www.chg.duke.edu/