[Bioperl-l] Bio/SeqIO/swiss.pm parsing error
Erik
er at xs4all.nl
Fri Nov 3 19:59:47 UTC 2006
Hi all,
I noticed that parsing is borked with the newest Swiss-Prot files:
UniProt Knowledgebase Release 9 consists of:
UniProtKB/Swiss-Prot Release 51.0 of 31-Oct-2006
UniProtKB/TrEMBL Release 34.0 of 31-Oct-2006
I edited my local copy of Bio/SeqIO/swiss.pm to parse the ID lines
in swissprot/trembl according to the new specification (see
http://expasy.org/sprot/relnotes/sp_news.html).
Basically, the change is as follows:
ID EntryName DataClass; MoleculeType; SequenceLength.
is changed to:
ID EntryName DataClass; SequenceLength.
The change I made was only in the regex capturing the entry name:
method next_seq (Bio/SeqIO/swiss.pm) :
===============
unless( m/
          ^
          ID \s+              #
          (\S+) \s+           # $1 entryname
          ([^\s;]+); \s+      # $2 DataClass
          [0-9]+[ ]AA \.      # Sequencelength (capture?)
          $
         /ox )
{
    $self->throw("swissprot stream with no ID. Not swissprot in my book");
}
===============
I tested this (=entry parsable and SeqIO created) against several
hundred Swissprot and Trembl entries.
Of course, files in the older format no longer parse with this change, so
it may be better to keep both the old and new patterns and try each in turn
(newest first).
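
A rough sketch of the "try both, newest first" idea is below. It is not the
actual swiss.pm code: parse_id_line is a hypothetical standalone helper, the
returned hash keys are made up for illustration, and the two sample ID lines
are only illustrative of the layouts described above. The old-format pattern
assumes MoleculeType is a single semicolon-terminated token.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SeqIO::swiss): parse one ID line,
# trying the new layout first and falling back to the old one.
sub parse_id_line {
    my ($line) = @_;
    chomp $line;

    # New layout: ID EntryName DataClass; SequenceLength.
    if ( $line =~ m/
            ^
            ID \s+
            (\S+) \s+            # entry name
            ([^\s;]+); \s+       # data class
            ([0-9]+) [ ] AA \.   # sequence length
            $
        /x )
    {
        return { name => $1, class => $2, mol_type => undef,
                 length => $3, format => 'new' };
    }

    # Old layout: ID EntryName DataClass; MoleculeType; SequenceLength.
    if ( $line =~ m/
            ^
            ID \s+
            (\S+) \s+            # entry name
            ([^\s;]+); \s+       # data class
            ([^\s;]+); \s+       # molecule type
            ([0-9]+) [ ] AA \.   # sequence length
            $
        /x )
    {
        return { name => $1, class => $2, mol_type => $3,
                 length => $4, format => 'old' };
    }

    return;    # neither layout matched
}

# Illustrative sample lines, one per layout.
for my $line (
    "ID   CYC_BOVIN               Reviewed;         104 AA.",
    "ID   CYC_BOVIN      STANDARD;      PRT;   104 AA.",
  )
{
    my $id = parse_id_line($line)
      or die "swissprot stream with no ID: $line\n";
    print join( "\t", @{$id}{qw(format name class length)} ), "\n";
}
```

In next_seq the same fallback could replace the single unless(...) test, with
the throw kept for the case where neither pattern matches.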
hth,
Erik