Bioperl: swiss.pm (SwissProt parser)

Hilmar Lapp hlapp@gmx.net
Mon, 08 May 2000 11:51:32 +0200


Hi all,

Based on a glance at swiss.pm, the SwissProt parser shares some of the bugs
found in the genbank and embl parsers.

First, species initialization does not check whether the genus is duplicated.
(This may not be a problem if the classification in SwissProt does not end
with the genus, but I suspect that it does.)
Second, in the lines
       # Field with no quoted value
       elsif (/^FT\s+\/(\S+)=?(\S+)?/) {
           my $key = $1;
           my $value = $2 if $2;

$value will never be set, because $1 consumes everything up to the first
whitespace character, which is almost certainly the end of the line. (The
reason is that '=' is optional and itself matches \S, and regexp quantifiers
are greedy.) So you end up with keys like 'length=1' that have '_no_value' as
their value.
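
To make the effect concrete, here is a minimal sketch that runs the pattern
quoted above against a feature line and compares it with one possible
correction. The sample line, the helper names parse_original and parse_fixed,
and the corrected pattern itself are assumptions for illustration only; they
are not taken from swiss.pm and are not necessarily the fix that will be
committed.

```perl
use strict;
use warnings;

# Pattern as quoted from swiss.pm: (\S+) is greedy and also matches '=',
# so it swallows the whole 'key=value' token and $2 stays undefined.
sub parse_original {
    my ($line) = @_;
    if ($line =~ /^FT\s+\/(\S+)=?(\S+)?/) {
        return ($1, $2);
    }
    return;
}

# Hypothetical correction: the key may not contain '=', and the value is
# captured only when an '=' actually follows the key.
sub parse_fixed {
    my ($line) = @_;
    if ($line =~ /^FT\s+\/([^\s=]+)(?:=(\S+))?/) {
        return ($1, $2);
    }
    return;
}

# Assumed sample input, modelled on the 'length=1' case mentioned above.
my $line = 'FT                   /length=1';

for my $parser (\&parse_original, \&parse_fixed) {
    my ($key, $value) = $parser->($line);
    printf "key=%s value=%s\n", $key, defined $value ? $value : '_no_value';
}
```

With the original pattern the key comes back as 'length=1' and no value is
captured; with the character class [^\s=] the key stops at the '=' and the
value is captured separately. A qualifier without '=' still matches and simply
leaves the value undefined.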

If there are no objections, I'll fix that as well.

Cheers,

	Hilmar

-- 
-----------------------------------------------------------------------
Hilmar Lapp                                      email: hlapp@gmx.net
NFI Vienna, IFD/Bioinformatics                   phone: +43 1 86634 631
A-1235 Vienna                                      fax: +43 1 86634 727
ROI: Bioinformatics (arrays, expression, seqs), Programming, Databases,
     Mountain Biking (hard tail, hard fork: feel the trail)
-----------------------------------------------------------------------