[Bioperl-l] SeqIO::swiss->write_seq
Karger, Amir
AKarger@CuraGen.com
Thu, 28 Jun 2001 11:00:44 -0400
Heikki made the mistake of encouraging me.
Because I need to parse Swiss-Prot files (thanks for saving me a lot of
parsing work!), I'm using Bio::SeqIO::swiss. I noticed that the output of
write_seq isn't quite the same as the input to next_seq. I don't know
whether exact round-tripping is a design goal, but I think at least some of
the fixes are trivial. I ran next_seq and then write_seq on bioperl's
t/swiss.dat and diffed the result against the original.
(I should mention that 0.7.1 had a significantly smaller diff than 0.7.)
Here it is (< lines are write_seq's output, > lines are t/swiss.dat):
9c9
< GN GC1QBP OR HABP1 OR SF2P32 OR C1QBP
---
> GN GC1QBP OR HABP1 OR SF2P32 OR C1QBP.
Looks to me like a one-character bug fix: write_seq drops the period that
ends the GN line. (Ah, I just saw in CVS that this has already been fixed.)
11,12c11,13
< OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
< OC Eutheria; Primates; Catarrhini; Hominidae; Homo.
---
> OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
> OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
> OX NCBI_TaxID=9606;
t/swiss.dat wraps before "Mammalia;", but write_seq packs it onto the first
OC line. Maybe _write_line_swissprot_regex should be called with a length of
78 or 79 instead of 80?
OX isn't in the most recent (May 2000!) manual, so I can understand why
bioperl doesn't handle it.
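To illustrate how the length limit decides where "Mammalia;" lands, here is a minimal Perl sketch of a greedy word wrapper for Swiss-Prot lines. wrap_swiss_line is a hypothetical helper written for this example, not BioPerl's _write_line_swissprot_regex, and it assumes the standard layout of a two-letter line code padded with three spaces. With a limit of 80 it keeps "Mammalia;" on the first OC line (which then comes to exactly 80 characters), the way write_seq did; with 79 it wraps before "Mammalia;" the way t/swiss.dat does.

```perl
use strict;
use warnings;

# Hypothetical helper, not BioPerl's _write_line_swissprot_regex: greedily
# packs whitespace-separated words onto lines of the form "XX   text",
# starting a new line whenever the next word would push it past $max columns.
sub wrap_swiss_line {
    my ($code, $text, $max) = @_;
    my $prefix = sprintf '%-5s', $code;    # two-letter code plus three spaces
    my @lines;
    my $line = $prefix;
    for my $word (split ' ', $text) {
        my $candidate = $line eq $prefix ? "$prefix$word" : "$line $word";
        if (length($candidate) > $max && $line ne $prefix) {
            push @lines, $line;            # current line is full
            $line = "$prefix$word";        # word starts the next line
        }
        else {
            $line = $candidate;
        }
    }
    push @lines, $line if $line ne $prefix;
    return @lines;
}

my $taxa = 'Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; '
         . 'Euteleostomi; Mammalia; Eutheria; Primates; Catarrhini; '
         . 'Hominidae; Homo.';

for my $max (80, 79) {
    print "max $max:\n";
    print "$_\n" for wrap_swiss_line('OC', $taxa, $max);
}
```

The only difference between the two runs is whether a line of exactly 80 characters is allowed, which is why lowering the limit by one or two columns would be enough here.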
18,20c19,21
< RA Leffers H.
< RT "Cloning and expression of a cDNA covering the complete coding region of
< RT the P32 subunit of human pre-mRNA splicing factor SF2."
---
> RA Leffers H.;
> RT "Cloning and expression of a cDNA covering the complete coding region
> RT of the P32 subunit of human pre-mRNA splicing factor SF2.";
The terminating semicolons are stripped in next_seq (actually in
_read_swissprot_References), but write_seq doesn't put them back.
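A minimal sketch of a read/write pair that stays symmetric, assuming each RA or RT block ends with a single ';' as it does in t/swiss.dat. read_ref_field and write_ref_field are hypothetical helpers for illustration, not the functions in swiss.pm, and line wrapping on output is left out.

```perl
use strict;
use warnings;

# Hypothetical reader step: join the lines of one RA or RT block, drop the
# two-letter code and its padding, and strip the ';' that ends the block.
sub read_ref_field {
    my @lines = @_;
    my @parts;
    for my $line (@lines) {
        (my $text = $line) =~ s/^R[AT]\s+//;
        push @parts, $text;
    }
    my $field = join ' ', @parts;
    $field =~ s/;\s*$//;
    return $field;
}

# Hypothetical writer step: put the terminating ';' back (wrapping omitted).
sub write_ref_field {
    my ($code, $field) = @_;
    return sprintf '%-5s%s;', $code, $field;
}

my $ra = read_ref_field('RA   Leffers H.;');
my $rt = read_ref_field(
    'RT   "Cloning and expression of a cDNA covering the complete coding region',
    'RT   of the P32 subunit of human pre-mRNA splicing factor SF2.";',
);
print write_ref_field('RA', $ra), "\n";
print write_ref_field('RT', $rt), "\n";
```

Whatever the reader strips, the writer has to add back; keeping both steps next to each other makes that easy to check.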
[several more RA/RT differences snipped]
60,62c61,63
< DR EMBL; L04636; AAA16315.1.
< DR EMBL; M69039; AAA73055.1.
< DR EMBL; X75913; CAA53512.1.
---
> DR EMBL; L04636; AAA16315.1; -.
> DR EMBL; M69039; AAA73055.1; -.
> DR EMBL; X75913; CAA53512.1; -.
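This one baffled me for a while, since the "-" should end up in the comment
field. I finally copied some of the code from next_seq into a command-line
Perl interpreter, and at that point realized that line 260 of swiss.pm says
$comment = s///
instead of
$comment =~ s///
Aha! Without the =~ binding, s/// runs against $_ instead of $comment, and
$comment is overwritten with the substitution's return value, so the "-" is
lost.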
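The difference between the two operators is easy to reproduce in isolation. In this sketch the regex s/\.$// (strip a trailing period) is a hypothetical stand-in for the substitution on line 260, which isn't quoted here. With plain '=', the substitution is applied to $_, and $comment receives its return value: the number of substitutions made, or the empty string when nothing matched.

```perl
use strict;
use warnings;

# The regex s/\.$// (drop a trailing period) is a hypothetical stand-in for
# the substitution on line 260 of swiss.pm.
sub buggy_comment {
    my ($comment) = @_;
    local $_ = 'some other string';   # whatever $_ happens to hold
    $comment = s/\.$//;               # matches against $_, not $comment;
                                      # $comment gets the return value
    return $comment;
}

sub fixed_comment {
    my ($comment) = @_;
    $comment =~ s/\.$//;              # edits $comment in place
    return $comment;
}

printf "buggy: '%s'\n", buggy_comment('-.');
printf "fixed: '%s'\n", fixed_comment('-.');
```

In the buggy version the comment comes back empty, which matches a DR line being written without its "; -" part.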
69,70c70,71
< FT CHAIN 74 282 COMPLEMENT COMPONENT 1, Q SUBCOMPONENTBINDING
< FT PROTEIN.
---
> FT CHAIN 74 282 COMPLEMENT COMPONENT 1, Q SUBCOMPONENT
> FT BINDING PROTEIN.
I think this means line 929 of swiss.pm should read:
$desc .= " $1"; # replace \n with a space
so that each continuation line is separated from the preceding text by a
space.
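A minimal sketch of joining FT continuation lines, assuming the description text on a continuation line is everything after "FT" and the blank columns that follow it. join_ft_description is a hypothetical helper for this example; its $sep argument shows the difference between appending $1 directly and appending " $1".

```perl
use strict;
use warnings;

# Hypothetical helper: appends the text of FT continuation lines to the
# description taken from the first FT line. $sep goes before each piece;
# '' mimics the "SUBCOMPONENTBINDING" join, ' ' keeps the words apart.
sub join_ft_description {
    my ($sep, $desc, @continuations) = @_;
    for my $line (@continuations) {
        next unless $line =~ /^FT\s+(.*\S)\s*$/;   # text after "FT" and blanks
        $desc .= $sep . $1;
    }
    return $desc;
}

my $first = 'COMPLEMENT COMPONENT 1, Q SUBCOMPONENT';
my $cont  = 'FT                                BINDING PROTEIN.';
print join_ft_description('',  $first, $cont), "\n";
print join_ft_description(' ', $first, $cont), "\n";
```

Because the reader strips the line break, the separator has to be supplied explicitly when the pieces are concatenated.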
Amir Karger
Curagen Corporation