[Bioperl-l] Suggested patches take 2
Murad Nayal
murad@godel.bioc.columbia.edu
Wed, 14 Mar 2001 16:05:56 +0100
This is a multi-part message in MIME format.
--------------90AEBD19D1ECB7DED5073BC2
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Dear all,
A few days ago I submitted two patches for SeqIO/embl and SeqIO/swiss to
accommodate the sequence files trembl.dat and trembl_new.dat and the
variable splicing versions of swissprot and trembl. Unfortunately, I
didn't make time then to rerun the test suite after the modifications.
Sure enough, one of my changes resulted in a truncated accession code. I
fixed the regular expression in question, and the patches now pass all
tests on a fresh (today's) checkout of bioperl-live. The fixed patches are
attached. Very sorry for being careless earlier. Bioperl's ability to
read the aforementioned files is important to me, and I do hope
you'll find it appropriate to add the patches to bioperl.
All the best,
Murad
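
For readers who want to try the accession-number pattern outside of bioperl, here is a minimal standalone sketch. It uses the AC pattern from the attached embl.pm patch; the helper name first_accession and the sample AC lines are made up for illustration and are not part of bioperl. The character class stops the capture at the first whitespace or semicolon, so no separate step is needed to strip the trailing ";".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of bioperl): return the first accession
# on an AC line, or undef if the line is not an AC line.
sub first_accession {
    my ($line) = @_;
    # Same pattern as in the attached embl.pm patch: capture everything
    # up to the first whitespace or ';'.
    return $line =~ /^AC\s+([^\s;]+);?/ ? $1 : undef;
}

# Made-up sample lines, for illustration only.
my @lines = (
    "AC   Q9XYZ1;",
    "AC   P12345; Q67890;",
    "AC   O00001",
    "ID   not an accession line",
);

for my $line (@lines) {
    my $acc = first_accession($line);
    print defined $acc ? "$acc\n" : "no accession\n";
}
```

Only the first accession on a line is captured, which matches what the patched code passes to accession_number().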
--------------90AEBD19D1ECB7DED5073BC2
Content-Type: text/plain; charset=us-ascii;
name="embl.pm.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="embl.pm.patch"
*** embl.pm.orig Wed Mar 14 15:16:12 2001
--- embl.pm Wed Mar 14 15:47:18 2001
***************
*** 151,160 ****
return undef; # end of file
}
$line =~ /^ID\s+\S+/ || $self->throw("EMBL stream with no ID. Not embl in my book");
! $line =~ /^ID\s+(\S+)\s+\S+\;\s+(\S+)\;\s+(\S+)\;/;
! $name = $1;
! $mol = $2;
! $div = $3;
if(! $name) {
$name = "unknown id";
}
--- 151,166 ----
return undef; # end of file
}
$line =~ /^ID\s+\S+/ || $self->throw("EMBL stream with no ID. Not embl in my book");
!
! if ($line =~ /^ID\s+(\S+)\s+\S+\;\s+(\S+)\;\s+(\S+)\;/) {
! $name = $1;
! $mol = $2;
! $div = $3;
! } elsif($line =~ /^ID\s+(\S+)\s+\S+\;\s+(\S+)\;/ ) {
! $name = $1;
! $mol = $2;
! }
!
if(! $name) {
$name = "unknown id";
}
***************
*** 176,181 ****
--- 182,193 ----
until( !defined $buffer ) {
$_ = $buffer;
+ # Exit if you found FT or SQ before encountering FH
+ if(/^FT \w/ or /^SQ /) {
+ $self->_pushback($buffer);
+ last;
+ }
+
# Exit at start of Feature table
last if /^FH/;
***************
*** 185,201 ****
}
#accession number
! if( /^AC\s+(\S+);?/ ) {
! $acc = $1;
! $acc =~ s/\;//;
! $seq->accession_number($acc);
}
#version number
! if( /^SV\s+(\S+);?/ ) {
! my $sv = $1;
! $sv =~ s/\;//;
! $seq->seq_version($sv);
}
#date (NOTE: takes last date line)
--- 197,209 ----
}
#accession number
! if( /^AC\s+([^\s;]+);?/ ) {
! $seq->accession_number($1);
}
#version number
! if( /^SV\s+([^\s;]+);?/ ) {
! $seq->seq_version($1);
}
#date (NOTE: takes last date line)
--------------90AEBD19D1ECB7DED5073BC2
Content-Type: text/plain; charset=us-ascii;
name="swiss.pm.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
filename="swiss.pm.patch"
*** swiss.pm.org Mon Mar 12 02:22:37 2001
--- swiss.pm Mon Mar 12 02:21:24 2001
***************
*** 150,161 ****
return undef; # end of file
}
! $line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/
! || $self->throw("swissprot stream with no ID. Not swissprot in my book");
! $name = $1."_".$2;
! $seq->primary_id($1);
! $seq->division($2);
! $seq->molecule($4);
# this is important to have the id for display in e.g. FTHelper, otherwise
# you won't know which entry caused an error
$seq->display_id($name);
--- 150,168 ----
return undef; # end of file
}
! if ($line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/) {
! $name = $1."_".$2;
! $seq->primary_id($1);
! $seq->division($2);
! $seq->molecule($4);
! } elsif($line =~ /^ID\s+(\S+)\s+([^\s;]+);\s+([^\s;]+);/ ) {
! $name = $1;
! $seq->primary_id($1);
! $seq->molecule($3);
! } else {
! $self->throw("swissprot stream with no ID. Not swissprot in my book");
! }
!
# this is important to have the id for display in e.g. FTHelper, otherwise
# you won't know which entry caused an error
$seq->display_id($name);
--------------90AEBD19D1ECB7DED5073BC2--
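
The two-step ID matching in the swiss.pm patch above can be tried on its own with the sketch below. The two patterns are copied from the patch; the function name parse_swiss_id, the returned hash keys, and the sample ID lines are assumptions made for illustration, and a plain die stands in for $self->throw. The first pattern handles swissprot-style NAME_DIVISION identifiers, and the fallback accepts identifiers without an underscore, as in trembl entries.

```perl
use strict;
use warnings;

# Hypothetical standalone version of the ID-line handling in the patch.
# Returns a hash reference; 'division' is undef when the fallback matched.
sub parse_swiss_id {
    my ($line) = @_;
    if ($line =~ /^ID\s+([^\s_]+)_([^\s_]+)\s+([^\s;]+);\s+([^\s;]+);/) {
        # swissprot style: NAME_DIVISION
        return {
            name       => $1 . "_" . $2,
            primary_id => $1,
            division   => $2,
            molecule   => $4,
        };
    }
    elsif ($line =~ /^ID\s+(\S+)\s+([^\s;]+);\s+([^\s;]+);/) {
        # fallback: identifier without an underscore-separated division
        return {
            name       => $1,
            primary_id => $1,
            division   => undef,
            molecule   => $3,
        };
    }
    die "swissprot stream with no ID. Not swissprot in my book\n";
}

# Made-up sample ID lines, for illustration only.
for my $line (
    "ID   CYC_HUMAN      STANDARD;      PRT;   104 AA.",
    "ID   Q9XYZ1         PRELIMINARY;   PRT;   200 AA.",
) {
    my $id = parse_swiss_id($line);
    printf "%s (division: %s, molecule: %s)\n",
        $id->{name}, $id->{division} // "none", $id->{molecule};
}
```

Trying the stricter pattern first keeps the existing behaviour for swissprot entries, and only lines that fail it fall through to the looser match.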