[Bioperl-l] important change for Bio::SeqIO::GenBank
Heikki Lehvaslaiho
heikki@ebi.ac.uk
Wed, 31 Oct 2001 16:49:53 +0000
"Wang, Kai" wrote:
>
> This is my first visit to this mailing list. I do not know whether
> somebody has already pointed this out, but it is a serious problem.
Which version of BioPerl are you using?
I fixed the problem with circular molecules two weeks ago in bioperl-live
("from CVS"). At the same time I noticed the announcement of the LOCUS line
change, but did not do anything about it. See:
http://bioperl.org/pipermail/bioperl-l/2001-October/006397.html
for the thread.
Could you check out the latest code and check your changes against it?
(See http://cvs.bioperl.org/ for instructions on getting a full copy, or
check the changes in WebCVS: 'View CVS' on the bioperl home page.)
Thanks,
-Heikki
> When I browsed the source code of Bio::SeqIO in the BioPerl project, I found a
> bug: the code cannot process the molecular shape ("circular" or "linear")
> that is annotated in a few GenBank files.
>
> An even more serious problem is that a new GenBank format will be published soon.
> The "LOCUS" line will be changed a little. According to the release notes of
> GenBank, "the introduction of the new format will occur with Release 127.0
> in December 2001". Obviously the current Bio::SeqIO code cannot process the
> new format.
>
> For these two reasons, I rewrote a small part of the
> Bio::SeqIO::GenBank code so that, hopefully, it supports the new file format
> as well as the old one. A new hash key "molshape" is introduced to represent
> "circular" or "linear". If you think my code is acceptable, please include
> it in the next release of BioPerl. GenBank records in the new format will be
> released in December 2001!
>
> Below is my code:
>
> # CODE FROM Bio::SeqIO::GenBank
> sub next_seq{
> my ($self,@args) = @_;
> my ($pseq,$c,$name,@desc,$seqc,$mol,$div,$date);
> my $line;
> my @acc = ();
> my $seq = Bio::Seq::RichSeq->new('-verbose' =>$self->verbose());
>
> while(defined($line = $self->_readline())) {
> $line =~ /^LOCUS\s+\S+/ && last;
> }
> return undef if( !defined $line ); # end of file
> # BEGIN MY CODE
> $line =~ /^LOCUS\s+(\S+)\s+\S+\s+(bp|aa)\s+(\S+)?\s+(linear|circular)?\s+(\w{3})\s+(\S+)/ || do {
> $line =~ /^LOCUS\s+(\S+)/ ||
>     $self->throw("GenBank stream with no LOCUS. Not GenBank in my book. Got $line");
> $self->warn("GenBank record with LOCUS line in unexpected format. " .
>     "Attributes from this line other than ID will be missing.");
> };
> $name = $1;
> $seq->display_id($name);
> if ($2 eq 'bp') {
> $seq->moltype ('dna');
> # examples of $3 are 'ss-tRNA', 'ds-DNA', 'DN', 'sno-RNA'
> $seq->molecule ($3);
> # molshape means either 'circular' or 'linear'; this term is not
> # included in your original program
> $seq->molshape ($4);
> } elsif ($2 eq 'aa') {
> $seq->moltype ('protein');
> # for protein there are no $3 and $4 defined
> $seq->molecule ('PRT');
> }
> $seq->division ($5);
> defined ($6) and $seq->add_date ($6);
> # END MY CODE
> # BEGIN YOUR CODE
>
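
To try the LOCUS pattern from the quoted patch outside of Bio::SeqIO, a
minimal standalone sketch might look like the following. The helper name
parse_locus and the sample LOCUS lines are made up for illustration (they
are not real GenBank records, not part of BioPerl, and the spacing is only
approximate); the regular expression is the one from the patch above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: split a LOCUS line into fields
# using the pattern from the quoted patch. Returns a hash reference, or
# undef (in scalar context) when the line does not match.
sub parse_locus {
    my ($line) = @_;
    return unless $line =~
        /^LOCUS\s+(\S+)\s+\S+\s+(bp|aa)\s+(\S+)?\s+(linear|circular)?\s+(\w{3})\s+(\S+)/;
    return {
        name     => $1,
        unit     => $2,   # 'bp' or 'aa'
        molecule => $3,   # undef when the field is absent
        molshape => $4,   # undef when the field is absent
        division => $5,
        date     => $6,
    };
}

# Made-up sample lines: nucleotide with a shape, nucleotide with wider
# spacing, and a protein line with neither molecule nor shape.
my @samples = (
    "LOCUS       HYPO0001     1234 bp    DNA   circular  BCT       31-OCT-2001",
    "LOCUS       HYPO0002                 5678 bp    DNA     linear   BCT 31-OCT-2001",
    "LOCUS       HYPO0003      321 aa                    PRI       31-OCT-2001",
);

for my $line (@samples) {
    my $f = parse_locus($line) or next;
    printf "%s: unit=%s molecule=%s molshape=%s division=%s date=%s\n",
        $f->{name}, $f->{unit},
        defined $f->{molecule} ? $f->{molecule} : '-',
        defined $f->{molshape} ? $f->{molshape} : '-',
        $f->{division}, $f->{date};
}
```

Because the pattern only relies on whitespace-separated fields, it does not
depend on exact column positions; whether that is enough for every record in
the new release would still need to be checked against real data.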
--
______ _/ _/_____________________________________________________
_/ _/ http://www.ebi.ac.uk/mutations/
_/ _/ _/ Heikki Lehvaslaiho heikki@ebi.ac.uk
_/_/_/_/_/ EMBL Outstation, European Bioinformatics Institute
_/ _/ _/ Wellcome Trust Genome Campus, Hinxton
_/ _/ _/ Cambs. CB10 1SD, United Kingdom
_/ Phone: +44 (0)1223 494 644 FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________