[Bioperl-l] bug in genbank.pm
Andreas Matern
andreas.matern@lbri.lionbioscience.com
Wed, 20 Feb 2002 17:18:23 -0500
Has this been fixed? Just wondering....
"Wang, Kai" wrote:
>
> I pointed out this problem about two months ago, but nobody changed it. The
> new GenBank file format adds a "molecular shape" field (linear or circular)
> to the LOCUS line, so the current genbank.pm cannot process it.
>
> in the file:
>
> # $Id: genbank.pm,v 1.46 2002/02/14 16:41:22 jason Exp $
>     if (($2 eq 'bp') || defined($5)) {
>         if ($4 eq 'circular') {
>             $seq->molecule($3);
>             $seq->is_circular($4);
>             $seq->division($5);
>             ($date) = $line =~ /.*(\d\d-\w\w\w-\d\d\d\d)/;
>         } else {
>             $seq->molecule($3);
>             $seq->division($4);
>             $date = $5;
>         }
>     } else {
>         $seq->molecule('PRT') if($2 eq 'aa');
>         $seq->division($3);
>         $date = $4;
>     }
>
> The above code is based on the wrong assumption that NCBI would never add a
> 'linear' tag to a record.
> One example is accession number 'NM_003748'. Its first line is:
>
> LOCUS NM_003748 3134 bp mRNA linear PRI 01-NOV-2000
>
> The current genbank.pm cannot recognize 01-NOV-2000.
>
> I think the best way is to use:
>
> $line =~ /^LOCUS\s+(\S+)\s+\S+\s+(bp|aa)\s+(\S+)?\s+(\S+)?\s+(\w\w\w)?\s+(\d\d-\w\w\w-\d\d\d\d)?/
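
For illustration only, here is a minimal standalone sketch of one way to read a LOCUS line whether or not the 'linear'/'circular' field is present. It is not the code in genbank.pm and not the regex proposed above: it swaps in a token-based approach that splits the tail of the line on whitespace and peels fields off from the right. The sub name parse_locus_line and the hash keys are hypothetical, and the sketch assumes the division code is always present and the date, when present, is the last field.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: parse one LOCUS line into a hash ref.
# Assumed field order after the length unit: [molecule] [topology] division [date].
sub parse_locus_line {
    my ($line) = @_;
    return undef
        unless $line =~ /^LOCUS\s+(\S+)\s+(\d+)\s+(bp|aa)\s+(.*?)\s*$/;

    my %rec  = (name => $1, length => $2, unit => $3);
    my @rest = split ' ', $4;

    # Peel fields off from the right, so an optional topology field
    # does not shift the position of the division or the date.
    $rec{date} = pop @rest
        if @rest && $rest[-1] =~ /^\d\d-[A-Z]{3}-\d{4}$/;
    $rec{division} = pop @rest
        if @rest && $rest[-1] =~ /^[A-Z]{3}$/;
    $rec{topology} = pop @rest
        if @rest && $rest[-1] =~ /^(?:linear|circular)$/;

    # Whatever remains is the molecule type; protein records carry none.
    $rec{molecule} = @rest              ? join(' ', @rest)
                   : $rec{unit} eq 'aa' ? 'PRT'
                   :                      undef;
    return \%rec;
}

# Usage with the LOCUS line quoted in this message.
my $rec = parse_locus_line(
    'LOCUS NM_003748 3134 bp mRNA linear PRI 01-NOV-2000');
for my $key (qw(name length unit molecule topology division date)) {
    printf "%-9s %s\n", $key, defined $rec->{$key} ? $rec->{$key} : '(none)';
}
```

Because the fields are identified by their shape rather than by a fixed capture position, the same code path covers records with 'linear', with 'circular', and with no topology field at all. The trade-off is the stated assumption about the division code: a line with a molecule type but no division would have the molecule mistaken for the division.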
--
------------------
Andreas Matern
Bioinformatician
LION Bioscience Research, Inc.
141 Portland Street, 10th Floor
Cambridge, MA 02139
andreas.matern@lbri.lionbioscience.com
phone: (617) 245-5483
fax: (617) 245-5499