[Bioperl-l] bug in genbank.pm

Wang, Kai Wang.Kai@mayo.edu
Sat, 16 Feb 2002 17:30:05 -0600

I pointed out this problem about two months ago, but it has not been fixed. The
new GenBank file format adds a "molecular shape" field to the LOCUS line, so the
current genbank.pm cannot parse it.

In the file:

# $Id: genbank.pm,v 1.46 2002/02/14 16:41:22 jason Exp $
    if (($2 eq 'bp') || defined($5)) {
        if ($4 eq 'circular') {
            ($date) = $line =~ /.*(\d\d-\w\w\w-\d\d\d\d)/;
        } else {
            # assumes a shape tag can only be 'circular'; a 'linear' tag also lands here
            $date = $5;
        }
    } else {
        $seq->molecule('PRT') if ($2 eq 'aa');
        $date = $4;
    }

The code above wrongly assumes that NCBI will never add a 'linear' tag to a
record. One example is accession number 'NM_003748'. Its first line is:

LOCUS       NM_003748               3134 bp    mRNA    linear   PRI 01-NOV-2000

The current genbank.pm cannot recognize the date (01-NOV-2000) on this line.
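For illustration, here is a minimal sketch of a position-independent approach. It assumes that the date, when present, is the last token on the LOCUS line and that the shape, when present, is the literal keyword `linear` or `circular`. The date is matched by its DD-MMM-YYYY pattern and the shape by keyword, so neither depends on a capture-group number. `parse_locus` is a hypothetical helper, not part of genbank.pm. The second sample line is an illustrative older-style LOCUS line with no shape field, not a real record dump.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of genbank.pm: split a LOCUS line into
# fields without assuming where (or whether) the shape field appears.
sub parse_locus {
    my ($line) = @_;
    my ($name, $length, $unit, $tail) =
        $line =~ /^LOCUS\s+(\S+)\s+(\d+)\s+(bp|aa)\s+(.*)$/
        or return;
    my %f = (name => $name, length => $length, unit => $unit);
    my @rest = split ' ', $tail;

    # Assumption: the date, if present, is the last token; recognize it by
    # its DD-MMM-YYYY pattern rather than by its position.
    $f{date} = pop @rest if @rest && $rest[-1] =~ /^\d\d-\w\w\w-\d\d\d\d$/;

    # The shape, if present, is one of two keywords; pull it out wherever it is.
    my @shape = grep { /^(?:linear|circular)$/ } @rest;
    $f{shape} = $shape[0] if @shape;
    @rest = grep { !/^(?:linear|circular)$/ } @rest;

    # What is left is the division (last token) and, for nucleotide
    # records, the molecule type (first token).
    $f{division} = pop @rest   if @rest;
    $f{molecule} = shift @rest if @rest;
    return \%f;
}

# LOCUS line of NM_003748 in the new format (shape field present).
my $new = 'LOCUS       NM_003748               3134 bp    mRNA    linear   PRI 01-NOV-2000';
# Illustrative older-style line for the same record, with no shape field.
my $old = 'LOCUS       NM_003748    3134 bp    mRNA            PRI       01-NOV-2000';

for my $line ($new, $old) {
    my $f = parse_locus($line);
    print join(' ', map { "$_=" . ($f->{$_} // '-') } qw(molecule shape division date)), "\n";
}
```

Running it prints `molecule=mRNA shape=linear division=PRI date=01-NOV-2000` for the first line and `molecule=mRNA shape=- division=PRI date=01-NOV-2000` for the second. Matching the date and shape by content rather than by position lets both LOCUS layouts share a single code path.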

I think the best way is to use:    $line =~