[Bioperl-l] genbank locus line format change

Hilmar Lapp lapp@gnf.org
Fri, 27 Apr 2001 11:46:02 -0700


Malcolm Cook wrote:
> 
> I just read the following post in bionet.molbio.genbank and thought of
> Bio::SeqIO::Genbank and Bio::DB::Genbank since the format of the LOCUS line
> is changing.  Please excuse if this is already obvious to the appropriate
> module maintainers.  I am curious as to what impact if any the change will
> have on these modules.
> 
> 

It shouldn't have any effect, because the present parser design relies
on regexps that don't depend on fixed column positions or spacer
widths. (This, BTW, probably makes it less efficient than it could be.)
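
To illustrate the point, here is a minimal sketch in Python (not the actual Bio::SeqIO code, which is Perl). The pattern and function names are made up for this example. It shows why matching on runs of whitespace survives a change in column layout, while slicing by the column positions quoted below does not.

```python
import re

# Hypothetical pattern: fields are separated by runs of whitespace,
# so no column position is assumed anywhere.
LOCUS_RE = re.compile(r"^LOCUS\s+(\S+)\s+(\d+)\s+bp\b")


def name_and_length_by_regexp(line):
    """Return (locus name, sequence length) using whitespace-tolerant matching."""
    m = LOCUS_RE.match(line)
    if m is None:
        raise ValueError("not a LOCUS line: %r" % line)
    return m.group(1), int(m.group(2))


def name_by_columns(line):
    """Return the locus name by slicing columns 13-21 (1-based) of the old layout."""
    return line[12:21].strip()


# The example line from the quoted post, and the same fields with
# different spacing (a made-up variant, only for illustration).
old_style = "LOCUS       AB000383     5423 bp    DNA   circular  VRL       05-FEB-1999"
respaced = "LOCUS  AB000383   5423 bp DNA circular VRL 05-FEB-1999"

print(name_and_length_by_regexp(old_style))
print(name_and_length_by_regexp(respaced))
print(repr(name_by_columns(respaced)))
```

The regexp version returns the same name and length for both lines; the column-sliced version picks up part of the length field once the spacing changes.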

> 
> ---------+---------+---------+---------+---------+---------+---------+---------
> 1       10        20        30        40        50        60        70       79
> LOCUS       AB000383     5423 bp    DNA   circular  VRL       05-FEB-1999
> 
> Positions  Contents
> ---------  --------
> 01-05      LOCUS
> 06-12      spaces
> 13-21      Locus name
> 22-22      space
> 23-29      Length of sequence, right-justified
> 31-32      bp
> 34-36      Blank, ss- (single-stranded), ds- (double-stranded), or
>            ms- (mixed-stranded)
> 37-42      Blank, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA),
>            mRNA (messenger RNA), uRNA (small nuclear RNA), snRNA
> 43-52      Blank (implies linear) or circular

Interesting. I'm not sure the parser is prepared for this field being
optional (blank for linear, "circular" otherwise). Maybe the problem
hasn't come up so far because no one has tried to read plasmid
sequences. Jason, any comment?
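
For what it's worth, tolerating an optional field in a regexp could look roughly like the sketch below. This is Python, not what the BioPerl parser does today; the pattern, the group names, and the second and third test lines are assumptions made up for illustration, based only on the field list quoted above.

```python
import re

# Hypothetical pattern. The strandedness prefix, the molecule type and the
# topology are each wrapped in an optional group, so a blank field is fine.
LOCUS_RE = re.compile(
    r"^LOCUS\s+(?P<name>\S+)\s+(?P<length>\d+)\s+bp"
    r"(?:\s+(?P<strand>ss-|ds-|ms-)?(?P<moltype>[a-z]{0,2}[DR]NA))?"
    r"(?:\s+(?P<topology>circular))?"
    r"\s+(?P<division>[A-Z]{3})\s+(?P<date>\d{2}-[A-Z]{3}-\d{4})\s*$"
)


def parse_locus(line):
    """Parse a LOCUS line into a dict; a blank topology is reported as linear."""
    m = LOCUS_RE.match(line)
    if m is None:
        raise ValueError("unrecognized LOCUS line: %r" % line)
    fields = m.groupdict()
    fields["length"] = int(fields["length"])
    # Per the quoted description, a blank topology field implies linear.
    fields["topology"] = fields["topology"] or "linear"
    return fields


# First line is the example from the quoted post; the other two are invented.
circular = "LOCUS       AB000383     5423 bp    DNA   circular  VRL       05-FEB-1999"
linear = "LOCUS       XYZ12345     1200 bp    mRNA            PRI       01-JAN-2000"
stranded = "LOCUS       XYZ54321      900 bp ss-RNA             VRL       01-JAN-2000"

for example in (circular, linear, stranded):
    print(parse_locus(example))
```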

	Hilmar

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------