[Bioperl-l] genbank locus line format change
Hilmar Lapp
lapp@gnf.org
Fri, 27 Apr 2001 11:46:02 -0700
Malcolm Cook wrote:
>
> I just read the following post in bionet.molbio.genbank and thought of
> Bio::SeqIO::Genbank and Bio::DB::Genbank since the format of the LOCUS line
> is changing. Please excuse if this is already obvious to the appropriate
> module maintainers. I am curious as to what impact if any the change will
> have on these modules.
>
>
It shouldn't have any effect, because the present parser relies on
regexps that don't depend on fixed column widths or similar things.
(This, BTW, probably makes it less efficient than it could be.)
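
To illustrate the difference between the two approaches, here is a minimal sketch in Python. It is not the actual Bio::SeqIO code: the pattern and the function names are made up for this example, and the second sample line (with an over-long locus name) is invented to show what happens when columns shift. The fixed-width version uses the column positions from the quoted table.

```python
import re

# Hypothetical pattern for illustration only; NOT the regexp BioPerl uses.
# It keys on whitespace-separated tokens, so column positions do not matter.
LOCUS_RE = re.compile(r"^LOCUS\s+(?P<name>\S+)\s+(?P<length>\d+)\s+bp\b")


def parse_locus_regex(line):
    """Return (name, length) using a whitespace-tolerant regexp."""
    m = LOCUS_RE.match(line)
    if m is None:
        raise ValueError("not a LOCUS line: %r" % line)
    return m.group("name"), int(m.group("length"))


def parse_locus_fixed(line):
    """Return (name, length) by slicing fixed columns.

    Columns 13-21 hold the locus name and 23-29 the length
    (1-based, inclusive), as in the quoted table.
    """
    return line[12:21].strip(), int(line[22:29])


# The example line from the quoted post, with its original column layout.
old_style = "LOCUS       AB000383     5423 bp    DNA   circular  VRL       05-FEB-1999"

# An invented line whose locus name overflows columns 13-21.
shifted = "LOCUS       AB000383_LONGER_NAME        5423 bp    DNA   circular  VRL 05-FEB-1999"

if __name__ == "__main__":
    print(parse_locus_regex(old_style))
    print(parse_locus_fixed(old_style))
    print(parse_locus_regex(shifted))
    try:
        parse_locus_fixed(shifted)
    except ValueError as err:
        print("fixed-width parse failed:", err)
```

Both functions agree on the original layout. Only the regexp version survives the shifted columns, at the cost of running a pattern match instead of a plain slice.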
>
> ---------+---------+---------+---------+---------+---------+---------+---------
> 1       10        20        30        40        50        60        70       79
> LOCUS       AB000383     5423 bp    DNA   circular  VRL       05-FEB-1999
>
> Positions  Contents
> ---------  --------
> 01-05      LOCUS
> 06-12      spaces
> 13-21      Locus name
> 22-22      space
> 23-29      Length of sequence, right-justified
> 31-32      bp
> 34-36      Blank, ss- (single-stranded), ds- (double-stranded), or
>            ms- (mixed-stranded)
> 37-42      Blank, DNA, RNA, tRNA (transfer RNA), rRNA (ribosomal RNA),
>            mRNA (messenger RNA), uRNA (small nuclear RNA), snRNA
> 43-52      Blank (implies linear) or circular
Interesting. I'm not sure the parser is prepared for this field being
optional. Maybe the problem hasn't come up so far because no one has
tried to read plasmid sequences. Jason, any comment?
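
For what it's worth, an optional field like this can be handled with an optional group in the regexp. The following Python sketch shows the idea under some assumptions: the pattern and the parse_locus name are hypothetical (not taken from BioPerl), the division and date fields are assumed to follow the topology field as in the example line, and the linear and single-stranded sample lines are invented variants of the quoted record.

```python
import re

# Hypothetical pattern for illustration; NOT the regexp BioPerl uses.
# Strand/molecule and topology are each wrapped in an optional group,
# so a blank field simply leaves the corresponding capture as None.
LOCUS_RE = re.compile(
    r"^LOCUS\s+(?P<name>\S+)\s+(?P<length>\d+)\s+bp\s+"
    r"(?:(?P<strand>ss-|ds-|ms-)?(?P<molecule>DNA|[a-z]*RNA)\s+)?"
    r"(?:(?P<topology>circular)\s+)?"
    r"(?P<division>[A-Z]{3})\s+(?P<date>\d{2}-[A-Z]{3}-\d{4})"
)


def parse_locus(line):
    """Parse a LOCUS line into a dict; a blank topology means linear."""
    m = LOCUS_RE.match(line)
    if m is None:
        raise ValueError("not a LOCUS line: %r" % line)
    fields = m.groupdict()
    fields["length"] = int(fields["length"])
    # Blank implies linear, per the quoted table.
    fields["circular"] = fields.pop("topology") is not None
    return fields


# The example line from the quoted post.
circular_line = "LOCUS       AB000383     5423 bp    DNA   circular  VRL       05-FEB-1999"
# Invented variants of the same record, for illustration only.
linear_line = "LOCUS       AB000383     5423 bp    DNA             VRL       05-FEB-1999"
stranded_line = "LOCUS       AB000383     5423 bp ss-RNA             VRL       05-FEB-1999"

if __name__ == "__main__":
    for line in (circular_line, linear_line, stranded_line):
        print(parse_locus(line))
```

The same pattern accepts the line with and without the topology field, so the caller never has to know which layout it was given.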
Hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp@gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------