[Biojava-l] Re: Biojava-l digest, Vol 1 #334 - 2 msgs

Keith James kdj@sanger.ac.uk
12 Jun 2001 11:15:19 +0100


>>>>> "Sarath" == Sarath  <sarath@decodon.com> writes:

    Sarath> Hi there. I do think it's an occasional bug with the GenBank
    Sarath> files. I have come across it quite a number of times, and I
    Sarath> even mailed the URLs where I found the recent sequences of
    Sarath> Staphylococcus aureus (both strains N315 and Mu50), which
    Sarath> completed sequencing on June 1; in the GenBank format they
    Sarath> make the same fuss with the absence of the GI field. You can
    Sarath> check the files with the names BA000017.gbk and
    Sarath> BA000018.gbk by browsing to the appropriate strain at

    Sarath> ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/

The README file on this ftp site indicates that these files are the
original submission files from the author(s). However, this doesn't
always seem to be the case.

In cases where these are the originals I would not always expect them
to conform fully to GenBank format. If they undergo a similar process to
our EMBL submissions then certain details are added by the curators
after they receive the file (e.g. versioning).

I suggest that the Staph file is a pre-submission original because of
the apparent y2k date problem on the originator's machine ;)

LOCUS       BA000018  2813641 bp    DNA   circular  BCT    21-APR-1901
DEFINITION  Staphylococcus aureus N315, complete genome.
                                                           ^^^^^^^^^^^

I would guess that these files deviate from the strict definition of
GenBank format because they have not been fully processed.
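
As a rough illustration of the two symptoms discussed above (the missing
GI and the odd date), a header check could be sketched in Python along
these lines. The function name check_genbank_header and the 1982 cutoff
(the year GenBank started) are my own choices for illustration, as is the
assumption that the GI appears as "GI:<digits>" on the VERSION line. This
is not how the BioJava parser works.

```python
import re

# Assumption: GenBank began in 1982, so an earlier LOCUS date is implausible.
EARLIEST_PLAUSIBLE_YEAR = 1982


def check_genbank_header(text):
    """Return a list of problems found in a GenBank flat-file header.

    Hypothetical helper: it only looks at the LOCUS and VERSION lines.
    """
    problems = []
    lines = text.splitlines()

    # The LOCUS line is expected to end with a DD-MON-YYYY date.
    locus = next((ln for ln in lines if ln.startswith("LOCUS")), None)
    if locus is None:
        problems.append("no LOCUS line")
    else:
        match = re.search(r"(\d{2}-[A-Z]{3}-(\d{4}))\s*$", locus)
        if match is None:
            problems.append("no date on LOCUS line")
        elif int(match.group(2)) < EARLIEST_PLAUSIBLE_YEAR:
            problems.append("implausible LOCUS date " + match.group(1))

    # Assumption: a processed record carries "GI:<digits>" on its VERSION line.
    version = next((ln for ln in lines if ln.startswith("VERSION")), None)
    if version is None or re.search(r"\bGI:\d+", version) is None:
        problems.append("no GI identifier on a VERSION line")

    return problems


header = (
    "LOCUS       BA000018  2813641 bp    DNA   circular  BCT    21-APR-1901\n"
    "DEFINITION  Staphylococcus aureus N315, complete genome.\n"
)
for problem in check_genbank_header(header):
    print(problem)
```

Run on the two header lines quoted above, this reports both the 1901
date and the absent GI, which is consistent with a file that has not yet
been through curation.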

Keith

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA