[Biojava-l] Re: Biojava-l digest, Vol 1 #334 - 2 msgs

Sarath sarath@decodon.com
Tue, 12 Jun 2001 11:32:36 +0200 (MEST)


Hi there,
   I do think it's an occasional bug with the GenBank files. I have come
across it quite a number of times, and I even mailed the URLs where I found
it. The recent sequences of Staphylococcus aureus (both strains N315 and
Mu50), whose sequencing was completed on June 1, are making the same fuss
in the GenBank format: the GI field is absent. You can check the files
named BA000017.gbk and BA000018.gbk by browsing to the appropriate strain at

ftp://ftp.ncbi.nlm.nih.gov/genbank/genomes/Bacteria/   
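
In case it helps the discussion, here is a minimal sketch of what a more
tolerant reading of the VERSION line (discussed in the quoted messages
below) might look like. This is not the actual BioJava parser code: the
class name VersionLine and its methods are made up for illustration, and
the sketch assumes that the GI, when present, is the third
whitespace-separated token and starts with "GI:".

```java
import java.util.Optional;

// Hypothetical helper, not part of BioJava: holds the contents of one
// GenBank VERSION line, with the GI treated as optional.
final class VersionLine {
    private final String accession; // e.g. "AE000784.1" or "AE000783"
    private final String gi;        // e.g. "2690041", or null when absent

    private VersionLine(String accession, String gi) {
        this.accession = accession;
        this.gi = gi;
    }

    String accession() {
        return accession;
    }

    Optional<String> gi() {
        return Optional.ofNullable(gi);
    }

    // Accepts both "VERSION  AE000784.1  GI:2690041" and "VERSION  AE000783".
    static VersionLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        if (tokens.length < 2 || !tokens[0].equals("VERSION")) {
            throw new IllegalArgumentException("Not a VERSION line: " + line);
        }
        String gi = null;
        // Assumption: the GI is the third token and carries a "GI:" prefix.
        if (tokens.length >= 3 && tokens[2].startsWith("GI:")) {
            gi = tokens[2].substring("GI:".length());
        }
        return new VersionLine(tokens[1], gi);
    }

    public static void main(String[] args) {
        String[] samples = {
            "VERSION     AE000784.1  GI:2690041",
            "VERSION     AE000783"
        };
        for (String sample : samples) {
            VersionLine v = parse(sample);
            System.out.println(v.accession() + " gi=" + v.gi().orElse("(none)"));
        }
    }
}
```

With something along these lines, an entry without a GI would still be
read, and the caller could decide whether a missing GI matters.
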
Yours sincerely,
Sarath Chandra

On Mon, 11 Jun 2001, Cox, Greg wrote:

> Here are a couple of links on the Genbank format:
> ftp://ncbi.nlm.nih.gov/genbank/gbrel.txt
> http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
> Neither indicates that the GI field is optional.  I checked Genbank through
> the Web interface, and that version of AE000783 does have a GI field.  On a
> first pass, I'd guess it's a crossed wire at Genbank.  If this is epidemic,
> it's probably worth changing the parser, but it seems to be more an
> occasional bug.
> 
> Greg
> 
> -----Original Message-----
> From: Sarath [mailto:sarath@decodon.com]
> Sent: Monday, June 11, 2001 2:18 PM
> To: Thomas Down
> Cc: Sarath; biojava-l@biojava.org
> Subject: Re: [Biojava-l] Re: Biojava-l digest, Vol 1 #334 - 2 msgs
> 
> 
> Hi Thomas,
>   Thank you for the quick response. It did work the way you said: I
> just had to include a set of dummy characters to mislead the program. But
> is this the only way I can manage such files? The files I suggested as a
> reference are newly sequenced ones, i.e. the sequencing of these genomes
> was completed on 1st June, so what do you have to say to this? I don't
> know exactly what the purpose of the GI is in the GenBank format, but do
> you think this level of rigidity is necessary for reading GenBank files?
> From Sarath
>   
> 
> On Mon, 11 Jun 2001, Thomas Down wrote:
> 
> > On Mon, Jun 11, 2001 at 06:32:08PM +0200, Sarath wrote:
> > >   Hello everyone,
> > >     I have been using the BioJava package for around a month, and today,
> > > surprisingly, I ran into a strange situation: a simple program is not
> > > able to compute the GC content from a file in the GenBank format. I
> > > would be very glad if somebody could tell me the bug in the program
> > > that finds the GC content from the file (AE000783.gbk) at the URL
> > >   ftp://ncbi.nlm.nih.gov/genbank/genomes/Bacteria/Borrelia_burgdorferi/
> > >     I am pasting the source code here; I hope this is not an
> > > inconvenience to some people. The code is exactly the same as that
> > > given in the tutorials. Probably there is a bug in BioJava... who
> > > knows? Or should I blame the GenBank people?
> > 
> > Bug?  No, never... ;)
> > 
> > I've taken a quick look at this.  The problem is with
> > the VERSION line of the file.  Most Genbank entries look
> > like:
> > 
> >   VERSION     AE000784.1  GI:2690041
> > 
> > The BioJava parser is explicitly expecting to see two tokens
> > following the VERSION keyword.
> > 
> > The file you are trying to read has a line like:
> > 
> > 
> >   VERSION     AE000783
> > 
> > The only format documentation I can find for Genbank is at:
> > 
> >   ftp://ncbi.nlm.nih.gov/genbank/docs/
> > 
> > This concentrates on the feature tables, and doesn't seem
> > to give a normative description of the headers.  However,
> > the example given includes the two-token VERSION string,
> > and this seems to be found in the vast majority of entries.
> > 
> > That said, we probably ought to take the `strict in what you
> > produce, tolerant in what you accept' approach.  I'll leave
> > the final decision to people who use the Genbank parser more
> > regularly, though (Greg, are you listening?)
> > 
> > Is there a normative specification for the Genbank header
> > lines anywhere?  If so, maybe it is worth complaining about
> > that entry...
> > 
> >    Thomas.
> > 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>