[Biojava-l] Re: genbank contig stuff

Greg Cox greg.cox at lionbioscience.com
Mon Jul 7 12:19:34 EDT 2003


We looked at this a while back, and I suspect this isn't a problem BioJava can solve.    

If we treat it as a sequence, one option is to try to assemble it. If BioJava assembles the sequence, it has to know where to get the component sequences. This implies some sort of database backing to resolve the entries the contig is built from, which seems a bit excessive. If all you want is the features, we could create a dummy sequence of ambiguous nucleotides of the proper length and attach the features to that. At that point, though, I think it makes more sense to create a feature holder instead of pretending it's a real sequence. Which segues into...
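A rough sketch of the dummy-sequence option follows. It is written in plain Java rather than against the BioJava API, and the class and method names are hypothetical. It assumes that the LOCUS line of a contig (CON) entry carries the full assembled length as the token just before "bp". It reads that length, builds a string of 'n' (any nucleotide) of that length, and checks whether a 1-based feature location fits on it. A location checked against a length of 0 fails, which is the "not within 1..0" symptom described in the original message.

```java
import java.util.Arrays;

// Hypothetical helper: a placeholder sequence for a contig entry whose real
// residues are only given indirectly via CONTIG join().
public class PlaceholderContig {

    /** Reads the length from a LOCUS line, assuming it is the token before "bp". */
    public static int locusLength(String locusLine) {
        String[] tokens = locusLine.trim().split("\\s+");
        for (int i = 1; i < tokens.length; i++) {
            if (tokens[i].equals("bp")) {
                return Integer.parseInt(tokens[i - 1]);
            }
        }
        throw new IllegalArgumentException("no length found in LOCUS line");
    }

    /** Builds a string of 'n' (ambiguous nucleotide) of the given length. */
    public static String placeholder(int length) {
        char[] residues = new char[length];
        Arrays.fill(residues, 'n');
        return new String(residues);
    }

    /** True if a 1-based, inclusive feature location lies within 1..length. */
    public static boolean fits(int start, int end, int length) {
        return 1 <= start && start <= end && end <= length;
    }
}
```

For example, with a made-up line such as `LOCUS       EXAMPLE   5000 bp    DNA     linear   CON 01-JUL-2003`, `locusLength` returns 5000, and a feature at 10..20 fits on the 5000-residue placeholder but not on a zero-length sequence.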

The other option is to treat a contig as a new kind of beast, not a sequence. I don't know what this beast would look like; it has to be a feature holder, probably annotatable, and then what? Aesthetically, I'm not sure this makes sense either; after all, a contig sequence is still a sequence.

The ray of light is that most (all?) contigs are also available in an expanded form. That's been enough for us to avoid grappling with this bull so far.

Greg

-----Original Message-----
From: biojava-l-bounces at biojava.org
[mailto:biojava-l-bounces at biojava.org]On Behalf Of Matthew Pocock
Sent: Thursday, June 26, 2003 2:58 PM
To: Matthew Pocock
Cc: biojava-l
Subject: [Biojava-l] Re: genbank contig stuff


Sorry - I fired that off without thinking much.

I just downloaded the GenBank file NT_010783 from NCBI. Our parsers
spewed lots of errors about features not being within the range 1..0,
and after a little poking around in the code, I found that a zero-length
sequence was being generated. In desperation, I looked at the raw
GenBank file. Instead of sequence data, it contains a CONTIG section
with a single big join() describing how to build it from other entries.
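One possible starting point is a plain-Java sketch that parses the CONTIG join() into its pieces and sums their lengths. The class and method names are hypothetical and are not existing BioJava API. It assumes only three element forms: `accession.version:start..end`, `complement(...)` wrapped around such a region, and `gap(N)`. Anything else, such as a gap of unknown length, is rejected. Assembling the actual sequence would still require fetching each region from its own entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the CONTIG join() of a GenBank contig entry.
public class ContigJoin {

    /** One piece of the join: a region of another entry, or a gap. */
    public static final class Piece {
        public final String accession; // null for a gap
        public final int start, end;    // 1-based inclusive; a gap uses 1..N
        public final boolean reverse;   // true if wrapped in complement(...)

        Piece(String accession, int start, int end, boolean reverse) {
            this.accession = accession;
            this.start = start;
            this.end = end;
            this.reverse = reverse;
        }

        public int length() { return end - start + 1; }
        public boolean isGap() { return accession == null; }
    }

    private static final Pattern REGION =
        Pattern.compile("([A-Za-z0-9_.]+):(\\d+)\\.\\.(\\d+)");
    private static final Pattern GAP = Pattern.compile("gap\\((\\d+)\\)");

    /** Parses CONTIG text; continuation lines and the keyword are allowed. */
    public static List<Piece> parse(String contig) {
        String s = contig.trim();
        if (s.startsWith("CONTIG")) s = s.substring("CONTIG".length());
        s = s.replaceAll("\\s+", ""); // glue wrapped lines back together
        if (!s.startsWith("join(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("expected join(...)");
        }
        s = s.substring("join(".length(), s.length() - 1);
        List<Piece> pieces = new ArrayList<>();
        for (String tok : s.split(",")) {
            boolean reverse = tok.startsWith("complement(") && tok.endsWith(")");
            if (reverse) tok = tok.substring("complement(".length(), tok.length() - 1);
            Matcher g = GAP.matcher(tok);
            Matcher r = REGION.matcher(tok);
            if (g.matches()) {
                pieces.add(new Piece(null, 1, Integer.parseInt(g.group(1)), false));
            } else if (r.matches()) {
                pieces.add(new Piece(r.group(1), Integer.parseInt(r.group(2)),
                                     Integer.parseInt(r.group(3)), reverse));
            } else {
                throw new IllegalArgumentException("unsupported element: " + tok);
            }
        }
        return pieces;
    }

    /** Length of the assembled sequence, gaps included. */
    public static int assembledLength(List<Piece> pieces) {
        int total = 0;
        for (Piece p : pieces) total += p.length();
        return total;
    }
}
```

For example, the made-up `join(XX000001.1:1..100,gap(50),complement(XX000002.1:11..30))` parses into three pieces with a total length of 170. That total is the length a non-empty sequence for the entry would need.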

Has anybody modified our GenBank parser to process entries like this? To
be honest, I'm not quite sure where to start.

Matthew

_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l
