[Biojava-l] Genbank file parser error

gang wu gwu at molbio.mgh.harvard.edu
Thu Jan 29 19:28:42 UTC 2009


Thanks Richard. That is exactly the same issue. The latest Subversion 
trunk fixed the problem.

Thanks again for the quick response.

Gang

Richard Holland wrote:
> Gabrielle Doan posted a solution to this a while back and I believe the
> changes have been committed already:
>
> http://www.mail-archive.com/biojava-l@lists.open-bio.org/msg01036.html
>
> How old is the copy of BioJava that you're using? Have you tried
> checking out the trunk from Subversion to see if that works?
>
> cheers,
> Richard
>
> Mark Schreiber wrote:
>   
>> I assume that the downloaded file has the complete sequence in it? Probably
>> worth checking that it has the complete sequence block (all 116366104 bp).
>>
>> - Mark
>>
>> On Thu, Jan 29, 2009 at 12:51 PM, gang wu <gwu at molbio.mgh.harvard.edu> wrote:
>>
>>     
>>> Hi Everyone,
>>>
>>> I have a piece of code that parses GenBank files and retrieves gene
>>> sequences and related information. It works well with sequences from
>>> Arabidopsis thaliana, C. elegans, and Bos taurus, but it fails on Mus
>>> musculus chromosome 2. The contig it fails on is the largest one in my
>>> test: contig NT_039207 has 116366104 bp, but the code reports it cut to
>>> 100000020 bp, which puts some gene coordinates out of range. The code is
>>> attached below. Can anyone give some suggestions?
>>>
>>> The Mus musculus GenBank file can be downloaded at:
>>> ftp://ftp.ncbi.nih.gov/genomes/M_musculus/CHR_02/mm_alt_chr2.gbk.gz
>>>
>>> Thanks in advance
>>>
>>> Gang
>>> ==========================================
>>> import java.io.BufferedReader;
>>> import java.io.File;
>>> import java.io.FileInputStream;
>>> import java.io.FileNotFoundException;
>>> import java.io.InputStreamReader;
>>> import java.util.Iterator;
>>> import java.util.NoSuchElementException;
>>>
>>> import org.biojava.bio.Annotation;
>>> import org.biojava.bio.BioException;
>>> import org.biojavax.Namespace;
>>> import org.biojavax.RichObjectFactory;
>>> import org.biojavax.bio.seq.RichFeature;
>>> import org.biojavax.bio.seq.RichSequence;
>>> import org.biojavax.bio.seq.RichSequenceIterator;
>>>
>>> public class TestMus {
>>>   public void testMusChr2()
>>>           throws FileNotFoundException, NoSuchElementException, BioException {
>>>       String fp = "/tmp/mm_alt_chr2.gbk";
>>>       System.out.println("File: " + fp);
>>>       BufferedReader gReader = new BufferedReader(
>>>               new InputStreamReader(new FileInputStream(new File(fp))));
>>>       Namespace ns = (Namespace) RichObjectFactory.getDefaultNamespace();
>>>       RichSequenceIterator seqI =
>>>               RichSequence.IOTools.readGenbankDNA(gReader, ns);
>>>       // Each record in the file is one contig.
>>>       while (seqI.hasNext()) {
>>>           RichSequence seq = seqI.nextRichSequence();
>>>           String organism = seq.getTaxon().getDisplayName();
>>>           String accession = seq.getAccession();
>>>           String identifier = seq.getIdentifier();
>>>           int taxonID = seq.getTaxon().getNCBITaxID();
>>>           String division = seq.getDivision();
>>>           String seqVersion = "" + seq.getSeqVersion();
>>>           int seqLength = seq.length();
>>>           String description = seq.getDescription();
>>>           System.out.println("Organism: " + organism
>>>                   + "\nAccession: " + accession
>>>                   + "\nIdentifier: " + identifier
>>>                   + "\nTaxonID: " + taxonID
>>>                   + "\nDivision: " + division
>>>                   + "\nSeqVersion: " + seqVersion
>>>                   + "\nLength: " + seqLength);
>>>           System.out.println("2041-2101: " + seq.subStr(2041, 2101));
>>>           // Report only the features of type "gene".
>>>           for (Iterator i = seq.features(); i.hasNext();) {
>>>               RichFeature f = (RichFeature) i.next();
>>>               int rank = f.getRank();
>>>               String fType = f.getType();
>>>               if (fType.toLowerCase().equals("gene")) {
>>>                   int startPos = f.getLocation().getMin();
>>>                   int endPos = f.getLocation().getMax();
>>>                   int geneLen = endPos - startPos + 1;
>>>                   // Extract the gene sequence for these coordinates.
>>>                   String sequence = seq.subStr(startPos, endPos);
>>>                   String strand = f.getStrand().getToken() + "";
>>>                   Annotation ann = (Annotation) f.getAnnotation();
>>>                   // Prefer locus_tag as the identifier, else use gene.
>>>                   String geneIdentifier = "";
>>>                   if (ann.containsProperty("locus_tag")) {
>>>                       geneIdentifier = ann.getProperty("locus_tag") + "";
>>>                   } else {
>>>                       geneIdentifier = ann.getProperty("gene") + "";
>>>                   }
>>>
>>>                   String alternativeIdentifiers = "";
>>>                   try {
>>>                       alternativeIdentifiers = (String) ann.getProperty("gene");
>>>                   } catch (NoSuchElementException e) {
>>>                       // No "gene" property on this feature; leave it empty.
>>>                   }
>>>                   String annotation = "";
>>>                   System.out.println(rank + "\t" + geneIdentifier + "\t"
>>>                           + alternativeIdentifiers + "\t"
>>>                           + startPos + "\t" + endPos + "\t" + geneLen
>>>                           + "\t" + strand);
>>>               }
>>>           }
>>>       }
>>>   }
>>>
>>>   public static void main(String[] args) throws Exception {
>>>       TestMus tm = new TestMus();
>>>       tm.testMusChr2();
>>>   }
>>> }
>>> _______________________________________________
>>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>>>
>
>   
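
Mark's suggestion in the thread, checking that the downloaded file really contains all 116366104 bp, can be done without BioJava at all. The following is only a minimal sketch, not part of the original posts: the class name GenbankLengthCheck and its methods are hypothetical, and it assumes each record has a whitespace-separated LOCUS line with the length just before "bp", an ORIGIN block holding the bases, and a terminating "//" line.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: compares LOCUS-declared lengths with counted bases. */
public class GenbankLengthCheck {

    /** Name, declared length and counted length of one record. */
    public static final class Result {
        public final String name;
        public final long declared;
        public final long counted;

        Result(String name, long declared, long counted) {
            this.name = name;
            this.declared = declared;
            this.counted = counted;
        }

        /** True when the ORIGIN block holds as many bases as LOCUS declares. */
        public boolean complete() {
            return declared == counted;
        }
    }

    /** Scans a GenBank flat file and returns one Result per record. */
    public static List<Result> scan(BufferedReader in) throws IOException {
        List<Result> results = new ArrayList<>();
        String name = null;
        long declared = -1;
        long counted = 0;
        boolean inOrigin = false;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("LOCUS")) {
                // Assumption: fields are whitespace-separated, length precedes "bp".
                String[] f = line.trim().split("\\s+");
                name = f.length > 1 ? f[1] : "?";
                declared = -1;
                for (int i = 2; i + 1 < f.length; i++) {
                    if (f[i + 1].equals("bp")) {
                        declared = Long.parseLong(f[i]);
                        break;
                    }
                }
                counted = 0;
                inOrigin = false;
            } else if (line.startsWith("ORIGIN")) {
                inOrigin = true;
            } else if (line.startsWith("//")) {
                if (name != null) {
                    results.add(new Result(name, declared, counted));
                }
                name = null;
                inOrigin = false;
            } else if (inOrigin) {
                // Sequence lines mix position numbers, spaces and bases;
                // only the letters are bases.
                for (int i = 0; i < line.length(); i++) {
                    if (Character.isLetter(line.charAt(i))) {
                        counted++;
                    }
                }
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GenbankLengthCheck <file.gbk>");
            return;
        }
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
            for (Result r : scan(in)) {
                System.out.println(r.name + "\tdeclared=" + r.declared
                        + "\tcounted=" + r.counted
                        + (r.complete() ? "" : "\tMISMATCH"));
            }
        }
    }
}
```

If the counted length matches the LOCUS line for NT_039207 but BioJava still reports 100000020 bp, the file itself is complete and the truncation happens in the parser, which is the case the Subversion trunk fix addresses.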



