[Biojava-dev] EMBL writing bug

Pjm pjm at sanger.ac.uk
Mon Aug 4 12:22:40 EDT 2003


Help!

I am using the BioJava EMBL reading and writing code, but I've come to a halt because of a simple bug: when a qualifier value is wrapped over several lines, two of those lines get joined together without a space in between. Below are an example file and the code needed to reproduce it.
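For comparison, here is a minimal sketch of the joining behaviour one would expect when a wrapped free-text qualifier is read back in. It assumes that each continuation line should lose its `FT` line code and indentation and then be joined to the previous text with a single space. This is plain Java and is not BioJava's parser; `QualifierJoin` and `joinContinuation` are hypothetical names used only for illustration.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of BioJava): rebuilds a free-text qualifier
 * value from the wrapped "FT" lines of an EMBL feature table.
 */
public class QualifierJoin {

    /** Joins wrapped FT lines, putting a single space between consecutive lines. */
    public static String joinContinuation(List<String> ftLines) {
        StringBuilder sb = new StringBuilder();
        for (String line : ftLines) {
            // Drop the "FT" line code and the indentation that follows it.
            String content = line.startsWith("FT") ? line.substring(2).trim() : line.trim();
            if (content.isEmpty()) {
                continue;
            }
            if (sb.length() > 0) {
                sb.append(' '); // the separator that goes missing in this bug report
            }
            sb.append(content);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "FT                   /similarity=\"AUTO_BLASTP;GP:9366709;Trypanosoma",
            "FT                   brucei;ESAG2 (expression site-associated gene 2) protein,");
        // Prints: /similarity="AUTO_BLASTP;GP:9366709;Trypanosoma brucei;ESAG2 (expression site-associated gene 2) protein,
        System.out.println(joinContinuation(lines));
    }
}
```

Under this assumption, "Trypanosoma" at the end of one line and "brucei;" at the start of the next come back as "Trypanosoma brucei;", which is what the round trip below fails to preserve.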

Here is the BioJava-created EMBL input file:

ID   FAKE
FT   CDS             complement(1..180)
FT                   /gene="wibble"
FT                   /similarity="AUTO_BLASTP;GP:9366709;Trypanosoma
FT                   brucei;ESAG2 (expression site-associated gene 2) protein,
FT                   probable {Trypanosoma
FT                   brucei};;;id=61.739128%;;E()=2.8e-141;score=1386;;;"
SQ   Sequence 152762 BP; 42967 A; 36188 C; 34183 G; 39424 T; 0 other;
      gatcctttat agtagctaac cccaaacaga cgcgtaggcc tcgggggcac actttgccgg        60
      tgatatcgcc caaacaaatc cgccgttgag ctccatcatc cggttgtgtg ccgcgcctcc       120
      ttcagcatgt tgttctccct ttttcactat ttcagcaccg acggcacgat gttgatcgct       180
//


When this file is read in and immediately written back out, the result is:


ID   FAKE
XX
FT   CDS             complement(1..180)
FT                   /gene="wibble"
FT                   /similarity="AUTO_BLASTP;GP:9366709;Trypanosomabrucei;ESAG2
FT                   (expression site-associated gene 2) protein, probable
FT                   {Trypanosoma
FT                   brucei};;;id=61.739128%;;E()=2.8e-141;score=1386;;;"
XX
SQ   Sequence 180 BP; 35 A; 57 C; 40 G; 48 T; 0 other;
      gatcctttat agtagctaac cccaaacaga cgcgtaggcc tcgggggcac actttgccgg        60
      tgatatcgcc caaacaaatc cgccgttgag ctccatcatc cggttgtgtg ccgcgcctcc       120
      ttcagcatgt tgttctccct ttttcactat ttcagcaccg acggcacgat gttgatcgct       180
//


The /similarity qualifier now names the organism 'Trypanosomabrucei' instead of 'Trypanosoma brucei'. Here is the code I used for this simple test case:

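import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;

import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.io.SeqIOTools;

/**
 * Minimal round-trip test: reads an EMBL file and immediately writes it
 * back out, so that the two files can be compared.
 */
public class Break {
    public static void main(String[] argv) {
        if (argv.length != 2) {
            System.err.println("Usage: java Break <input.embl> <output.embl>");
            System.exit(1);
        }
        String origFileName = argv[0];
        String newFileName  = argv[1];

        BufferedReader origReader = null;
        OutputStream out = null;
        try {
            origReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(origFileName)));

            // Parse the entries in the input file.
            SequenceIterator origSeqIt = SeqIOTools.readEmbl(origReader);

            // Write the entries straight back out without modification.
            out = new FileOutputStream(newFileName);
            SeqIOTools.writeEmbl(out, origSeqIt);
        }
        catch (FileNotFoundException fnfe) {
            fnfe.printStackTrace();
        }
        catch (IOException ioe) {
            ioe.printStackTrace();
        }
        finally {
            // Close both streams so the output file is complete on disk.
            try {
                if (origReader != null) origReader.close();
                if (out != null) out.close();
            }
            catch (IOException ignored) {
                // Nothing useful to do if closing fails in a test harness.
            }
        }
    }
}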
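To turn this test case into a pass/fail check, one option is to compare the /similarity value before and after the round trip as whitespace-separated tokens. The writer is free to re-wrap lines at different points (as it does above), so a raw line-by-line diff would flag harmless changes, whereas a token comparison only fails when words are merged or split. This is a sketch that assumes the qualifier text has already been extracted from both files and that it is free text whose line breaks fall at spaces; `RoundTripCheck` and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical round-trip check: compares qualifier values while ignoring line wrapping. */
public class RoundTripCheck {

    /** Splits a (possibly multi-line) qualifier value into whitespace-separated tokens. */
    public static List<String> tokens(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    /** True if the two values differ only in where whitespace and line breaks fall. */
    public static boolean sameModuloWrapping(String before, String after) {
        return tokens(before).equals(tokens(after));
    }

    public static void main(String[] args) {
        String before = "AUTO_BLASTP;GP:9366709;Trypanosoma\nbrucei;ESAG2 (expression site-associated gene 2) protein,";
        String rewrapped = "AUTO_BLASTP;GP:9366709;Trypanosoma brucei;ESAG2\n(expression site-associated gene 2) protein,";
        String merged = "AUTO_BLASTP;GP:9366709;Trypanosomabrucei;ESAG2\n(expression site-associated gene 2) protein,";
        System.out.println(sameModuloWrapping(before, rewrapped)); // true: only the wrapping changed
        System.out.println(sameModuloWrapping(before, merged));    // false: two words were merged
    }
}
```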

Can anyone help?

Paul Mooney.


