[Biojava-dev] EMBL writing bug
Pjm
pjm at sanger.ac.uk
Mon Aug 4 12:22:40 EDT 2003
help!
I am using the EMBL reading and writing code, but I've come to a halt because of
a simple bug: when a qualifier value wraps across two lines, the writer joins
the lines without putting a space between them. The code needed to reproduce it
is below.
Here is the Biojava-created EMBL input file:
ID FAKE
FT CDS complement(1..180)
FT /gene="wibble"
FT /similarity="AUTO_BLASTP;GP:9366709;Trypanosoma
FT brucei;ESAG2 (expression site-associated gene 2) protein,
FT probable {Trypanosoma
FT brucei};;;id=61.739128%;;E()=2.8e-141;score=1386;;;"
SQ Sequence 152762 BP; 42967 A; 36188 C; 34183 G; 39424 T; 0 other;
gatcctttat agtagctaac cccaaacaga cgcgtaggcc tcgggggcac actttgccgg 60
tgatatcgcc caaacaaatc cgccgttgag ctccatcatc cggttgtgtg ccgcgcctcc 120
ttcagcatgt tgttctccct ttttcactat ttcagcaccg acggcacgat gttgatcgct 180
//
When this file is read in and written straight back out, it gives this:
ID FAKE
XX
FT CDS complement(1..180)
FT /gene="wibble"
FT /similarity="AUTO_BLASTP;GP:9366709;Trypanosomabrucei;ESAG2
FT (expression site-associated gene 2) protein, probable
FT {Trypanosoma
FT brucei};;;id=61.739128%;;E()=2.8e-141;score=1386;;;"
XX
SQ Sequence 180 BP; 35 A; 57 C; 40 G; 48 T; 0 other;
gatcctttat agtagctaac cccaaacaga cgcgtaggcc tcgggggcac actttgccgg 60
tgatatcgcc caaacaaatc cgccgttgag ctccatcatc cggttgtgtg ccgcgcctcc 120
ttcagcatgt tgttctccct ttttcactat ttcagcaccg acggcacgat gttgatcgct 180
//
The /similarity value now contains the organism 'Trypanosomabrucei' instead of
'Trypanosoma brucei'. Here is the code I used for a simple test case:
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.FileNotFoundException;
import org.biojava.bio.seq.io.SeqIOTools;
import org.biojava.bio.seq.SequenceIterator;
public class Break {
    public static void main(String[] argv) {
        // argv[0]: EMBL file to read, argv[1]: EMBL file to write
        String origFileName = argv[0];
        String newFileName = argv[1];
        try {
            // Read the entries from the original EMBL file
            BufferedReader origReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(origFileName)));
            SequenceIterator origSeqIt = SeqIOTools.readEmbl(origReader);
            // Write them straight back out without modifying anything
            SeqIOTools.writeEmbl(new FileOutputStream(newFileName), origSeqIt);
        }
        catch (FileNotFoundException fnfe) {
            fnfe.printStackTrace();
        }
        catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }
}
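To make the expected behaviour concrete, here is a rough sketch of how I would expect wrapped qualifier lines to be joined. This is not BioJava code: the class JoinSketch and the method joinContinuationLines are made-up names for illustration only, and the sketch assumes the "FT" prefix has already been stripped from each line.

```java
import java.util.Arrays;
import java.util.List;

public class JoinSketch {
    // Hypothetical helper, not part of BioJava: joins the wrapped lines of a
    // single qualifier value, inserting one space between consecutive lines.
    static String joinContinuationLines(List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            if (sb.length() > 0) {
                sb.append(' '); // the separator that goes missing in the bug
            }
            sb.append(trimmed);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The first two lines of the /similarity value from the input file
        List<String> wrapped = Arrays.asList(
            "/similarity=\"AUTO_BLASTP;GP:9366709;Trypanosoma",
            "brucei;ESAG2 (expression site-associated gene 2) protein,");
        System.out.println(joinContinuationLines(wrapped));
    }
}
```

With the two lines above, this keeps 'Trypanosoma brucei' as two words. Joining with a space is only an assumption that suits free-text values like this one; sequence-like values such as /translation would presumably need to be joined without a space.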
Can anyone help?
Paul Mooney.