[Biojava-dev] [Bug 2590] New: I am getting errors when trying to get Sequences from Features (read in from a Genbank file) that have location in the form of " join(location, location, ... location) "
bugzilla-daemon at portal.open-bio.org
Thu Sep 18 23:42:08 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2590
Summary: I am getting errors when trying to get Sequences from Features (read in from a Genbank file) that have location in the form of "join(location,location, ... location)"
Product: BioJava
Version: live (CVS source)
Platform: Macintosh
OS/Version: Mac OS
Status: NEW
Severity: normal
Priority: P2
Component: symbol
AssignedTo: biojava-dev at biojava.org
ReportedBy: tritt at wisc.edu
CC: tritt at wisc.edu
I get the following exception when I try to write, in Fasta format, a sequence
that was read in Genbank format using
org.biojava.bio.seq.io.SeqIOTools.readGenbank(BufferedReader). To write the
Fasta file, I use org.biojava.bio.seq.io.SeqIOTools.writeFasta(OutputStream,
Sequence). Features are pulled from the Genbank file, and each feature's
sequence is then wrapped in a SimpleSequence using the constructor
org.biojava.bio.seq.impl.SimpleSequence.SimpleSequence(Feature.getSymbols(),
null, <geneName>, null).
I have encountered this problem multiple times, and it seems to happen only
when the feature location in the Genbank file has the form
"join(location,location, ... location)".
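For readers unfamiliar with the notation, here is a minimal sketch of what a join(...) location means: the listed spans are concatenated in order, with 1-based inclusive coordinates. The class JoinLocationSketch and its methods are hypothetical helpers written for illustration and are not part of BioJava; the sketch ignores complement(...) and partial markers such as < and >.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not BioJava): shows what a join(...) location denotes.
class JoinLocationSketch {
    // One "start..end" span; GenBank coordinates are 1-based and inclusive.
    private static final Pattern SPAN = Pattern.compile("(\\d+)\\.\\.(\\d+)");

    // True if the location string uses the join(...) operator.
    static boolean isJoin(String location) {
        return location.trim().startsWith("join(");
    }

    // Collects every start..end span in the order it appears.
    static List<int[]> spans(String location) {
        List<int[]> result = new ArrayList<int[]>();
        Matcher m = SPAN.matcher(location);
        while (m.find()) {
            result.add(new int[] {
                Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) });
        }
        return result;
    }

    // Concatenates the residues covered by each span, in order.
    static String extract(String sequence, String location) {
        StringBuilder sb = new StringBuilder();
        for (int[] span : spans(location)) {
            // Convert 1-based inclusive coordinates to Java's 0-based, end-exclusive.
            sb.append(sequence, span[0] - 1, span[1]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String seq = "ACGTACGTAC";
        System.out.println(isJoin("join(1..3,7..9)"));       // true
        System.out.println(extract(seq, "join(1..3,7..9)")); // ACGGTA
    }
}
```

A feature with a single contiguous span goes through the same path with one element in the list, which matches the observation that only the join form triggers the error.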
Exception in thread "main" java.lang.ClassCastException: org.biojava.bio.symbol.AbstractSymbolList$SubList cannot be cast to org.biojava.bio.symbol.Symbol
        at org.biojava.bio.symbol.SimpleSymbolList.<init>(SimpleSymbolList.java:144)
        at org.biojava.bio.seq.projection.ProjectedFeature.getSymbols(ProjectedFeature.java:104)
        at OrthologSeqExtracter.main(OrthologSeqExtracter.java:174)
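One plausible reading of this trace, offered as an assumption rather than a confirmed diagnosis, is that a list whose elements are themselves sub-lists (one per joined block) reaches code that casts each element to a single symbol. The stdlib-only sketch below reproduces that general failure mode; CastSketch and its methods are hypothetical stand-ins and do not reflect BioJava's actual source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration (not BioJava code) of a cast failing on nested lists.
class CastSketch {
    // Stand-in for a constructor that accepts an untyped List and assumes
    // every element is a single symbol (modelled here as a Character).
    @SuppressWarnings("rawtypes")
    static String concat(List symbols) {
        StringBuilder sb = new StringBuilder();
        for (Object o : symbols) {
            char c = (Character) o; // ClassCastException if o is itself a List
            sb.append(c);
        }
        return sb.toString();
    }

    // Reports whether concat(...) fails with a ClassCastException on this input.
    static boolean failsWithClassCast(List<?> input) {
        try {
            concat(input);
            return false;
        } catch (ClassCastException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Contiguous case: a flat list of symbols works.
        List<Character> flat = Arrays.asList('A', 'C', 'G');
        System.out.println(concat(flat));

        // "join" case: one sub-list per block, passed where symbols are expected.
        List<Object> nested = new ArrayList<Object>();
        nested.add(Arrays.asList('A', 'C'));
        nested.add(Arrays.asList('G', 'T'));
        System.out.println(failsWithClassCast(nested));
    }
}
```

If this reading is right, the fix belongs in the library (flattening the blocks before constructing the symbol list) rather than in the calling code.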
Below is the class file that exposes this bug. The first argument is ortholog
data in tab-delimited format. The next 8 arguments are 8 genomes in Genbank
format. The 10th and final argument is the output directory. The exception is
thrown by statements between lines 140-155 (note: these are not the same line
numbers as those given in the stack trace printed above).
import java.util.HashMap;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import org.biojava.bio.BioException;
import org.biojava.bio.seq.Feature;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.SequenceTools;
import org.biojava.bio.seq.impl.SimpleSequence;
import org.biojava.bio.seq.impl.ViewSequence;
import org.biojava.bio.seq.io.SeqIOTools;
public class OrthologSeqExtracter {
private static String USAGE =
    "Usage: OrthologSeqExtracter <orthologs> <MG1655> <EDL_933> <LT2> <CT18> " +
    "<RIMD> <CFT073> <UTI89> <EcoHS> <DNA outDir>";
private static final int MG1655 = 0;
private static final int EDL933 = 1;
private static final int LT2 = 2;
private static final int CT18 = 3;
private static final int RIMD = 4;
private static final int CFT073 = 5;
private static final int UTI89 = 6;
private static final int HS = 7;
public static void main(String[] args) {
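// Expect the ortholog table, 8 Genbank genomes, and the output directory.
if (args.length != 10) {
System.err.println(USAGE);
System.exit(-1);
}
String dnaDir = args[args.length-1];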
BufferedReader[] br = new BufferedReader[8];
FileReader orthologs = null;
for (int i = 0; i < br.length; i++)
br[i] = null;
try {
orthologs = new FileReader(args[0]);
for (int i = 0; i < br.length; i++)
br[i] = new BufferedReader(new FileReader(args[i+1]));
} catch (FileNotFoundException ex){
ex.printStackTrace();
System.exit(-1);
}
SequenceIterator[] seqIt = new SequenceIterator[8];
HashMap<String,Feature>[] features = new HashMap[8];
for (int i = 0; i < features.length; i++){
features[i] = new HashMap<String,Feature>();
}
for (int i = 0; i < br.length; i++)
seqIt[i] = SeqIOTools.readGenbank(br[i]);
for (int i = 0; i < seqIt.length; i++){
ViewSequence seq = null;
try {
seq =
SequenceTools.view(seqIt[i].nextSequence());
seqIt[i] = null;
br[i] = null;
} catch (NoSuchElementException ex) {
ex.printStackTrace();
System.exit(-1);
} catch (BioException ex) {
ex.printStackTrace();
System.exit(-1);
}
Iterator<Feature> featIt = seq.features();
while (featIt.hasNext()){
Feature tmp = featIt.next();
if (tmp.getType().equals("CDS")){
String asapID =
tmp.getAnnotation().getProperty("db_xref").toString();
int index = asapID.indexOf("ASAP:") +
5;
asapID =
asapID.substring(index,index+11);
features[i].put(asapID, tmp);
}
}
}
Scanner in = new Scanner(orthologs);
in.nextLine();
int geneNum = 1;
File dir = new File(dnaDir);
dir.mkdir();
while (in.hasNext()){
Scanner lineIn = new Scanner(in.nextLine());
lineIn.useDelimiter(Pattern.compile("\t"));
PrintStream dnaOut = null;
String mg1655 = lineIn.next();
lineIn.next();
lineIn.next();
String geneName = lineIn.next();
String edl_933 = null;
String lt2 = null;
String ct18 = null;
String rimd = null;
String cft073 = null;
String uti89 = null;
String hs = null;
try{
if (lineIn.hasNext())
edl_933 = lineIn.next();
else
edl_933 = "";
if (lineIn.hasNext())
lt2 = lineIn.next();
else
lt2 = "";
if (lineIn.hasNext())
ct18 = lineIn.next();
else
ct18 = "";
if (lineIn.hasNext())
rimd = lineIn.next();
else
rimd = "";
if (lineIn.hasNext())
cft073 = lineIn.next();
else
cft073 = "";
if (lineIn.hasNext())
uti89 = lineIn.next();
else
uti89 = "";
if (lineIn.hasNext())
hs = lineIn.next();
else
hs = "";
} catch (NoSuchElementException ex){
System.err.println("Stopped at " + geneName
+ " " + edl_933 + " " + lt2 + " " +
ct18
+ " " + rimd + " " + cft073 + " " +
uti89 + " " + hs);
ex.printStackTrace();
System.exit(-1);
}
try {
File dnaFile = new File(dir,
"gene"+geneNum+"_"+geneName+".na");
dnaFile.createNewFile();
dnaOut = new PrintStream(dnaFile);
/* The bug occurs here. The Exception is thrown while building the arguments
to SeqIOTools.writeFasta(OutputStream,Sequence): per the stack trace, it
originates in Feature.getSymbols() */
if (features[MG1655].containsKey(mg1655))
SeqIOTools.writeFasta(dnaOut, new
SimpleSequence( features[MG1655].get(mg1655).getSymbols() , null,
"MG1655_"+geneName+"_"+mg1655, null) );
if (features[EDL933].containsKey(edl_933))
SeqIOTools.writeFasta(dnaOut, new
SimpleSequence( features[EDL933].get(edl_933).getSymbols() , null,
"EDL_933_"+geneName+"_"+edl_933, null));
if (features[LT2].containsKey(lt2))
SeqIOTools.writeFasta(dnaOut, new
SimpleSequence( features[LT2].get(lt2).getSymbols() , null,
"LT2_"+geneName+"_"+lt2, null));
if (features[CT18].containsKey(ct18))
SeqIOTools.writeFasta(dnaOut, new
SimpleSequence( features[CT18].get(ct18).getSymbols() , null,
"CT18_"+geneName+"_"+ct18, null));
if (features[RIMD].containsKey(rimd))
SeqIOTools.writeFasta(dnaOut, new
SimpleSequence( features[RIMD].get(rimd).getSymbols() , null,
"RIMD_"+geneName+"_"+rimd, null));
if (features[CFT073].containsKey(cft073))
SeqIOTools.writeFasta(dnaOut, new
SimpleSequence( features[CFT073].get(cft073).getSymbols() , null,
"CFT073_"+geneName+"_"+cft073, null));
if (features[UTI89].containsKey(uti89))
SeqIOTools.writeFasta(dnaOut, new
SimpleSequence( features[UTI89].get(uti89).getSymbols() , null,
"UTI89_"+geneName+"_"+uti89, null));
if (features[HS].containsKey(hs))
SeqIOTools.writeFasta(dnaOut, new
SimpleSequence( features[HS].get(hs).getSymbols() , null,
"HS_"+geneName+"_"+hs, null));
dnaOut.close();
} catch (FileNotFoundException ex){
System.out.println(geneName);
ex.printStackTrace();
System.exit(-1);
} catch (IOException ex){
System.out.println(geneName);
ex.printStackTrace();
System.exit(-1);
}
geneNum++;
}
}
}