[Biojava-dev] [Bug 2590] New: I am getting errors when trying to get Sequences from Features (read in from a Genbank file) that have location in the form of " join(location, location, ... location) "

bugzilla-daemon at portal.open-bio.org
Thu Sep 18 23:42:08 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2590

           Summary: I am getting errors when trying to get Sequences from
                    Features (read in from a Genbank file) that have
                    location in the form of " join(location,location, ...
                    location) "
           Product: BioJava
           Version: live (CVS source)
          Platform: Macintosh
        OS/Version: Mac OS
            Status: NEW
          Severity: normal
          Priority: P2
         Component: symbol
        AssignedTo: biojava-dev at biojava.org
        ReportedBy: tritt at wisc.edu
                CC: tritt at wisc.edu


I get the following exception when I try to write, in FASTA format, a sequence
that was read in GenBank format with
org.biojava.bio.seq.io.SeqIOTools.readGenbank(BufferedReader). To write the
FASTA file, I use org.biojava.bio.seq.io.SeqIOTools.writeFasta(OutputStream,
Sequence). Features are pulled from the GenBank file, and each feature's
sequence is converted to a SimpleSequence using the constructor
org.biojava.bio.seq.impl.SimpleSequence(feature.getSymbols(), null,
<geneName>, null).
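For readers less familiar with the output side: a FASTA record is a '>' header line followed by lines of sequence. The stdlib-only Java sketch below shows that layout. The class FastaRecordSketch, its format method and the wrap width are hypothetical choices for illustration; they are not taken from BioJava's SeqIOTools.writeFasta.

```java
/**
 * Hypothetical, stdlib-only illustration of a FASTA record; this is not
 * BioJava's SeqIOTools.writeFasta implementation.
 */
final class FastaRecordSketch {

    /** Returns a '>' header line followed by the residues wrapped at lineWidth columns. */
    static String format(String name, String residues, int lineWidth) {
        if (lineWidth <= 0) {
            throw new IllegalArgumentException("lineWidth must be positive");
        }
        StringBuilder sb = new StringBuilder();
        sb.append('>').append(name).append('\n');
        for (int i = 0; i < residues.length(); i += lineWidth) {
            int end = Math.min(i + lineWidth, residues.length());
            sb.append(residues, i, end).append('\n'); // one wrapped line of sequence
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Header follows the report's <strain>_<geneName>_<id> naming; the values are made up.
        System.out.print(format("MG1655_geneX_id1", "AAAAACCCCCGGGGGTT", 5));
    }
}
```

With a width of 5, the 17-residue example prints the header followed by the lines AAAAA, CCCCC, GGGGG and TT.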

I have encountered this problem multiple times, and it seems to happen only
when a feature's location in the GenBank file has the form
"join(location,location, ... location)".
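In GenBank feature tables, join(location,location, ... location) means the listed ranges are placed end to end to form one contiguous sequence, so getSymbols() on such a feature would be expected to return the blocks concatenated. Here is a minimal stdlib-only sketch of that expected result. It assumes plain 1-based, inclusive start..end ranges on the forward strand (no complement(), '<', '>' or remote entries). JoinLocationSketch and splice are hypothetical names, not BioJava API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: splices a sequence according to a GenBank-style
 * join(start..end,start..end,...) location. Only plain 1-based, inclusive,
 * forward-strand ranges are handled.
 */
final class JoinLocationSketch {

    private static final Pattern RANGE = Pattern.compile("(\\d+)\\.\\.(\\d+)");

    static String splice(String location, String sequence) {
        String body = location.trim();
        if (body.startsWith("join(") && body.endsWith(")")) {
            body = body.substring(5, body.length() - 1); // strip "join(" and ")"
        }
        StringBuilder out = new StringBuilder();
        for (String part : body.split(",")) {
            Matcher m = RANGE.matcher(part.trim());
            if (!m.matches()) {
                throw new IllegalArgumentException("Unsupported location element: " + part);
            }
            int start = Integer.parseInt(m.group(1)); // 1-based, inclusive
            int end = Integer.parseInt(m.group(2));   // 1-based, inclusive
            // append(CharSequence, int, int) is 0-based and end-exclusive, hence start - 1 and end
            out.append(sequence, start - 1, end);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Joining 1..3 and 7..9 of a 12-base sequence puts the two blocks back to back.
        System.out.println(splice("join(1..3,7..9)", "AAACCCGGGTTT")); // prints AAAGGG
    }
}
```

A single contiguous range such as 4..6 goes through the same loop and simply returns that block.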



Exception in thread "main" java.lang.ClassCastException:
org.biojava.bio.symbol.AbstractSymbolList$SubList cannot be cast to org.biojava.bio.symbol.Symbol
        at org.biojava.bio.symbol.SimpleSymbolList.<init>(SimpleSymbolList.java:144)
        at org.biojava.bio.seq.projection.ProjectedFeature.getSymbols(ProjectedFeature.java:104)
        at OrthologSeqExtracter.main(OrthologSeqExtracter.java:174)
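Reading the trace, the SimpleSymbolList constructor appears to receive a list whose elements are the per-block sub-lists of the join (AbstractSymbolList$SubList) rather than individual Symbols, and its cast of each element to Symbol fails. That is an inference from the trace only, not from the BioJava source. The stdlib-only analogue below reproduces the same failure mode with Character standing in for Symbol. SubListCastSketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical stdlib analogue of the failure in the trace: code that expects
 * a list of single elements is handed a list of sub-lists.
 */
final class SubListCastSketch {

    /** Copies elements, casting each one to Character, as a list-of-symbols constructor would. */
    static List<Character> copyAsSymbols(List<?> items) {
        List<Character> out = new ArrayList<>();
        for (Object o : items) {
            out.add((Character) o); // throws ClassCastException when o is itself a List
        }
        return out;
    }

    /** Flattens the per-block sub-lists first, which is what a join() location needs. */
    static List<Character> flatten(List<? extends List<Character>> blocks) {
        List<Character> out = new ArrayList<>();
        for (List<Character> block : blocks) {
            out.addAll(block);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Character> seq = Arrays.asList('A', 'A', 'A', 'C', 'C', 'C', 'G', 'G', 'G');
        // Two blocks, like join(1..3,7..9): each element of 'blocks' is a List, not a Character.
        List<List<Character>> blocks = Arrays.asList(seq.subList(0, 3), seq.subList(6, 9));
        System.out.println(flatten(blocks)); // prints [A, A, A, G, G, G]
        try {
            copyAsSymbols(blocks);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: " + e.getMessage());
        }
    }
}
```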

Below is the source file that reproduces this bug. The first argument is the
ortholog data in tab-delimited format, the next 8 arguments are the 8 genomes
in GenBank format, and the 10th and final argument is the output directory.
The exception is thrown by the statements after the "The bug occurs here"
comment (lines 140-155 of my file; note that these do not match the line
numbers in the stack trace above).
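The 10-argument layout can be made explicit with a small check like the following. This is a hypothetical sketch (ArgLayoutSketch and genomeFiles are illustrative names); the genome labels are taken from the program's USAGE string.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the 10-argument command line described above. */
final class ArgLayoutSketch {

    // Genome labels in USAGE order; they occupy args[1] through args[8].
    static final List<String> GENOMES = Arrays.asList(
            "MG1655", "EDL_933", "LT2", "CT18", "RIMD", "CFT073", "UTI89", "EcoHS");

    // ortholog table + 8 GenBank files + output directory = 10
    static final int EXPECTED_ARGS = GENOMES.size() + 2;

    /** Maps each genome label to its GenBank path, rejecting a wrong argument count. */
    static Map<String, String> genomeFiles(String[] args) {
        if (args.length != EXPECTED_ARGS) {
            throw new IllegalArgumentException(
                    "expected " + EXPECTED_ARGS + " arguments, got " + args.length);
        }
        Map<String, String> files = new LinkedHashMap<>();
        for (int i = 0; i < GENOMES.size(); i++) {
            files.put(GENOMES.get(i), args[i + 1]); // args[0] is the ortholog table
        }
        return files;
    }

    public static void main(String[] args) {
        String[] example = {"orthologs.tsv", "a.gb", "b.gb", "c.gb", "d.gb",
                            "e.gb", "f.gb", "g.gb", "h.gb", "outDir"};
        System.out.println(genomeFiles(example).get("EcoHS")); // prints h.gb
        System.out.println("output directory: " + example[example.length - 1]);
    }
}
```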

import java.util.HashMap;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.Scanner;
import java.util.regex.Pattern;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;

import org.biojava.bio.BioException;
import org.biojava.bio.seq.Feature;
import org.biojava.bio.seq.SequenceIterator;
import org.biojava.bio.seq.SequenceTools;
import org.biojava.bio.seq.impl.SimpleSequence;
import org.biojava.bio.seq.impl.ViewSequence;
import org.biojava.bio.seq.io.SeqIOTools;

public class OrthologSeqExtracter {

        private static final String USAGE =
                "Usage: OrthologSeqExtracter <orthologs> <MG1655> <EDL_933> <LT2> <CT18> " +
                "<RIMD> <CFT073> <UTI89> <EcoHS> <DNA outDir>";
        private static final int MG1655 = 0;
        private static final int EDL933 = 1;
        private static final int LT2 = 2;
        private static final int CT18 = 3;
        private static final int RIMD = 4;
        private static final int CFT073 = 5;
        private static final int UTI89 = 6;
        private static final int HS = 7;
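        public static void main(String[] args) {
                // Expect: ortholog table, 8 GenBank files, output directory
                if (args.length != 10) {
                        System.err.println(USAGE);
                        System.exit(-1);
                }
                String dnaDir = args[args.length-1];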
                BufferedReader[] br = new BufferedReader[8];
                FileReader orthologs = null;
                for (int i = 0; i < br.length; i++)
                        br[i] = null;
                try {
                        orthologs = new FileReader(args[0]);
                        for (int i = 0; i < br.length; i++)
                                br[i] = new BufferedReader(new FileReader(args[i+1]));
                } catch (FileNotFoundException ex){
                        ex.printStackTrace();
                        System.exit(-1);
                }
                SequenceIterator[] seqIt = new SequenceIterator[8];
                HashMap<String,Feature>[] features = new HashMap[8];
                for (int i = 0; i < features.length; i++){
                        features[i] = new HashMap<String,Feature>();
                }
                for (int i = 0; i < br.length; i++)
                        seqIt[i] = SeqIOTools.readGenbank(br[i]);
                for (int i = 0; i < seqIt.length; i++){
                        ViewSequence seq = null;
                        try {
                                seq = SequenceTools.view(seqIt[i].nextSequence());
                                seqIt[i] = null;
                                br[i] = null;
                        } catch (NoSuchElementException ex) {
                                ex.printStackTrace();
                                System.exit(-1);
                        } catch (BioException ex) {
                                ex.printStackTrace();
                                System.exit(-1);
                        }
                        Iterator<Feature> featIt = seq.features();
                        while (featIt.hasNext()){
                                Feature tmp = featIt.next();
                                if (tmp.getType().equals("CDS")){
                                        // Key each CDS by the 11-character ASAP ID that
                                        // follows "ASAP:" in its db_xref annotation
                                        String asapID = tmp.getAnnotation().getProperty("db_xref").toString();
                                        int index = asapID.indexOf("ASAP:") + 5;
                                        asapID = asapID.substring(index, index+11);
                                        features[i].put(asapID, tmp);
                                }
                        }
                }
                Scanner in = new Scanner(orthologs);
                in.nextLine();
                int geneNum = 1;
                File dir = new File(dnaDir);
                dir.mkdir();
                while (in.hasNext()){
                        Scanner lineIn = new Scanner(in.nextLine());
                        lineIn.useDelimiter(Pattern.compile("\t"));
                        PrintStream dnaOut = null;
                        String mg1655 = lineIn.next();
                        lineIn.next();
                        lineIn.next();
                        String geneName = lineIn.next();
                        String edl_933 = null;
                        String lt2 = null;
                        String ct18 = null;
                        String rimd = null;
                        String cft073 = null;
                        String uti89 = null;
                        String hs = null;
                        try{
                                if (lineIn.hasNext()) 
                                        edl_933 = lineIn.next();
                                else 
                                        edl_933 = "";
                                if (lineIn.hasNext())
                                        lt2 = lineIn.next();
                                else 
                                        lt2 = "";
                                if (lineIn.hasNext())
                                        ct18 = lineIn.next();
                                else
                                        ct18 = "";
                                if (lineIn.hasNext())
                                        rimd = lineIn.next();
                                else
                                        rimd = "";
                                if (lineIn.hasNext())
                                        cft073 = lineIn.next();
                                else
                                        cft073 = "";
                                if (lineIn.hasNext())
                                        uti89 = lineIn.next();
                                else
                                        uti89 = "";
                                if (lineIn.hasNext())
                                        hs = lineIn.next();
                                else
                                        hs = "";
                        } catch (NoSuchElementException ex){
                                System.err.println("Stopped at " + geneName
                                        + " " + edl_933 + " " + lt2 + " " + ct18
                                        + " " + rimd + " " + cft073 + " " + uti89 + " " + hs);
                                ex.printStackTrace();
                                System.exit(-1);
                        }                       
                        try {
                                File dnaFile = new File(dir, "gene"+geneNum+"_"+geneName+".na");
                                dnaFile.createNewFile();
                                dnaOut = new PrintStream(dnaFile);
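                                /* The bug occurs here. According to the stack trace, the
                                   ClassCastException is thrown by the Feature.getSymbols()
                                   calls (ProjectedFeature.getSymbols) that build the
                                   arguments to SeqIOTools.writeFasta(OutputStream, Sequence),
                                   not by writeFasta itself. */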
                                if (features[MG1655].containsKey(mg1655))
                                        SeqIOTools.writeFasta(dnaOut, new SimpleSequence(
                                                features[MG1655].get(mg1655).getSymbols(), null,
                                                "MG1655_"+geneName+"_"+mg1655, null));
                                if (features[EDL933].containsKey(edl_933))
                                        SeqIOTools.writeFasta(dnaOut, new SimpleSequence(
                                                features[EDL933].get(edl_933).getSymbols(), null,
                                                "EDL_933_"+geneName+"_"+edl_933, null));
                                if (features[LT2].containsKey(lt2))
                                        SeqIOTools.writeFasta(dnaOut, new SimpleSequence(
                                                features[LT2].get(lt2).getSymbols(), null,
                                                "LT2_"+geneName+"_"+lt2, null));
                                if (features[CT18].containsKey(ct18))
                                        SeqIOTools.writeFasta(dnaOut, new SimpleSequence(
                                                features[CT18].get(ct18).getSymbols(), null,
                                                "CT18_"+geneName+"_"+ct18, null));
                                if (features[RIMD].containsKey(rimd))
                                        SeqIOTools.writeFasta(dnaOut, new SimpleSequence(
                                                features[RIMD].get(rimd).getSymbols(), null,
                                                "RIMD_"+geneName+"_"+rimd, null));
                                if (features[CFT073].containsKey(cft073))
                                        SeqIOTools.writeFasta(dnaOut, new SimpleSequence(
                                                features[CFT073].get(cft073).getSymbols(), null,
                                                "CFT073_"+geneName+"_"+cft073, null));
                                if (features[UTI89].containsKey(uti89))
                                        SeqIOTools.writeFasta(dnaOut, new SimpleSequence(
                                                features[UTI89].get(uti89).getSymbols(), null,
                                                "UTI89_"+geneName+"_"+uti89, null));
                                if (features[HS].containsKey(hs))
                                        SeqIOTools.writeFasta(dnaOut, new SimpleSequence(
                                                features[HS].get(hs).getSymbols(), null,
                                                "HS_"+geneName+"_"+hs, null));
                                dnaOut.close(); 
                        } catch (FileNotFoundException ex){
                                System.out.println(geneName);
                                ex.printStackTrace();
                                System.exit(-1);
                        } catch (IOException ex){
                                System.out.println(geneName);
                                ex.printStackTrace();
                                System.exit(-1);
                        }       
                        geneNum++;
                }
        }
}
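The chain of hasNext()/next() calls in the reproducer reads column 1 (the MG1655 ID), skips columns 2 and 3, reads the gene name from column 4, and treats any missing trailing strain column as "". A more compact, roughly equivalent approach is sketched here. OrthologRowSketch is a hypothetical name, edge cases may not match Scanner's tokenizing exactly, and unlike the original a row with fewer than 4 columns is padded instead of throwing.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the ortholog-row parsing that the reproducer does with
 * Scanner and a chain of hasNext()/next() calls.
 */
final class OrthologRowSketch {

    // MG1655 ID, two skipped columns, gene name, then the seven other strains
    static final int COLUMNS = 4 + 7;

    /** Splits a tab-delimited row and pads missing trailing columns with "". */
    static String[] parse(String line) {
        String[] raw = line.split("\t", -1); // limit -1 keeps empty fields
        String[] cols = Arrays.copyOf(raw, Math.max(COLUMNS, raw.length));
        for (int i = raw.length; i < cols.length; i++) {
            cols[i] = ""; // Arrays.copyOf pads with null; replace with ""
        }
        return cols;
    }

    public static void main(String[] args) {
        String[] cols = parse("id1\tx\ty\tgeneA\te1");
        System.out.println(cols[3] + " " + cols[4] + " [" + cols[10] + "]"); // prints geneA e1 []
    }
}
```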

