[Biojava-l] SeqIOTools deprecated, looking for alternatives // RichSeq.IOTools

Oliver Stolpe oliver.stolpe at fu-berlin.de
Thu Nov 12 13:18:52 UTC 2009


Hello *,

The cookbook examples use the SeqIOTools class to read sequence files,
but the API documentation marks that class as deprecated. Looking for an
alternative, I searched this list and the web and found that biojavax
provides classes and methods for reading these files
(RichSequence.IOTools).

For example, I am trying to read an EMBL file:

--begin:code--

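import java.io.BufferedReader;
import java.io.FileReader;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

BufferedReader br = new BufferedReader(new FileReader(filename));
Namespace ns = RichObjectFactory.getDefaultNamespace();
RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns);

while (seqs.hasNext()) {
    RichSequence seq = seqs.nextRichSequence();
    System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap());
}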

--end:code--

But I always get this error message:

--begin:error--

org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
        at ReadGenbankFile.EMBL(ReadGenbankFile.java:42)
        at ReadGenbankFile.main(ReadGenbankFile.java:85)
Caused by: org.biojava.bio.seq.io.ParseException:

A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.EMBLFormat
Accession=null
Id=not set
Comments=
Parse_block=ID   AJ243265_2; parent: AJ243265AC   AJ243265;FT   CDS             join(<1082..1272,2484..2638,4926..>5041)
                /codon_start=3
                /gene="PGM1"
                /product="phosphoglucomutase 1"
                /function="carbohydrate metabolism"
                /EC_number="5.4.2.2"
                /db_xref="GOA:Q9H1D2"
                /db_xref="HGNC:8905"
                /db_xref="HSSP:3PMG"
                /db_xref="InterPro:IPR016055"
                /db_xref="UniProtKB/TrEMBL:Q9H1D2"
                /protein_id="CAC19809.1"
                /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
                ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
                RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ   Sequence   462 BP;
Stack trace follows ....


        at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775)
        at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
        ... 2 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3
        at java.lang.String.substring(String.java:1949)
        at java.lang.String.substring(String.java:1916)
        at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761)
        ... 4 more

--end:error--
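
The trace above nests three exceptions, and only the innermost one names
the actual failure. A minimal sketch for pulling out that innermost cause
programmatically, using only the JDK. The class name RootCause and the
stand-in exceptions in main are hypothetical and not part of BioJava:

```java
public class RootCause {

    /** Follows getCause() links until the innermost throwable is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop if a throwable names itself as its own cause, to avoid looping forever.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            try {
                // Stand-in for the failing call: a negative end index makes substring throw.
                "SQ".substring(0, -3);
            } catch (StringIndexOutOfBoundsException inner) {
                // Stand-in for the wrapping done by the outer parser layers.
                throw new RuntimeException("Could not read sequence", inner);
            }
        } catch (RuntimeException outer) {
            // Prints the class of the innermost exception in the chain.
            System.out.println(rootCause(outer).getClass().getName());
        }
    }
}
```

Wrapping the nextRichSequence() call in a try/catch and passing the caught
exception to such a helper would report the substring failure directly
instead of the generic "Could not read sequence".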

The file looks fine to me and is read correctly by the deprecated SeqIOTools:

--begin:embl-file--
ID   AJ243265_2; parent: AJ243265
AC   AJ243265;
FT   CDS             join(<1082..1272,2484..2638,4926..>5041)
FT                   /codon_start=3
FT                   /gene="PGM1"
FT                   /product="phosphoglucomutase 1"
FT                   /function="carbohydrate metabolism"
FT                   /EC_number="5.4.2.2"
FT                   /db_xref="GOA:Q9H1D2"
FT                   /db_xref="HGNC:8905"
FT                   /db_xref="HSSP:3PMG"
FT                   /db_xref="InterPro:IPR016055"
FT                   /db_xref="UniProtKB/TrEMBL:Q9H1D2"
FT                   /protein_id="CAC19809.1"
FT                   /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
FT                   ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
FT                   RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"
SQ   Sequence   462 BP;
     ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg        60
     cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct       120
     atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg       180
     atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag       240
     actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg       300
     tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta       360
     caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg       420
     cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg                          462
//
--end:embl-file--

The parser always fails just before reading the sequence (ttgt...),
directly after the "BP;" on the SQ line.
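
To rule out a damaged sequence block in the file itself, the residues can
be counted independently of BioJava and compared with the length announced
on the SQ line. The following is a minimal sketch using only the JDK. The
class EmblSeqCheck and its members are hypothetical helpers, and it assumes
the usual EMBL layout: an SQ header line, sequence lines ending in a running
count, and a terminating // line.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmblSeqCheck {

    // Matches the declared length in a header such as "SQ   Sequence   462 BP;"
    private static final Pattern DECLARED = Pattern.compile("(\\d+)\\s+BP");

    /** Declared length from the SQ line (-1 if absent) and residues actually found. */
    static final class Result {
        final int declared;
        final int counted;

        Result(int declared, int counted) {
            this.declared = declared;
            this.counted = counted;
        }
    }

    /** Scans the lines of a single entry up to the terminating "//". */
    static Result scan(List<String> lines) {
        int declared = -1;
        int counted = 0;
        boolean inSequence = false;
        for (String line : lines) {
            if (line.startsWith("//")) {
                break; // end of entry
            }
            if (line.startsWith("SQ")) {
                Matcher m = DECLARED.matcher(line);
                if (m.find()) {
                    declared = Integer.parseInt(m.group(1));
                }
                inSequence = true;
                continue;
            }
            if (inSequence) {
                // Count letters only; blanks and the trailing running count are skipped.
                for (int i = 0; i < line.length(); i++) {
                    if (Character.isLetter(line.charAt(i))) {
                        counted++;
                    }
                }
            }
        }
        return new Result(declared, counted);
    }

    public static void main(String[] args) {
        // Small made-up entry, not the file from this post.
        List<String> entry = Arrays.asList(
            "ID   TEST_1; parent: TEST",
            "SQ   Sequence   12 BP;",
            "     ttgtgggacc gt                                                            12",
            "//");
        Result r = scan(entry);
        System.out.println("declared=" + r.declared + " counted=" + r.counted);
    }
}
```

For a real file, the lines could be supplied by
java.nio.file.Files.readAllLines. If the two numbers agree, the sequence
block is at least internally consistent, which would point at the parser
rather than the file.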

Any suggestions on how to get this to work?
Or are there other alternatives to the deprecated SeqIOTools class?

Thanks in advance,

with best regards,

Oliver


