[Biojava-l] SeqIOTools deprecated, looking for alternatives // RichSeq.IOTools
Oliver Stolpe
oliver.stolpe at fu-berlin.de
Thu Nov 12 13:18:52 UTC 2009
Hello *,
the cookbook examples use the SeqIOTools class to read files, but the
API marks it as deprecated. Looking for alternatives, I searched the
list archives and the web and found that biojavax provides classes and
methods for reading files (RichSequence.IOTools).
For example, I am trying to read an EMBL file:
--begin:code--
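import java.io.BufferedReader;
import java.io.FileReader;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;
// Read every EMBL entry in the file as a DNA RichSequence
BufferedReader br = new BufferedReader(new FileReader(filename));
Namespace ns = RichObjectFactory.getDefaultNamespace();
RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns);
while (seqs.hasNext()) {
    RichSequence seq = seqs.nextRichSequence();
    System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap());
}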
--end:code--
But I always get this error message:
--begin:error--
org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at ReadGenbankFile.EMBL(ReadGenbankFile.java:42)
    at ReadGenbankFile.main(ReadGenbankFile.java:85)
Caused by: org.biojava.bio.seq.io.ParseException:
A Exception Has Occurred During Parsing.
Please submit the details that follow to biojava-l at biojava.org or post a
bug report to http://bugzilla.open-bio.org/
Format_object=org.biojavax.bio.seq.io.EMBLFormat
Accession=null
Id=not set
Comments=
Parse_block=ID AJ243265_2; parent: AJ243265AC AJ243265;FT
CDS join(<1082..1272,2484..2638,4926..>5041)
/codon_start=3
/gene="PGM1"
/product="phosphoglucomutase 1"
/function="carbohydrate metabolism"
/EC_number="5.4.2.2"
/db_xref="GOA:Q9H1D2"
/db_xref="HGNC:8905"
/db_xref="HSSP:3PMG"
/db_xref="InterPro:IPR016055"
/db_xref="UniProtKB/TrEMBL:Q9H1D2"
/protein_id="CAC19809.1"
/translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ
Sequence 462 BP;
Stack trace follows ....
    at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 2 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3
    at java.lang.String.substring(String.java:1949)
    at java.lang.String.substring(String.java:1916)
    at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761)
    ... 4 more
--end:error--
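The innermost cause is a StringIndexOutOfBoundsException with index -3, raised by String.substring inside EMBLFormat.readSection. The sketch that follows is not BioJava code; it only illustrates, with a hypothetical helper afterPrefix, one way Java produces exactly that exception: asking for the text after a fixed five-character prefix of a line that is only two characters long (2 - 5 = -3). Whether readSection actually does this is an assumption that the trace alone does not confirm.

```java
// Illustration only (not BioJava code): stripping a fixed five-character
// prefix from a line shorter than five characters throws
// StringIndexOutOfBoundsException.
public class SubstringDemo {

    /** Hypothetical helper: returns the text after the first five characters. */
    static String afterPrefix(String line) {
        return line.substring(5);
    }

    public static void main(String[] args) {
        String shortLine = "FT"; // a two-character line, e.g. a bare line code
        try {
            afterPrefix(shortLine);
            System.out.println("no exception");
        } catch (StringIndexOutOfBoundsException e) {
            // Older JDKs report "String index out of range: -3" (2 - 5);
            // newer JDKs word the message differently.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```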
As far as I can tell the file is fine, and it reads without problems
using the deprecated SeqIOTools:
--begin:embl-file--
ID AJ243265_2; parent: AJ243265
AC AJ243265;
FT CDS join(<1082..1272,2484..2638,4926..>5041)
FT /codon_start=3
FT /gene="PGM1"
FT /product="phosphoglucomutase 1"
FT /function="carbohydrate metabolism"
FT /EC_number="5.4.2.2"
FT /db_xref="GOA:Q9H1D2"
FT /db_xref="HGNC:8905"
FT /db_xref="HSSP:3PMG"
FT /db_xref="InterPro:IPR016055"
FT /db_xref="UniProtKB/TrEMBL:Q9H1D2"
FT /protein_id="CAC19809.1"
FT /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
FT ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
FT RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"
SQ Sequence 462 BP;
ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg 60
cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct 120
atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg 180
atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag 240
actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg 300
tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta 360
caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg 420
cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg 462
//
--end:embl-file--
The parser always crashes before it reaches the sequence data (the
"ttgt..." line directly after "BP;").
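Since the trace points at a substring call in readSection, it may help to check whether any line of the real file deviates from the fixed column layout of EMBL flat files. A minimal sketch, assuming the EMBL convention of a two-character line code in columns 1-2, blanks in columns 3-5 and data from column 6 (sequence lines start with five blanks); EmblLayoutCheck and suspiciousLines are hypothetical names. It reports, by 1-based line number, every line other than `//` and a bare `XX` spacer that is shorter than five characters or has non-blank characters in columns 3-5.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: flags lines that break the assumed EMBL column layout
// (line code in columns 1-2, three blanks, data from column 6).
public class EmblLayoutCheck {

    /** Returns the 1-based numbers of lines that break the assumed layout. */
    static List<Integer> suspiciousLines(List<String> lines) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.equals("//") || line.equals("XX")) {
                continue; // entry terminator and spacer lines carry no data
            }
            // Too short for code plus three blanks, or columns 3-5 not blank
            if (line.length() < 5 || !line.substring(2, 5).equals("   ")) {
                bad.add(i + 1);
            }
        }
        return bad;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int n : suspiciousLines(lines)) {
            System.out.println("line " + n + ": " + lines.get(n - 1));
        }
    }
}
```

Run it as `java EmblLayoutCheck entry.embl` on the original file rather than on text copied out of a mail, because mail clients can wrap long lines and collapse whitespace.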
Any suggestions on how to get this working?
Or are there other alternatives to the deprecated SeqIOTools class?
Thanks in advance,
with best regards,
Oliver