[Biojava-l] [biojavax] EMBL parser error

Morgane THOMAS-CHOLLIER mthomasc at vub.ac.be
Fri Apr 7 12:18:36 UTC 2006


I now get a different error message with the same file:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
Caused by: java.lang.IndexOutOfBoundsException: No group 5
    at java.util.regex.Matcher.group(Matcher.java:355)
    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:271)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more
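
For readers unfamiliar with this exception: java.util.regex.Matcher.group(int) throws IndexOutOfBoundsException with the message "No group N" when N is larger than the number of capturing groups in the pattern, even if the match itself succeeded. The sketch below reproduces that behaviour with the JDK only. The pattern and the class name NoGroupDemo are hypothetical and are not taken from EMBLFormat, whose actual regular expression is not shown in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoGroupDemo {
    // Hypothetical pattern with exactly four capturing groups:
    // (1) date, (2) release, (3) date type, (4) optional version.
    // This is an assumption for illustration, not the pattern used by EMBLFormat.
    static final Pattern DT = Pattern.compile(
        "^(\\d+-\\w+-\\d+) \\(Rel\\. (\\d+), (Created|Last updated)(?:, Version (\\d+))?\\)$");

    // Returns the requested group, or the exception message if the
    // group index does not exist in the pattern.
    static String groupOrMessage(String text, int group) {
        Matcher m = DT.matcher(text);
        if (!m.matches()) {
            return "no match";
        }
        try {
            return String.valueOf(m.group(group));
        } catch (IndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String text = "19-JAN-2006 (Rel. 86, Created)";
        System.out.println(groupOrMessage(text, 2)); // prints "86"
        System.out.println(groupOrMessage(text, 5)); // prints "No group 5"
    }
}
```

If something similar is happening in the parser, the code asks for a group index that the pattern it matched against does not define, which would point at a mismatch between the pattern and the code reading its groups rather than at the input file.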

Here is the complete file, for info:

ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
XX
AC   DQ158013;
XX
SV   DQ158013.1
XX
DT   19-JAN-2006 (Rel. 86, Created)
DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
XX
DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
XX
KW   .
XX
OS   Triturus helveticus (palmate newt)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Amphibia;
OC   Batrachia; Caudata; Salamandroidea; Salamandridae; Triturus.
XX
RN   [1]
RP   1-118
RX   DOI; 10.1016/j.ympev.2005.08.012.
RX   PUBMED; 16198128.
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   "A PCR survey for posterior Hox genes in amphibians";
RL   Mol. Phylogenet. Evol. 38(2):449-458(2006).
XX
RN   [2]
RP   1-118
RA   Mannaert A., Roelants K., Bossuyt F., Leyns L.;
RT   ;
RL   Submitted (09-AUG-2005) to the EMBL/GenBank/DDBJ databases.
RL   Biology Department, Vrije Universiteit Brussel, Pleinlaan 2, Brussels 1050,
RL   Belgium
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..118
FT                   /organism="Triturus helveticus"
FT                   /mol_type="genomic DNA"
FT                   /clone="Thel.b9"
FT                   /db_xref="taxon:256425"
FT   gene            <1..>118
FT                   /gene="Hoxb9"
FT                   /note="Hoxb-9"
FT   mRNA            <1..>118
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT   CDS             <1..>118
FT                   /codon_start=2
FT                   /gene="Hoxb9"
FT                   /product="HOXB9"
FT                   /db_xref="UniProtKB/TrEMBL:Q2LK47"
FT                   /protein_id="ABA39736.1"
FT                   /translation="KYQTLELEKEFLFNMYLTRDRRHEVARLLNLSERQVKIW"
XX
SQ   Sequence 118 BP; 28 A; 35 C; 37 G; 18 T; 0 other;
     caaataccag acgctggagc tggagaagga gttcctgttc aacatgtacc tcacccggga        60
     ccgcaggcac gaggtggccc ggctgctgaa cctcagcgag cgccaggtca agatctgg         118
//
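
The two DT lines in the file above have slightly different shapes: the first ends after the date type ("Created"), while the second carries an extra ", Version 1" field. A minimal sketch of a parser that tolerates both shapes follows. The class name DtLineSketch, the pattern, and the returned field order are all assumptions made for illustration; this is not the code committed to EMBLFormat.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DtLineSketch {
    // Assumed shape of a DT line, based only on the two lines in the file above.
    // The version field is wrapped in an optional non-capturing group.
    private static final Pattern DT_LINE = Pattern.compile(
        "^DT\\s+(\\d{2}-[A-Z]{3}-\\d{4}) \\(Rel\\. (\\d+), "
        + "(Created|Last updated)(?:, Version (\\d+))?\\)$");

    // Returns {date, release, type, version}; version is null when absent.
    static String[] parse(String line) {
        Matcher m = DT_LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognised DT line: " + line);
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "DT   19-JAN-2006 (Rel. 86, Created)",
            "DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)"
        };
        for (String line : lines) {
            String[] f = parse(line);
            System.out.println(f[2] + " in release " + f[1] + " on " + f[0]
                + (f[3] == null ? "" : ", version " + f[3]));
        }
    }
}
```

Keeping the optional field as the last capturing group means the group indexes for date, release and type stay the same for both line shapes, and a missing version shows up as null instead of an exception.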

Thanks for helping,

Morgane.

Richard Holland wrote:

>That was indeed a bug. I have made a change to the date parsing in
>EMBLFormat and committed it to CVS. Could you test it for me please?
>
>cheers,
>Richard
>
>On Fri, 2006-04-07 at 11:20 +0200, Morgane THOMAS-CHOLLIER wrote:
>  
>
>>Hello,
>>
>>I am using the biojavax code that I checked out from CVS today to parse 
>>an EMBL file exported from the EBI SRS server.
>>
>>I ran into this error:
>>
>>Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
>>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
>>    at org.embnet.be.biojavax.tryout.EMBLParseTest.main(EMBLParseTest.java:34)
>>Caused by: org.biojava.bio.seq.io.ParseException: Bad date type found: 86
>>    at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:278)
>>    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
>>    ... 1 more
>>
>>The EMBL file begins:
>>
>>ID   DQ158013   standard; genomic DNA; VRT; 118 BP.
>>XX
>>AC   DQ158013;
>>XX
>>SV   DQ158013.1
>>XX
>>DT   19-JAN-2006 (Rel. 86, Created)
>>DT   19-JAN-2006 (Rel. 86, Last updated, Version 1)
>>XX
>>DE   Triturus helveticus clone Thel.b9 HOXB9 (Hoxb9) gene, partial cds.
>>
>>Removing the two DT lines, which carry the date information, resolves the 
>>problem.
>>
>>Thanks,
>>
>>Morgane.
>>
>>    
>>

-- 
**********************************************************
Morgane THOMAS-CHOLLIER, PHD Student

Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium



