[Biojava-l] An Exception Has Occurred During Parsing ensembl Genbank file Homo_sapiens.0.dat

Richard Holland holland at eaglegenomics.com
Sat Nov 15 16:59:05 UTC 2008


Based on the stack trace you sent earlier, this looks like a data problem.

Seeing as you've already got the file downloaded, could you pull out
all the 'db_xref' lines using grep or something similar, then construct
a regex to see if any of them do not match the
'/db_xref="something:accession"' pattern?

I suspect you'll find at least one, in which case it is indeed a data
problem and needs to be addressed to the Ensembl helpdesk so they can
correct their dump files.
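
If grep is awkward for a file that size, something along these lines
would do the same check. It is only a rough sketch: the function name
find_bad_dbxrefs is made up for illustration, the regex is my assumption
of what a well-formed line looks like (the exact pattern the BioJava
parser applies may differ), and it assumes each db_xref qualifier sits
on a single line.

```python
import re

# Assumed shape of a well-formed qualifier line: /db_xref="Type:Accession"
DBXREF_OK = re.compile(r'^\s*/db_xref="[^":]+:[^"]+"\s*$')


def find_bad_dbxrefs(lines):
    """Return (line_number, text) for db_xref lines that break the pattern."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Only inspect qualifier lines; everything else is ignored.
        if "/db_xref" in line and not DBXREF_OK.match(line):
            bad.append((number, line.rstrip("\n")))
    return bad
```

Passing it the open file handle for Homo_sapiens.0.dat should return the
line number and text of every db_xref line that has no "Type:" part, and
those are the lines worth quoting to the helpdesk.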

cheers,
Richard

2008/11/15 pprun <pzgyuanf at gmail.com>:
> It is too large to send by email. The exact file name is:
> Homo_sapiens.0.dat
> The exception occurred when the first chromosome X sequence was encountered.
>
> Hope this helps a bit.
> - Pprun
>
>
> Richard Holland wrote:
>>
>> There are many files on that site. I need to know which specific one
>> you are working with so that I can also attempt to parse it with some
>> debugging options turned on.
>>
>> Could you attach the file you are using to an email if possible?
>>
>> cheers,
>> Richard
>>
>>
>> 2008/11/15 pprun <pzgyuanf at gmail.com>:
>>
>>>
>>> Hi Richard,
>>> By the original file, do you mean the Ensembl GenBank file?
>>> If so, you can get it from the Ensembl FTP site:
>>> ftp://ftp.ensembl.org/pub/current_genbank/homo_sapiens/
>>>
>>>
>>>
>>> Richard Holland wrote:
>>>
>>> This exception occurs when the Genbank file contains a db_xref entry
>>> that does not follow the format "Type:Accession".
>>>
>>> It's hard to tell if this is the problem here without seeing the original
>>> file.
>>>
>>> cheers,
>>> Richard
>>>
>>> 2008/11/15 pprun <pzgyuanf at gmail.com>:
>>>
>>>
>>> Environment:
>>> -------------
>>> Biojava: 1.6
>>> Java: 1.6.0_10; Java HotSpot(TM) Client VM 11.0-b15
>>> System: Linux version 2.6.24-21-generic running on i386; UTF-8; zh_CN
>>>
>>>
>>> Details:
>>> --------------
>>> Format_object=org.biojavax.bio.seq.io.GenbankFormat
>>> Accession=chromosome:NCBI36:X:101815144:102815143:1
>>> Id=null
>>> Comments=Bad dbxref
>>> Parse_block=FEATURES   Location/Qualifierssource   1..1000000/organism
>>>  "Homo sapiens"/db_xref   "taxon:9606"gene   complement(5148..5254)/gene
>>>  ENSG00000193147/locus_tag   "AL035427.17"misc_RNA
>>> complement(5148..5254)/gene   "ENSG00000193147"/db_xref
>>> "Clone_based_ensembl_transcri:AL035427.17-201"/db_xref
> ...
>
> [Message clipped]



-- 
Richard Holland, BSc MBCS
Finance Director, Eagle Genomics Ltd
M: +44 7500 438846 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/



