[Biojava-l] parsing mRNA GenBank flat file
Matthew Pocock
matthew_pocock@yahoo.co.uk
Sun, 19 May 2002 01:45:24 +0100
Hi Alexander,
There are two issues here. Firstly, the GenBank parser was not
gracefully handling blank lines (hence the index-out-of-bounds error:
it was trying to index a char not in the line). Secondly, your GenBank
file has some trailing text (after the //) that states an update date,
and this is not part of a valid GenBank entry.
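To illustrate the first issue, here is a minimal sketch of a length-safe way to read the keyword column of a line. It is not the code committed to BioJava; the class name BlankLineGuard, the method keywordOf, and the 12-column keyword width are assumptions made for the example.

```java
// Hypothetical helper, not part of BioJava: reads the keyword column of a
// GenBank-style line without indexing past the end of short or blank lines.
class BlankLineGuard {
    // Assumed width of the keyword column in a GenBank flat file.
    static final int KEYWORD_WIDTH = 12;

    /** Returns the trimmed keyword, or "" when the line is null or blank. */
    static String keywordOf(String line) {
        if (line == null || line.trim().isEmpty()) {
            return ""; // blank line: nothing to index
        }
        // Clamp the end index so lines shorter than the column width are safe.
        int end = Math.min(KEYWORD_WIDTH, line.length());
        return line.substring(0, end).trim();
    }
}
```

A blank line then yields an empty keyword that the caller can skip, instead of triggering an index-out-of-bounds error.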
I've committed the blank-line fix to CVS. I guess you need to trim the
spurious lines from your GenBank entries. Did they come from a script?
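If it helps, trimming could be done with a small pre-processing step like the sketch below. It uses only plain Java strings, not the BioJava API; the class name GenBankTrimmer and the method trimAfterTerminator are made up for the example, and it assumes every entry starts with a LOCUS line and ends with a line containing only //.

```java
// Hypothetical pre-processing helper, not part of BioJava: drops any text
// that follows an entry terminator (//) until the next LOCUS line begins.
class GenBankTrimmer {
    static String trimAfterTerminator(String text) {
        StringBuilder out = new StringBuilder();
        boolean skipping = false; // true while between "//" and the next LOCUS
        for (String line : text.split("\r?\n")) {
            if (skipping && line.startsWith("LOCUS")) {
                skipping = false; // a new entry starts here
            }
            if (skipping) {
                continue; // spurious trailing line, e.g. "Revised: ..."
            }
            out.append(line).append('\n');
            if (line.trim().equals("//")) {
                skipping = true; // end of entry reached
            }
        }
        return out.toString();
    }
}
```

The cleaned text can then be written back to a file or wrapped in a reader before it is handed to the parser.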
Matthew
Alexander Churbanov wrote:
> Hi Matthew,
>
> Thanks for your willingness to help me.
> The accession number is AF207834 at NCBI GenBank.
> I am sending you a picture of what I got. Sorry,
> I don't know yet how to create error log files in
> Java.
> I ran exactly the program from your demo folder on the
> mRNA flat file.
> Most probably I am doing something stupid; I do not have
> experience working with your framework.
>
> Thanks,
>
> Alexander
>
> --- Matthew Pocock <matthew_pocock@yahoo.co.uk> wrote:
>
>>Mark: Do RNA Genbank entries use agcu? RNA embl
>>entries use DNA (agct)
>>to serialise the sequence.
>>
>>Alexander: Could you give us an accession number
>>that causes this error
>>as well as the error you get? The complete stack
>>trace is always helpful
>>for fixing things.
>>
>>Matthew
>>
>>Schreiber, Mark wrote:
>>
>>>The problem is probably being caused by the use of the DNA alphabet
>>>instead of the RNA alphabet. Are you getting IllegalSymbolExceptions?
>>>
>>>If this is the case you need to use the RNA alphabet in the GenBank
>>>parser; this can be found by calling RNATools.getRNA();
>>
>>
>>>- Mark
>>>
>>>
>>>
>>>>-----Original Message-----
>>>>From: Alexander Churbanov [mailto:achurbanov@yahoo.com]
>>>>Sent: Thursday, 16 May 2002 1:56 p.m.
>>>>To: biojava-l@biojava.org
>>>>Subject: [Biojava-l] parsing mRNA GenBank flat file
>>>>
>>>> Hello,
>>>>
>>>> I am trying to parse an mRNA file from GenBank using
>>>>your demo program (which parses a DNA GenBank flat file).
>>>>It crashes halfway through or at the beginning. Do you
>>>>have any suggestions, other methods, demo programs, or
>>>>other sources showing how to do it?
>>>> Thanks in advance,
>>>>
>>>> Alexander Tchourbanov
>>>>
>>>>__________________________________________________
>>>>Do You Yahoo!?
>>>>LAUNCH - Your Yahoo! Music Experience
>>>>http://launch.yahoo.com
>>>>_______________________________________________
>>>>Biojava-l mailing list - Biojava-l@biojava.org
>>>>http://biojava.org/mailman/listinfo/biojava-l
>>>>
>>>
>>>
> ------------------------------------------------------------------------
>
>
> ------------------------------------------------------------------------
>
> LOCUS       AF207834                 600 bp    mRNA    linear   PRI 02-NOV-2001
> DEFINITION  Macaca mulatta epididymis-specific protein ESP13.6 mRNA, complete
>             cds.
> ACCESSION   AF207834
> VERSION     AF207834.1  GI:16588332
> KEYWORDS    .
> SOURCE      rhesus monkey.
>   ORGANISM  Macaca mulatta
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Primates; Catarrhini; Cercopithecidae;
>             Cercopithecinae; Macaca.
> REFERENCE   1  (bases 1 to 600)
>   AUTHORS   Liu,Q., Hamil,K.G., Sivashanmugam,P., Grossman,G.,
>             Soundararajan,R., Rao,A.J., Richardson,R.T., Zhang,Y.L.,
>             O'Rand,M.G., Petrusz,P., French,F.S. and Hall,S.H.
>   TITLE     Primate epididymis-specific proteins: characterization of ESC42, a
>             novel protein containing a trefoil-like motif in monkey and human
>   JOURNAL   Endocrinology 142 (10), 4529-4539 (2001)
>   MEDLINE   21448442
>    PUBMED   11564719
> REFERENCE   2  (bases 1 to 600)
>   AUTHORS   Liu,Q., Hamil,K.G., Johnson,R.T. Jr., Zhang,Y.L., French,F.S. and
>             Hall,S.H.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (23-NOV-1999) Pediatrics, The Laboratories for
>             Reproductive Biology, The University of North Carolina, Room 382,
>             MSRB, CB#7500, Chapel Hill, NC 27599-7500, USA
> FEATURES             Location/Qualifiers
>      source          1..600
>                      /organism="Macaca mulatta"
>                      /db_xref="taxon:9544"
>      CDS             27..398
>                      /codon_start=1
>                      /product="epididymis-specific protein ESP13.6"
>                      /protein_id="AAL26779.1"
>                      /db_xref="GI:16588333"
>                      /translation="MKLLLLALPILVLLPQVIPAYGGEKKCWNRSGHCRKQCKDGEAV
>                      KETCKNHRACCVPSNEDHRRLPTTSPTPLSDSTPGIIDNILTIRFTTDYFEISSKKDM
>                      VEESEAGQGTQTSPPNVHHTS"
> BASE COUNT      205 a    151 c    100 g    144 t
> ORIGIN
>         1 ctaccatctc ctgtttccca agcaccatga aactcctgct gttggctctt cctatccttg
>        61 tgctcctacc ccaagtgatc ccagcctatg gtggtgaaaa aaaatgctgg aacagatcag
>       121 ggcactgcag gaaacaatgc aaagatggag aagcagtgaa agaaacatgc aaaaatcatc
>       181 gagcctgctg cgttccatct aatgaagacc acaggcgact tcctacgaca tctcccacac
>       241 ccttgagtga ctcaacacca ggaattattg ataatatttt aacaataagg ttcactacag
>       301 actactttga aataagcagc aagaaagaca tggttgaaga gtctgaggcg ggacagggaa
>       361 ctcagacctc tcccccaaat gttcaccata cctcatgact tcttctcgaa tgtcactcac
>       421 ccctgtcctc agagtgataa actaagtcac atacatatag ataaaacacc acagtgacct
>       481 cccacttccc accaatatgt aattctatta atagaaacag ctgtgtaaag aagtctaaaa
>       541 ttttcactat ttccaatgat aaactcttca gtgctcttct tgaaaaaaaa aaaaaaaaaa
> //
>
>
>
> Revised: October 24, 2001.
>