[Biojava-l] parsing mRNA GenBank flat file

Matthew Pocock matthew_pocock@yahoo.co.uk
Sun, 19 May 2002 01:45:24 +0100


Hi Alexander,

There are two issues here. Firstly, the GenBank parser was not 
gracefully handling blank lines (hence the index out of bounds error: 
it was trying to index a char not in the line). Secondly, your GenBank 
file has some trailing text (after the // terminator) that states an 
update date, and this is not part of a valid GenBank entry.

I've committed the blank-line fix to CVS. I guess you need to trim the 
spurious lines from your GenBank entries. Did they come from a script?
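
For the trimming step, here is a minimal sketch in plain Java. It is not 
part of BioJava: the class name GenBankTrimmer and its trim method are 
hypothetical, and it assumes every entry starts with a LOCUS line and 
ends with a line containing only //. Anything outside those bounds, such 
as a trailing "Revised:" note, is dropped, and blank lines are dropped 
too as a workaround for parser versions without the blank-line fix.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a BioJava class.
final class GenBankTrimmer {

    // Keeps only the lines from each LOCUS line through its "//" terminator.
    // Blank lines and any text outside an entry are discarded.
    static List<String> trim(List<String> lines) {
        List<String> kept = new ArrayList<>();
        boolean inRecord = false;
        for (String line : lines) {
            if (!inRecord && line.startsWith("LOCUS")) {
                inRecord = true; // start of a new entry
            }
            if (!inRecord || line.trim().isEmpty()) {
                continue; // spurious text between entries, or a blank line
            }
            kept.add(line);
            if (line.trim().equals("//")) {
                inRecord = false; // end of the current entry
            }
        }
        return kept;
    }

    // Reads a flat file on standard input and writes the trimmed
    // version to standard output.
    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in =
                 new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        for (String line : trim(lines)) {
            System.out.println(line);
        }
    }
}
```

Something like "java GenBankTrimmer < AF207834.gb > AF207834.clean.gb" 
(file names made up for illustration) would then give a file that 
contains only the entry itself.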

Matthew


Alexander Churbanov wrote:
>    Hi Matthew,
> 
>    Thanks for your willingness to help me.
>    The accession number is AF207834 at NCBI GenBank.
>    I am sending you the picture of what I got. Sorry,
> I don't know yet how to create error log files in
> Java.
>    I ran the file from your demo folder exactly as is on
> the mRNA flat file.
>    Most probably I am doing something stupid - I do not have
> experience working with your framework.
> 
>    Thanks,
> 
>    Alexander
>    
> --- Matthew Pocock <matthew_pocock@yahoo.co.uk> wrote:
> 
>>Mark: Do RNA Genbank entries use agcu? RNA embl
>>entries use DNA (agct) 
>>to serialise the sequence.
>>
>>Alexander: Could you give us an accession number
>>that causes this error 
>>as well as the error you get? The complete stack
>>trace is always helpful 
>>for fixing things.
>>
>>Matthew
>>
>>Schreiber, Mark wrote:
>>
>>>The problem is probably being caused by the use of the DNA alphabet
>>>instead of the RNA alphabet. Are you getting IllegalSymbolExceptions?
>>>
>>>If this is the case you need to use the RNA alphabet in the GenBank
>>>parser. This can be found by calling RNATools.getRNA();
>>>
>>>- Mark
>>>
>>>
>>>
>>>>-----Original Message-----
>>>>From: Alexander Churbanov [mailto:achurbanov@yahoo.com]
>>>>Sent: Thursday, 16 May 2002 1:56 p.m.
>>>>To: biojava-l@biojava.org
>>>>Subject: [Biojava-l] parsing mRNA GenBank flat file
>>>>
>>>>  Hello,
>>>>
>>>>  I am trying to parse an mRNA file from GenBank using
>>>>your demo program (which parses DNA GenBank flat files).
>>>>It crashes halfway through or at the beginning. Do you
>>>>have any suggestions, other methods, demo programs or
>>>>other sources showing how to do it?
>>>>  Thanks in advance,
>>>>
>>>>  Alexander Tchourbanov
>>>>_______________________________________________
>>>>Biojava-l mailing list  -  Biojava-l@biojava.org 
>>>>http://biojava.org/mailman/listinfo/biojava-l
>>>>
> 
> ------------------------------------------------------------------------
> 
> LOCUS       AF207834                 600 bp    mRNA    linear   PRI 02-NOV-2001
> DEFINITION  Macaca mulatta epididymis-specific protein ESP13.6 mRNA, complete
>             cds.
> ACCESSION   AF207834
> VERSION     AF207834.1  GI:16588332
> KEYWORDS    .
> SOURCE      rhesus monkey.
>   ORGANISM  Macaca mulatta
>             Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
>             Mammalia; Eutheria; Primates; Catarrhini; Cercopithecidae;
>             Cercopithecinae; Macaca.
> REFERENCE   1  (bases 1 to 600)
>   AUTHORS   Liu,Q., Hamil,K.G., Sivashanmugam,P., Grossman,G.,
>             Soundararajan,R., Rao,A.J., Richardson,R.T., Zhang,Y.L.,
>             O'Rand,M.G., Petrusz,P., French,F.S. and Hall,S.H.
>   TITLE     Primate epididymis-specific proteins: characterization of ESC42, a
>             novel protein containing a trefoil-like motif in monkey and human
>   JOURNAL   Endocrinology 142 (10), 4529-4539 (2001)
>   MEDLINE   21448442
>    PUBMED   11564719
> REFERENCE   2  (bases 1 to 600)
>   AUTHORS   Liu,Q., Hamil,K.G., Johnson,R.T. Jr., Zhang,Y.L., French,F.S. and
>             Hall,S.H.
>   TITLE     Direct Submission
>   JOURNAL   Submitted (23-NOV-1999) Pediatrics, The Laboratories for
>             Reproductive Biology, The University of North Carolina, Room 382,
>             MSRB, CB#7500, Chapel Hill, NC 27599-7500, USA
> FEATURES             Location/Qualifiers
>      source          1..600
>                      /organism="Macaca mulatta"
>                      /db_xref="taxon:9544"
>      CDS             27..398
>                      /codon_start=1
>                      /product="epididymis-specific protein ESP13.6"
>                      /protein_id="AAL26779.1"
>                      /db_xref="GI:16588333"
>                      /translation="MKLLLLALPILVLLPQVIPAYGGEKKCWNRSGHCRKQCKDGEAV
>                      KETCKNHRACCVPSNEDHRRLPTTSPTPLSDSTPGIIDNILTIRFTTDYFEISSKKDM
>                      VEESEAGQGTQTSPPNVHHTS"
> BASE COUNT      205 a    151 c    100 g    144 t
> ORIGIN      
>         1 ctaccatctc ctgtttccca agcaccatga aactcctgct gttggctctt cctatccttg
>        61 tgctcctacc ccaagtgatc ccagcctatg gtggtgaaaa aaaatgctgg aacagatcag
>       121 ggcactgcag gaaacaatgc aaagatggag aagcagtgaa agaaacatgc aaaaatcatc
>       181 gagcctgctg cgttccatct aatgaagacc acaggcgact tcctacgaca tctcccacac
>       241 ccttgagtga ctcaacacca ggaattattg ataatatttt aacaataagg ttcactacag
>       301 actactttga aataagcagc aagaaagaca tggttgaaga gtctgaggcg ggacagggaa
>       361 ctcagacctc tcccccaaat gttcaccata cctcatgact tcttctcgaa tgtcactcac
>       421 ccctgtcctc agagtgataa actaagtcac atacatatag ataaaacacc acagtgacct
>       481 cccacttccc accaatatgt aattctatta atagaaacag ctgtgtaaag aagtctaaaa
>       541 ttttcactat ttccaatgat aaactcttca gtgctcttct tgaaaaaaaa aaaaaaaaaa
> //
> 
> 
> 
> Revised: October 24, 2001.
>