[Biojava-l] Parse error with negative strand CDS

Peter Wilkinson pwilkinson at videotron.ca
Fri Sep 12 23:09:38 EDT 2003


So ....

Well ... the rep_origin location is supposed to be a single base position, 
the point where replication starts. 5191..31 is not expected.

For example:



source          1..2245
                 /organism="Escherichia coli"
                 /plasmid="Plasmid Z"
                 /strain="K12"
rep_origin      6
                 /direction=LEFT
                 /note="ori"
CDS             join(complement(567..795),complement(21..349))
                 /gene="trbC"
                 /product="transfer protein C"
CDS             803..1344




and NOT as you have it. Try changing the annotation within VectorNTI to 
show the exact origin. Otherwise it must be in the following format to 
represent a 'fuzzy' annotation:

(5191.31)   #### see below; this looks like a circular reference, so I think 
only something like (5191.5231) will work.

The problem is that I am not sure how well VectorNTI supports 'fuzzy' 
annotations. I remember that GenoMax did not, and I don't think VectorNTI 
does unless you imported a file with the annotation first. You will have to 
try it.

Here is a list of formats for location descriptors:

Location                  Description

467                       Points to a single base in the presented sequence

340..565                  Points to a continuous range of bases bounded by and
                          including the starting and ending bases

<345..500                 Indicates that the exact lower boundary point of a
                          feature is unknown.  The location begins at some
                          base previous to the first base specified (which
                          need not be contained in the presented sequence)
                          and continues to and includes the ending base

<1..888                   The feature starts before the first sequenced base
                          and continues to and includes base 888

(102.110)                 Indicates that the exact location is unknown but
                          that it is one of the bases between bases 102 and
                          110, inclusive

(23.45)..600              Specifies that the starting point is one of the
                          bases between bases 23 and 45, inclusive, and the
                          end point is base 600

(122.133)..(204.221)      The feature starts at a base between 122 and 133,
                          inclusive, and ends at a base between 204 and 221,
                          inclusive

123^124                   Points to a site between bases 123 and 124

145^177                   Points to a site between two adjacent bases anywhere
                          between bases 145 and 177

join(12..78,134..202)     Regions 12 to 78 and 134 to 202 should be joined to
                          form one contiguous sequence

complement(join(2691..4571,4918..5163))
                          Joins regions 2691 to 4571 and 4918 to 5163, then
                          complements the joined segments (the feature is
                          on the strand complementary to the presented strand)

join(complement(4918..5163),complement(2691..4571))
                          Complements regions 4918 to 5163 and 2691 to 4571,
                          then joins the complemented segments (the feature is
                          on the strand complementary to the presented strand)

complement(34..(122.126)) Start at one of the bases complementary to those
                          between 122 and 126 on the presented strand and
                          finish at the base complementary to base 34 (the
                          feature is on the strand complementary to the
                          presented strand)

J00194:100..202           Points to bases 100 to 202, inclusive, in the entry
                          (in this database) with primary accession number
                          'J00194'




      rep_origin      5191..31
                      /vntifkey="33"
                      /label=SV40_ORI
                      /note=" SV40 replication origin core region; 0.67 "
      misc_signal     complement(21..27)
                      /locus_tag="SV40gp1"
                      /note="early mRNA promoter element; 0.66 [66],[78],[79]"
      misc_feature    complement(21..27)
                      /vntifkey="21"
                      /label=E_P
                      /note=" SV40 early mRNA promoter element; 0.66 "
      rep_origin      32..83
                      /vntifkey="33"
                      /label=AUX_ORI
                      /note=" SV40 replication origin auxiliary region; 0.67 "
      repeat_region   complement(40..60)
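
If the annotation cannot be changed inside VectorNTI, another option might be to rewrite the location before parsing, the same way Fred did by hand in the message quoted below (5191..31 becoming join(5191..5243,1..31)). The following is only a sketch under assumptions: the class WrapAroundLocation and the method toJoin are hypothetical names and not part of BioJava, the sequence is assumed to be circular, and the caller has to supply the sequence length (5243 is inferred from the join in Fred's NCBI example).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of BioJava.
final class WrapAroundLocation {

    // Matches only a plain "start..end" range with no fuzzy markers or operators.
    private static final Pattern SIMPLE_RANGE = Pattern.compile("(\\d+)\\.\\.(\\d+)");

    /**
     * Rewrites "start..end" with start > end as "join(start..length,1..end)".
     * Any other location string is returned unchanged.
     */
    static String toJoin(String location, int sequenceLength) {
        Matcher m = SIMPLE_RANGE.matcher(location.trim());
        if (!m.matches()) {
            return location; // not a plain range, leave it alone
        }
        int start = Integer.parseInt(m.group(1));
        int end = Integer.parseInt(m.group(2));
        if (start <= end) {
            return location; // ordinary range, nothing to fix
        }
        if (start > sequenceLength) {
            throw new IllegalArgumentException(
                "start " + start + " is beyond sequence length " + sequenceLength);
        }
        return "join(" + start + ".." + sequenceLength + ",1.." + end + ")";
    }

    public static void main(String[] args) {
        System.out.println(toJoin("5191..31", 5243)); // join(5191..5243,1..31)
        System.out.println(toJoin("32..83", 5243));   // 32..83
    }
}
```

Locations wrapped in complement(...) or already written as a join are passed through untouched, so this only covers the one case that shows up in the excerpt above.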




At 10:39 AM 12/09/2003 -0400, you wrote:
>Hi Peter,
>Thanks for your assistance. I've attached a gb formatted file produced by
>VectorNTI, whose features generate parsing exceptions in BioJava. I believe
>I have isolated at least one problem associated with a feature whose start
>location is greater than its end location as in an origin of replication.
>The gb file produced by VectorNTI has a different format than one downloaded
>from NCBI.
>
>VectorNTI: rep_origin      5191..31
>NCBI:      rep_origin      join(5191..5243,1..31)
>
>If I edit the VectorNTI output to the NCBI format, I do not get any parsing
>errors. I will contact the scientist who supplied my test files to see what
>version of VectorNTI generated them and if there have been any subsequent
>updates.
>Thanks again
>Fred
>
>
>
>-----Original Message-----
>From: Peter Wilkinson [mailto:pwilkinson at videotron.ca]
>Sent: Friday, September 12, 2003 6:39 AM
>To: Schreiber, Mark; Criscuolo, Fred; biojava-l at biojava.org
>Subject: RE: [Biojava-l] Parse error with negative strand CDS
>
>
>Well ... Fred.
>
>Having worked for the company that makes VectorNTI (Informax), I can tell
>you that there are some modifications to the format that should be benign
>... but that is not always true. The GenBank files output by VectorNTI
>are slightly different, but I would not have expected them to break the
>parser.
>
>the annotations within VectorNTI are added into the comments fields, which
>should not be a problem.
>
>One thing to check is whether biojava parser is updated to the current
>genbank format, and whether the output from vectorNTI produces the same
>'version' of the genbank file. If they don't then the biojava parser may
>not handle the vectorNTI output.
>
>if you could send the file I can see what the problem is, the line on its
>own does not seem like a problem. It might be some spacing issues (version
>of the formatting) in the file generated on the previous line somewhere.
>
>Peter
>
>At 10:04 AM 12/09/2003 +1200, Schreiber, Mark wrote:
> >Hi -
> >
> >This looks odd. It may be that the Genbank produced by VectorNTI is not
> >quite right. Can you post the file, or at least the few lines surrounding
> >the CDS feature?
> >
> >- Mark
> >
> >
> >-----Original Message-----
> >From: Criscuolo, Fred [mailto:fred.criscuolo at pfizer.com]
> >Sent: Friday, 12 September 2003 4:27 a.m.
> >To: 'biojava-l at biojava.org'
> >Subject: [Biojava-l] Parse error with negative strand CDS
> >
> >
> >Hi,
> >I am using BioJava 1.3.0 to process a GenBank file that has a CDS on the
> >negative strand. I am using the SeqIOTools.readGenbank(BufferedReader)
> >method to parse the file. I get the following error for any CDS on the
> >negative strand:
> >
> >   This line could not be parsed:  CDS            complement(518..1597)
> >
> >The GenBank files I'm working with are produced by the VectorNTI
> >application. I'm using Java 1.4.2.  I've searched the biojava mailing list
> >but have not seen a reference to this particular problem. Any idea what's
> >wrong? Thanks. Fred
> >
> >Fred Criscuolo
> >Research Informatics - Pfizer, Inc.
> >858.622.7307
> >
> >
> >
> >LEGAL NOTICE
> >Unless expressly stated otherwise, this message is confidential and may be
> >privileged. It is intended for the addressee(s) only. Access to this
> >E-mail by anyone else is unauthorized. If you are not an addressee, any
> >disclosure or copying of the contents of this E-mail or any action taken
> >(or not taken) in reliance on it is unauthorized and may be unlawful. If
> >you are not an addressee, please inform the sender immediately.
> >_______________________________________________
> >Biojava-l mailing list  -  Biojava-l at biojava.org
> >http://biojava.org/mailman/listinfo/biojava-l
> >=======================================================================
> >Attention: The information contained in this message and/or attachments
> >from AgResearch Limited is intended only for the persons or entities
> >to which it is addressed and may contain confidential and/or privileged
> >material. Any review, retransmission, dissemination or other use of, or
> >taking of any action in reliance upon, this information by persons or
> >entities other than the intended recipients is prohibited by AgResearch
> >Limited. If you have received this message in error, please notify the
> >sender immediately.
> >=======================================================================
> >
>
>
>-------------------------------------
>Peter Wilkinson
>Bioinformatics Consultant
>
>-------------------------------------
>
>
>
>


-------------------------------------
Peter Wilkinson
Bioinformatics Consultant

-------------------------------------  



