[Biojava-l] Ensembl read problems

Keith James kdj@sanger.ac.uk
26 Nov 2002 10:01:48 +0000


>>>>> "Mark" == Schreiber, Mark <mark.schreiber@agresearch.co.nz> writes:

    Mark> Hi - My guess would be that line where it tries to join on a
    Mark> negative location. That doesn't seem to make a whole lot of
    Mark> sense and suggests to me an error in that record.

    Mark> Can any embl experts confirm that?

According to the BNF (excerpts below) this is allowed: a location can
contain a signed integer as a coordinate. However, a negative
coordinate would then conflict with the convention of using low/high
base bounds (< or >) and remote locations. The spec does not state
which, if any, takes precedence. That said, I can't recall negative
coords in any entries I've seen, but I've never explicitly looked for
them. My hunch is that they should be expressed as remote locations
(which I think someone suggested earlier).

location ::= <absolute_location> | <feature_name> |
             <functional_operator>(<location_list>)

absolute_location ::= <local_location> | <path> : <local_location>

local_location ::= <base_position> | <between_position> | <base_range>

base_range ::= <base_position>..<base_position>

base_position ::= <integer> | <low_base_bound> | <high_base_bound> |
                  <two_base_bound>

integer ::= <unsigned_integer> | - <unsigned_integer>

unsigned_integer ::= <digit> | <unsigned_integer><digit>

digit ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
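
To make the point about signed coordinates concrete, here is a minimal
sketch of the integer and base_range rules as Java regular expressions.
The class and method names are made up for illustration and are not
BioJava API. Only plain integers are handled; the <, > and two-base
bound forms are deliberately left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: checks text against the
// integer and base_range productions quoted above.
class EmblBnfSketch {
    // integer ::= <unsigned_integer> | - <unsigned_integer>
    private static final String INTEGER = "-?[0-9]+";

    // base_range ::= <base_position>..<base_position>, restricted here
    // to the case where both positions are plain integers.
    private static final Pattern BASE_RANGE =
        Pattern.compile("(" + INTEGER + ")\\.\\.(" + INTEGER + ")");

    // True if the whole string is an optionally signed run of digits.
    static boolean isInteger(String s) {
        return s.matches(INTEGER);
    }

    // Returns {start, end}, or null if the text is not a simple
    // integer range. Values too large for an int would make
    // Integer.parseInt throw NumberFormatException.
    static int[] parseBaseRange(String s) {
        Matcher m = BASE_RANGE.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2))
        };
    }
}
```

Under these rules parseBaseRange("-5..10") is accepted, which is the
grammar-level reason a negative coordinate is not, strictly speaking,
a malformed location.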

There seems to be a knock-on effect from the first rejected feature
now that the parser error handler allows a broken parse to
continue. Possibly a finally block needs to be added to restore the
location parser's state before the next feature starts.
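
A minimal sketch of the kind of fix I have in mind, assuming the
location parser keeps state between calls. Everything here
(LocationParser, reset, parseAll) is a hypothetical stand-in written
for illustration, not the actual BioJava code, which I haven't traced
yet.

```java
import java.util.ArrayList;
import java.util.List;

class FeatureLoopSketch {
    // Hypothetical stand-in for a stateful location parser: text
    // accumulates in a buffer until reset() is called.
    static class LocationParser {
        private final StringBuilder pending = new StringBuilder();

        int parse(String location) {
            pending.append(location);
            // Throws NumberFormatException for anything that is not
            // a plain integer, leaving the buffer dirty.
            return Integer.parseInt(pending.toString());
        }

        void reset() {
            pending.setLength(0);
        }
    }

    // Parses each location, skipping the ones that fail. When
    // resetInFinally is false, the parser is only reset after a
    // successful parse, so one bad feature poisons the following ones.
    static List<Integer> parseAll(List<String> locations,
                                  boolean resetInFinally) {
        LocationParser parser = new LocationParser();
        List<Integer> parsed = new ArrayList<>();
        for (String loc : locations) {
            try {
                parsed.add(parser.parse(loc));
                if (!resetInFinally) {
                    parser.reset();
                }
            } catch (NumberFormatException e) {
                // Error handler lets the parse continue with the
                // next feature.
            } finally {
                if (resetInFinally) {
                    parser.reset();
                }
            }
        }
        return parsed;
    }
}
```

With the input "12", "join(", "34", the version without the finally
reset yields only 12, because the leftover "join(" is still in the
buffer when "34" arrives. With the reset in the finally block both 12
and 34 come through.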

This is a guess based on eyeballing the code - I haven't found time to
get into this yet. May get there in a few days.

Keith

-- 

- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -