[Biojava-l] GFF3 Reader

Craig Berry craig.adrian.berry at gmail.com
Fri Feb 11 10:54:35 UTC 2011


Could someone validate this logic in the GFF3Reader for me and tell me
whether it is a bug or not?

If I have a GFF3 file with the following line:

chrI	SGD	repeat_region	1	62	.	-	.	ID=TEL01L-TR;Name=TEL01L-TR;Note=Terminal%20stretch%20of%20telomeric%20repeats%20on%20the%20left%20arm%20of%20Chromosome%20I;dbxref=SGD:S000028864

When the file is parsed, the reader calls Location.fromBio with
start 1, end 62 and the -ve strand.
Since the strand is -ve, it needs to convert the positions to negative
values and swap the start and end. However, as the Javadoc
explains:

“In biocoordinates, the start index of a range is represented in
origin 1 (ie the very first index is 1, not 0),  and end= start +
length - 1.”

So the end is reassigned from the start value, which is first reduced
by 1 and then negated: e = -(start - 1). With a start value of 1, as
in this case, the end becomes 0, so the range now runs from -62 to 0.
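
The arithmetic described above can be sketched as follows. This is a
minimal illustration based only on the description in this message, not
a copy of the BioJava source. The names FromBioSketch and toInternal are
hypothetical, and the handling of the +ve strand is an assumption.

```java
public class FromBioSketch {
    /**
     * Hypothetical helper (not BioJava API): converts 1-based, inclusive
     * bio coordinates into a {start, end} pair as described in this message.
     */
    public static int[] toInternal(int start, int end, char strand) {
        // Assumption: on the +ve strand only the start is shifted to origin 0.
        int s = start - 1;
        int e = end;
        if (strand == '-') {
            // Negate both positions and swap them so that start <= end.
            s = -end;
            e = -(start - 1);
        }
        return new int[] { s, e };
    }

    public static void main(String[] args) {
        int[] r = toInternal(1, 62, '-');
        System.out.println(r[0] + " to " + r[1]); // prints "-62 to 0"
    }
}
```

Any -ve strand feature that starts at position 1 gives an end of 0 under
this arithmetic. A start of 2 or more gives a strictly negative end.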

This causes a problem when adding this Feature to the feature
collection, since a position of 0 is considered to be on the +ve
strand. When Location.plus() is called, the check for a negative
location (i.e. both start and end being < 0) returns false, so you
end up trying to create a Location with a -ve start position but a
+ve end, which throws an IllegalArgumentException.
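
To illustrate the failure, here is a small sketch of the sign check. It
assumes only what is described above: a range counts as -ve strand only
when both ends are below zero, and a range with a negative start and a
non-negative end is rejected. The names SignCheckSketch, isNegative and
validate are hypothetical and are not BioJava API.

```java
public class SignCheckSketch {
    /** A range is treated as -ve strand only if both ends are < 0. */
    public static boolean isNegative(int start, int end) {
        return start < 0 && end < 0;
    }

    /** Rejects a range whose start is negative while its end is not. */
    public static void validate(int start, int end) {
        if (start < 0 && end >= 0) {
            throw new IllegalArgumentException(
                "Mixed-sign range: " + start + " to " + end);
        }
    }

    public static void main(String[] args) {
        // 0 is not < 0, so the range -62 to 0 is not seen as -ve strand.
        System.out.println(isNegative(-62, 0)); // prints "false"
        try {
            validate(-62, 0);
        } catch (IllegalArgumentException ex) {
            System.out.println("Rejected: " + ex.getMessage());
        }
    }
}
```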

So is there something fishy here or not? I’m assuming that the GFF
content is valid.

Thanks in advance

Craig



