[Biojava-dev] Fine parsing of genbank files

george waldon gwaldon at geneinfinity.org
Thu Oct 26 00:41:24 UTC 2006


I still have problems with the rich parsing of genbank files. Currently, the ordering of features is lost during parsing. For example AJ390283, an immunoglobulin heavy chain, has its exons and introns in separate groups after parsing and writing out, instead of having them ordered as they appear along the sequence in the original record. The problem comes from the SimpleRichFeature compareTo and equals methods, which do not compare by rank first but only at the very last. I propose to give a rank of zero to any Feature that is not an instance of RichFeature, and then to compare by rank first, as with the other rich objects. RichFeatures will then be sorted as in the original genbank record; on the other hand, if ranks are not used and are all 0, RichFeatures and old Features can be mixed without conflict.
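
To make the proposal concrete, here is a minimal sketch of a rank-first comparison. It is not the actual SimpleRichFeature code: the class Feat, its fields, and the tie-breaking order (type, then start) are assumptions made up for illustration. Only the idea of comparing rank first, with rank 0 for unranked features, comes from the proposal above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class RankFirstOrdering {
    /** Hypothetical stand-in for a feature; not the BioJava Feature or RichFeature API. */
    static final class Feat {
        final String type;
        final int start;
        final Integer rank; // null models an old-style Feature that carries no rank

        Feat(String type, int start, Integer rank) {
            this.type = type;
            this.start = start;
            this.rank = rank;
        }

        /** Unranked features are treated as rank 0, as proposed above. */
        int effectiveRank() {
            return rank == null ? 0 : rank.intValue();
        }
    }

    /** Compares by rank first; type and start only break ties (assumed tie-breakers). */
    static final Comparator<Feat> RANK_FIRST = new Comparator<Feat>() {
        @Override
        public int compare(Feat a, Feat b) {
            int c = Integer.compare(a.effectiveRank(), b.effectiveRank());
            if (c != 0) return c;
            c = a.type.compareTo(b.type);
            if (c != 0) return c;
            return Integer.compare(a.start, b.start);
        }
    };

    /** Sorts a copy of the input and returns the feature types in sorted order. */
    static List<String> sortedTypes(List<Feat> features) {
        List<Feat> copy = new ArrayList<Feat>(features);
        Collections.sort(copy, RANK_FIRST);
        List<String> types = new ArrayList<String>();
        for (Feat f : copy) {
            types.add(f.type);
        }
        return types;
    }

    /** Made-up exon/intron layout, deliberately supplied out of record order. */
    static List<String> demo() {
        List<Feat> features = new ArrayList<Feat>();
        features.add(new Feat("intron", 100, 2));
        features.add(new Feat("exon", 200, 3));
        features.add(new Feat("exon", 1, 1));
        return sortedTypes(features);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With ranks 1, 2, 3 the features come back interleaved in record order rather than grouped by type. When every rank is 0 or absent, the comparison falls through to the tie-breakers, so ranked and unranked features can share one sorted collection.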

Secondly, citing Richard in a previous post regarding ranks:
>> SimpleBioEntryRelationShip suggests that they start at 1 with 0 
>> reserved for absence of ranking.
>I tried to start them all from 1, and used 0 for no-rank where rank is compulsory, and null where rank is optional (see below). If you find anywhere where I've been inconsistent, please feel free to raise a Bugzilla bug to point out where I've gone wrong so I can fix them.

Yes, there are problems in SimpleRichSequenceBuilder:
- notes start at 0 (SeqPropCount = 0)
- features start at 0 (featurerank = 0)
- feature notes start at 0 (featPropCount = 0)
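
For illustration only, a counter that follows the convention quoted above (ranks start at 1, 0 is reserved for no rank) could look like the sketch below. The class and method names are hypothetical and are not taken from SimpleRichSequenceBuilder.

```java
import java.util.ArrayList;
import java.util.List;

class RankCounterSketch {
    /** 0 is reserved for "no rank", so it is never handed out. */
    static final int NO_RANK = 0;

    // Hypothetical counter; holds the last rank that was assigned.
    private int featureRank = NO_RANK;

    /** Increments before returning, so the first rank assigned is 1. */
    int nextFeatureRank() {
        return ++featureRank;
    }

    /** Assigns ranks to n features in order and returns them. */
    static List<Integer> assignRanks(int n) {
        RankCounterSketch counter = new RankCounterSketch();
        List<Integer> ranks = new ArrayList<Integer>();
        for (int i = 0; i < n; i++) {
            ranks.add(Integer.valueOf(counter.nextFeatureRank()));
        }
        return ranks;
    }

    public static void main(String[] args) {
        System.out.println(assignRanks(3));
    }
}
```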

Finally, the equals method of SimpleBioEntryRelationship should treat a null rank Integer as a rank equal to zero, to be consistent with the compareTo method (currently compareTo can return 0 while equals returns false for the same two objects).
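
A rough sketch of what I mean, using a hypothetical class Rel rather than the real SimpleBioEntryRelationship (the name field and the comparison order are assumptions). The point is only that equals, compareTo and hashCode all go through the same null-to-zero mapping, so they cannot disagree.

```java
class NullRankEquality {
    /** Hypothetical relationship with an optional rank; not the BioJava class. */
    static final class Rel implements Comparable<Rel> {
        final String name;
        final Integer rank; // may be null when no rank was set

        Rel(String name, Integer rank) {
            this.name = name;
            this.rank = rank;
        }

        /** Single place where a null rank is mapped to 0. */
        private int rankOrZero() {
            return rank == null ? 0 : rank.intValue();
        }

        @Override
        public int compareTo(Rel other) {
            int c = Integer.compare(rankOrZero(), other.rankOrZero());
            return c != 0 ? c : name.compareTo(other.name);
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Rel)) return false;
            Rel other = (Rel) o;
            // Same null-to-zero rule as compareTo, so the two methods agree.
            return rankOrZero() == other.rankOrZero() && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            // Uses rankOrZero() so that equal objects have equal hash codes.
            return 31 * name.hashCode() + rankOrZero();
        }
    }

    public static void main(String[] args) {
        Rel unranked = new Rel("a", null);
        Rel zero = new Rel("a", 0);
        System.out.println(unranked.compareTo(zero) == 0 && unranked.equals(zero));
    }
}
```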

If it sounds ok to everyone, I can make the changes.

- George


