[Biojava-dev] Fine parsing of genbank files

Richard Holland holland at ebi.ac.uk
Thu Oct 26 07:40:40 UTC 2006



That all sounds good to me. It's really helpful having someone check the
code in detail.

cheers,
Richard

george waldon wrote:
> I still have problems with the rich parsing of GenBank files. Currently, the ordering of features is lost during parsing. For example, AJ390283, an immunoglobulin heavy chain, ends up with its exons and introns in separate groups after parsing and writing out, instead of keeping them in the order in which they appear along the sequence in the original record. The problem comes from the compareTo and equals methods of SimpleRichFeature, which compare rank last rather than first. I propose to give a rank of zero to Features that are not instances of RichFeature, and then to compare by rank first, as the other rich objects do. RichFeatures will then be sorted as in the original GenBank record; on the other hand, if ranks are not used and are all 0, RichFeatures and old Features can be mixed without conflict.
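A minimal sketch of the proposed rank-first ordering follows. It assumes hypothetical stand-in types (Feature, PlainFeature, RankedFeature, RankFirstComparator, RankOrderDemo) rather than the real BioJava Feature/RichFeature API, and the exon/intron coordinates are illustrative, not taken from AJ390283. Unranked features get rank 0, rank is compared first, and the type and start comparisons stand in for the remaining field comparisons a real compareTo would perform.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a feature; NOT the BioJava Feature interface.
interface Feature {
    String getType();
    int getStart();
}

// Hypothetical unranked feature (plays the role of a non-RichFeature).
class PlainFeature implements Feature {
    private final String type;
    private final int start;

    PlainFeature(String type, int start) { this.type = type; this.start = start; }
    public String getType() { return type; }
    public int getStart() { return start; }
    @Override public String toString() { return type + "@" + start; }
}

// Hypothetical ranked feature (plays the role of a RichFeature carrying its record order).
class RankedFeature extends PlainFeature {
    private final int rank;

    RankedFeature(String type, int start, int rank) { super(type, start); this.rank = rank; }
    int getRank() { return rank; }
}

class RankFirstComparator implements Comparator<Feature> {
    // Features without a rank are treated as rank 0, as proposed above.
    static int rankOf(Feature f) {
        return (f instanceof RankedFeature) ? ((RankedFeature) f).getRank() : 0;
    }

    @Override
    public int compare(Feature a, Feature b) {
        int c = Integer.compare(rankOf(a), rankOf(b)); // rank first
        if (c != 0) return c;
        c = a.getType().compareTo(b.getType());        // then the other fields
        if (c != 0) return c;
        return Integer.compare(a.getStart(), b.getStart());
    }
}

class RankOrderDemo {
    // Sorts features the way a sorted set of features would hold them.
    static List<Feature> sorted(List<Feature> in) {
        TreeSet<Feature> set = new TreeSet<>(new RankFirstComparator());
        set.addAll(in);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        List<Feature> record = List.of(
            new RankedFeature("exon", 100, 1),
            new RankedFeature("intron", 250, 2),
            new RankedFeature("exon", 400, 3));
        System.out.println(sorted(record)); // [exon@100, intron@250, exon@400]
    }
}
```

Because rank is compared first, exons and introns stay interleaved in record order; when every rank is 0, the comparison falls through to the other fields, so ranked and unranked features can still share one sorted set.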
> 
> Secondly, citing Richard in a previous post regarding ranks:
>>> SimpleBioEntryRelationship suggests that they start at 1, with 0 
>>> reserved for absence of ranking.
>> I tried to start them all from 1, and used 0 for no-rank where rank is compulsory, and null where rank is optional (see below). If you find anywhere where I've been inconsistent, please feel free to raise a Bugzilla bug to point out where I've gone wrong so I can fix them.
> 
> Yes, there are problems in SimpleRichSequenceBuilder:
> - notes start at 0 (SeqPropCount = 0)
> - features start at 0 (featurerank = 0)
> - feature notes start at 0 (featPropCount = 0)
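A minimal sketch of the intended counter behaviour, assuming a hypothetical RankCounters helper: the field names echo the counters listed above, but this is not SimpleRichSequenceBuilder's actual code. It also assumes that feature-note ranks restart at 1 for each new feature, which the post does not state.

```java
// Hypothetical helper; NOT the real SimpleRichSequenceBuilder code.
class RankCounters {
    // Ranks start at 1; 0 stays reserved for "no rank".
    private int seqPropCount = 1;   // next rank for a sequence-level note
    private int featureRank = 1;    // next rank for a feature
    private int featPropCount = 1;  // next rank for a note on the current feature

    int nextSequenceNoteRank() { return seqPropCount++; }

    int startFeature() {
        featPropCount = 1;          // assumption: each feature's notes are ranked from 1 again
        return featureRank++;
    }

    int nextFeatureNoteRank() { return featPropCount++; }
}

class RankCountersDemo {
    public static void main(String[] args) {
        RankCounters rc = new RankCounters();
        System.out.println("note " + rc.nextSequenceNoteRank());        // note 1
        System.out.println("feature " + rc.startFeature());             // feature 1
        System.out.println("feature note " + rc.nextFeatureNoteRank()); // feature note 1
        System.out.println("feature " + rc.startFeature());             // feature 2
        System.out.println("feature note " + rc.nextFeatureNoteRank()); // feature note 1
    }
}
```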
> 
> Finally, the equals method of SimpleBioEntryRelationship should treat a null rank Integer as equal to a rank of zero, to be consistent with its compareTo method (currently compareTo can return 0 for a pair of objects for which equals returns false).
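A minimal sketch of an equals method that agrees with compareTo, using a hypothetical, simplified Relationship class rather than the real SimpleBioEntryRelationship (which also carries a term). A null rank is normalized to 0 in compareTo, equals and hashCode alike; the JDK's Comparable documentation strongly recommends that natural ordering be consistent with equals, and hashCode must use the same normalization so that equal objects hash equally.

```java
import java.util.Objects;

// Hypothetical, simplified stand-in; NOT the real SimpleBioEntryRelationship.
final class Relationship implements Comparable<Relationship> {
    private final String object;
    private final String subject;
    private final Integer rank; // null means "no rank"

    Relationship(String object, String subject, Integer rank) {
        this.object = Objects.requireNonNull(object);
        this.subject = Objects.requireNonNull(subject);
        this.rank = rank;
    }

    // The single place where a null rank becomes 0.
    private int effectiveRank() { return rank == null ? 0 : rank; }

    @Override
    public int compareTo(Relationship o) {
        int c = Integer.compare(effectiveRank(), o.effectiveRank());
        if (c != 0) return c;
        c = object.compareTo(o.object);
        if (c != 0) return c;
        return subject.compareTo(o.subject);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Relationship)) return false;
        Relationship o = (Relationship) obj;
        // Same normalized rank as compareTo, so the two methods agree.
        return effectiveRank() == o.effectiveRank()
            && object.equals(o.object)
            && subject.equals(o.subject);
    }

    @Override
    public int hashCode() {
        return Objects.hash(effectiveRank(), object, subject);
    }
}

class RelationshipDemo {
    public static void main(String[] args) {
        Relationship a = new Relationship("A", "B", null);
        Relationship b = new Relationship("A", "B", 0);
        System.out.println(a.compareTo(b) == 0 && a.equals(b)); // true
    }
}
```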
> 
> If it sounds ok to everyone, I can make the changes.
> 
> - George
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
> 