[Biojava-dev] GenbankFormat and BASE COUNT

george waldon gwaldon at geneinfinity.org
Wed Sep 6 23:14:28 UTC 2006


>From: mark.schreiber at novartis.com [mailto:mark.schreiber at novartis.com] 
>Are you OK to watch for format changes?

Sorry for the delay in responding. There are indeed a few changes coming up.

- The new naturally occurring amino acid pyrrolysine (Pyl/O, the 22nd) becomes official in GenBank release 156.0, and in EMBL this fall as well. We will have to adjust the PROTEIN and PROTEIN_TERM alphabets and perhaps add more translation tables.

- Speaking of translation tables, I noticed a while ago that the official GenBank/EMBL/DDBJ feature table lists 23 genetic code tables, whereas BioJava only describes 13. We should probably stick to the GenBank/EMBL/DDBJ translation tables.

- Xle/J (leucine or isoleucine) will be legal starting with GenBank 156.0 (October 2006).

- The feature location syntax X.Y is to be discontinued as of October 2006. Existing records will be changed, although the conversion rule has not been given. Maybe it is time to remove this type of fuzziness from BioJava?
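
To make the X.Y point concrete: in the feature table, a location such as 102.110 denotes a single base somewhere between the two positions, as opposed to the range 102..110 or the site 123^124. Below is a minimal sketch of how such locations could be spotted in a location string before deciding what to do with them. The class and method names are hypothetical and not part of BioJava, and the regular expression is only an approximation: it skips accession.version prefixes of remote references but has not been checked against the full location grammar.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of BioJava: flags location strings
// that still use the X.Y ("one base between X and Y") syntax.
final class BetweenBasesCheck {

    // Digits, a single dot, digits. The lookbehind rejects matches that
    // start inside a word or after a dot (e.g. the second half of "12..78"
    // or an accession such as "J00194.1"). The lookahead rejects matches
    // followed by a digit, another dot, or the ':' of a remote reference.
    private static final Pattern X_DOT_Y =
            Pattern.compile("(?<![\\w.])\\d+\\.\\d+(?![\\d.:])");

    // Returns true if the location contains at least one X.Y element.
    static boolean usesXDotY(String location) {
        return X_DOT_Y.matcher(location).find();
    }
}
```

With this sketch, usesXDotY("join(12..78,134.202)") reports the second element, while plain ranges such as "12..78" are left alone.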

Still not handled in org.biojavax.bio.seq.io.GenbankFormat:

- The SEGMENT keyword is not currently parsed, maybe on purpose.

- The CONTIG keyword: same as above. Example: AE014134, which is an entire chromosome.
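
As a rough illustration of what recognizing these two keywords could involve, here is a minimal sketch that picks the top-level keyword out of a flat-file line. It assumes the usual GenBank layout in which a top-level keyword starts in column 1 and the keyword field occupies the first 12 columns. The class and method names are hypothetical; this is not how GenbankFormat is actually structured.

```java
// Hypothetical helper, not part of BioJava: identifies top-level
// GenBank keywords so that SEGMENT and CONTIG lines can be detected.
final class GenbankKeywordScan {

    // Assumption: the keyword field spans the first 12 columns.
    private static final int KEYWORD_FIELD_WIDTH = 12;

    // Returns the top-level keyword of a line, or null when the line is a
    // continuation line, an indented subkeyword, or a terminator like "//".
    static String topLevelKeyword(String line) {
        if (line.isEmpty() || !Character.isUpperCase(line.charAt(0))) {
            return null;
        }
        int end = Math.min(KEYWORD_FIELD_WIDTH, line.length());
        return line.substring(0, end).trim();
    }

    // True for the two keywords mentioned above.
    static boolean isSegmentOrContig(String line) {
        String keyword = topLevelKeyword(line);
        return "SEGMENT".equals(keyword) || "CONTIG".equals(keyword);
    }
}
```

Taking the whole 12-column field rather than the first word keeps two-word keywords such as BASE COUNT intact.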

I can make the table and alphabet modifications when they become official.
George


