[Biojava-l] BioJava 1.1x freeze plans

Cox, Greg gcox@netgenics.com
Wed, 31 Jan 2001 09:53:20 -0500



Thanks for the pointers, Thomas; I was able to finish up the parser this
morning.  I don't have read-write access yet, so I've attached the files.
GenbankFormat and GenbankProcessor constitute the parser, and the other two
are the demo/test files.

	I didn't tackle the source issue in FeatureTableParser, so the GFF
file that GenbankToGffFasta generates has embl as the source for all the
features.  My preference would be to set the source field in the constructor
and have a setFeatureSource(String) method, so that a listener could be
reused with different file formats.  I'm not sure there's a good use case
for that, though, so I'd like an opinion from someone who knows the field
(unlike me).
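GFF puts the feature source in the second of its tab-separated columns, which is why a hard-coded value shows up on every line of the generated file. The sketch below is a standalone illustration, not BioJava's own GFF writer; the method name `gffLine` and the sample values are hypothetical, and the optional ninth (attributes/group) column is omitted.

```java
// Formats one feature as a tab-separated GFF line:
// seqname, source, feature, start, end, score, strand, frame.
// "." marks an empty score or frame.
public class GffLineDemo {
    static String gffLine(String seqName, String source, String type,
                          int start, int end, char strand) {
        return String.join("\t", seqName, source, type,
            Integer.toString(start), Integer.toString(end), ".",
            Character.toString(strand), ".");
    }

    public static void main(String[] args) {
        // With a hard-coded source, GenBank-derived features are labelled as EMBL.
        System.out.println(gffLine("seq1", "EMBL", "CDS", 12, 202, '+'));
        // With a configurable source, the same feature carries the right label.
        System.out.println(gffLine("seq1", "GenBank", "CDS", 12, 202, '+'));
    }
}
```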

Greg

-----Original Message-----
From: Thomas Down [mailto:td2@sanger.ac.uk]
Sent: Tuesday, January 30, 2001 3:48 PM
To: Cox, Greg
Cc: biojava-l@biojava.org
Subject: Re: [Biojava-l] BioJava 1.1x freeze plans


On Tue, Jan 30, 2001 at 12:33:45PM -0500, Cox, Greg wrote:
> Thomas,
> 	I'm in the midst of revising GenbankFormat to handle the new IO
> style, and I'd like to see that in the 1.1 release.  Right now I'm having
> trouble with features, and hope someone can help.  FeatureTableParser's
> documentation says it is shared between EMBL and GENBANK formats, but EMBL is
> hard-coded as the type source.  Is there any existing GENBANK feature
> information?  I'm leery about changing FeatureTableParser myself since I
> don't know what will break.

Great -- I think a lot of people will be happy to see
that working again...  There shouldn't be any problem
taking this change before 1.1, so long as there's a working
demo program to check that it's all working as expected.

Yes, the FeatureTableParser was shared between Embl and
Genbank in BioJava 1.0.  Unless you know of some subtle
differences between Embl and Genbank feature tables (I've
always thought they were the same, but that said I've very
rarely worked with Genbank files myself), it ought to be
possible to do a new GenbankParser which also uses
FeatureTableParser.
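Both formats follow the shared INSDC feature-table layout, in which the feature key starts in column 6 and the location in column 22; EMBL fills the first five columns with `FT` and three spaces, GenBank with five spaces. Assuming that layout, this standalone sketch (the helper `trimPrefix` is hypothetical, not part of BioJava) shows that stripping the 5-character prefix leaves identical text for a shared parser to consume.

```java
// Sketch of the column layout EMBL and GenBank feature tables share:
// a 5-character prefix, then the feature key starting in column 6.
public class FeatureLinePrefix {
    static final int PREFIX_LENGTH = 5;

    // Hypothetical helper: drop the format-specific prefix, keep the shared body.
    static String trimPrefix(String line) {
        return line.length() > PREFIX_LENGTH ? line.substring(PREFIX_LENGTH) : "";
    }

    public static void main(String[] args) {
        String body = "CDS             join(12..78,134..202)";
        String emblLine = "FT   " + body;    // EMBL: "FT" plus three spaces
        String genbankLine = "     " + body; // GenBank: five spaces
        System.out.println(trimPrefix(emblLine).equals(trimPrefix(genbankLine))); // true
    }
}
```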

I presume you've seen the lifecycle:

  - create a FeatureTableParser, pointing it at the appropriate
    SeqIOListener.

  - When you see the start of a feature, call startFeature,
    passing in the feature type.

  - Pass each line of the feature table, with the leading prefix
    trimmed off, to featureData.

  - Flush the feature with endFeature().

  - The SeqIOListener should then be notified of the new feature.
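The lifecycle above can be sketched with a deliberately simplified, self-contained parser. `FeatureListener` and `MiniFeatureTableParser` are hypothetical stand-ins for `SeqIOListener` and `FeatureTableParser`; only the method names `startFeature`, `featureData` and `endFeature` come from the steps listed, and the real classes do far more (location parsing, qualifier handling).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SeqIOListener: told about each finished feature.
interface FeatureListener {
    void featureParsed(String type, List<String> lines);
}

// Hypothetical, heavily simplified parser that follows the same lifecycle.
class MiniFeatureTableParser {
    private final FeatureListener listener;
    private final List<String> lines = new ArrayList<>();
    private String currentType;

    // Step 1: the parser is pointed at a listener when it is created.
    MiniFeatureTableParser(FeatureListener listener) {
        this.listener = listener;
    }

    // Step 2: a feature starts; remember its type and reset the buffer.
    void startFeature(String type) {
        currentType = type;
        lines.clear();
    }

    // Step 3: buffer each line of feature data (prefix already trimmed).
    void featureData(String line) {
        lines.add(line.trim());
    }

    // Steps 4-5: flush the buffered feature and notify the listener.
    void endFeature() {
        listener.featureParsed(currentType, new ArrayList<>(lines));
        currentType = null;
        lines.clear();
    }
}

public class LifecycleDemo {
    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        MiniFeatureTableParser parser =
            new MiniFeatureTableParser((type, lines) -> events.add(type + " " + lines));
        parser.startFeature("gene");
        parser.featureData("1..100");
        parser.featureData("/gene=\"abc\"");
        parser.endFeature();
        System.out.println(events); // [gene [1..100, /gene="abc"]]
    }
}
```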

The code isn't as elegant as it could be, mainly because I made
the minimum set of changes necessary to make it all work in
the `new IO' world.

As to the `source' issue, having it hard-coded is a mistake.
Feel free to offer a way to change this, either with a
setFeatureSource(String) method or by adding an extra parameter
to the constructor.  Right now, FeatureTableParsers are only
constructed by EmblProcessor, so either approach is fine as
long as you keep that class in sync with any changes you make.
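Both options can live side by side. In this sketch the class and interface names are hypothetical (only `setFeatureSource(String)` is named in the discussion), and the one-argument constructor defaulting to "EMBL" is an assumed way of keeping existing callers such as EmblProcessor working unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SeqIOListener: receives (source, type) pairs.
interface SourcedFeatureSink {
    void addFeature(String source, String type);
}

// Hypothetical parser whose feature source is configurable instead of hard-coded.
class ConfigurableSourceParser {
    private final SourcedFeatureSink sink;
    private String featureSource;

    // Option 1: an extra constructor parameter.
    ConfigurableSourceParser(SourcedFeatureSink sink, String featureSource) {
        this.sink = sink;
        this.featureSource = featureSource;
    }

    // Keeps existing one-argument callers working with the old default (assumed "EMBL").
    ConfigurableSourceParser(SourcedFeatureSink sink) {
        this(sink, "EMBL");
    }

    // Option 2: a setter, so one instance can be reused across formats.
    void setFeatureSource(String featureSource) {
        this.featureSource = featureSource;
    }

    // Reports a finished feature, tagged with the current source.
    void emit(String type) {
        sink.addFeature(featureSource, type);
    }
}

public class SourceDemo {
    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        ConfigurableSourceParser parser =
            new ConfigurableSourceParser((src, type) -> seen.add(src + ":" + type));
        parser.emit("CDS");                 // default source
        parser.setFeatureSource("GenBank"); // reuse for GenBank input
        parser.emit("CDS");
        System.out.println(seen); // [EMBL:CDS, GenBank:CDS]
    }
}
```

The default-valued constructor means the change is additive: nothing that currently builds a parser has to be touched unless it wants a different source.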

If you do change anything else, the gff.EmblToGFFFasta demo
is a good test to make sure everything is still working.  You
might want to hack up an equivalent GenbankToGFFFasta to
test your work.

Thanks,

    Thomas.


[Attachment: GenbankParser.zip]