[Biojava-l] Biojava XML Binding (BJXB)

Schreiber, Mark mark.schreiber@agresearch.co.nz
Tue, 7 May 2002 09:05:51 +1200


Actually I do want it very biojava bound at one end (hence the name),
with the possibility of not being biojava bound at the other. If it were
biojava bound at both ends then RMI would be fine (although firewalls
and serialization might be a problem).

Nested features spring to mind as an example of something that is not
handled well by other file formats. Also if you do the GenBank ->
Biojava -> GenBank round trip you might be surprised by what goes
astray.
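
To illustrate the "not biojava bound at the other end" case, here is a rough
sketch of a consumer that walks the nested features using only the XML parser
shipped with the JDK. The class and method names (NestedFeatureWalk, describe,
walk, locationOf) are made up for this example, and the embedded document is a
trimmed copy of the demo (no DOCTYPE, no annotations), so it is not valid
against the DTD as posted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NestedFeatureWalk {

    // Trimmed copy of the demo feature structure (assumption: annotations and
    // the DOCTYPE are left out so that no bjxb.dtd file is needed to parse it).
    static final String XML =
        "<feature_holder>"
      + "<feature class='org.biojava.bio.seq.genomic.TranslatedRegion'"
      + " source='auto translation' type='predicted peptide'>"
      + "<location value='[7..28]'/>"
      + "<feature class='org.biojava.bio.seq.impl.SimpleFeature'"
      + " source='experimental evidence' type='SNP'>"
      + "<location value='14'/>"
      + "</feature>"
      + "</feature>"
      + "<feature class='org.biojava.bio.seq.impl.SimpleFeature'"
      + " source='experimental' type='PolyA tail'>"
      + "<location value='[119..131]'/>"
      + "</feature>"
      + "</feature_holder>";

    // Parses the document and returns one indented line per feature.
    static List<String> describe(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        List<String> out = new ArrayList<>();
        walk(doc.getDocumentElement(), 0, out);
        return out;
    }

    // Visits the direct <feature> children of parent, then recurses into each.
    static void walk(Element parent, int depth, List<String> out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;
            Element e = (Element) n;
            if (!"feature".equals(e.getTagName())) continue;
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) line.append("  ");
            line.append(e.getAttribute("type")).append(" @ ").append(locationOf(e));
            out.add(line.toString());
            walk(e, depth + 1, out);
        }
    }

    // Returns the value of the feature's own <location> child, or "?" if absent.
    static String locationOf(Element feature) {
        for (Node n = feature.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "location".equals(((Element) n).getTagName())) {
                return ((Element) n).getAttribute("value");
            }
        }
        return "?";
    }

    public static void main(String[] args) throws Exception {
        for (String line : describe(XML)) {
            System.out.println(line);
        }
    }
}
```

The SNP comes out indented under the predicted peptide, so the parent/child
relationship survives without any Biojava classes on the reading side.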

- Mark

> -----Original Message-----
> From: Emig, Robin [mailto:Robin.Emig@maxygen.com] 
> Sent: Tuesday, 7 May 2002 4:04 a.m.
> To: Schreiber, Mark; biojava-l@biojava.org
> Subject: RE: [Biojava-l] Biojava XML Binding (BJXB)
> 
> 
> Although we have had similar problems, I'd like to know what 
> information you need that is lost when exporting to the mentioned 
> file formats. For the most part you can recover what you need, 
> especially if you REALLY mean you don't want to be held to 
> biojava/java on either end of the process. I'd just hate to create 
> YAF (Yet Another Format) instead of modifying/using one that 
> already exists, and to create extra work trying to make it 
> "not biojava bound" yet "containing biojava info". -Robin
> 
> 	-----Original Message----- 
> 	From: Schreiber, Mark [mailto:mark.schreiber@agresearch.co.nz] 
> 	Sent: Sun 5/5/2002 9:05 PM 
> 	To: biojava-l@biojava.org 
> 	Cc: 
> 	Subject: [Biojava-l] Biojava XML Binding (BJXB)
> 	
> 	
> 
> 	Hi -
> 	
> 	I would like to propose/formalise a schema for binding biojava objects,
> 	esp. sequence objects, to XML. The current binding of Biojava objects to
> 	other formats such as GFF, GenBank, EMBL, Game, Agave is inadequate as
> 	details are lost in the reading and writing of these objects. While it
> 	is useful for biojava to read and write these formats, the only way to
> 	currently capture everything about a biojava object is to serialize it
> 	as a binary stream. The advantage of serializing to an XML document is
> 	that the XML can be constructed and edited using a text editor or
> 	programmatic processes on a machine (possibly a legacy system) with no
> 	Biojava installation and no requirement for a JVM. Also the XML can be
> 	ported via HTTP/SOAP. The DTD could also be used as a base for anyone
> 	who needs a richer schema that maps well to Biojava.
> 	
> 	Why not use JAXB? Two reasons. First, JAXB requires java at both ends
> 	of the serialization/deserialization procedure. Second, JAXB doesn't
> 	play well with many biojava objects due to their use of factory
> 	methods, private and protected constructors and singleton Alphabets.
> 	Actually this was all inspired by my inability to get JAXB to work
> 	with biojava.
> 	
> 	I have included a demo xml file and a simple dtd. Obviously there is a
> 	lot of room for expansion of the DTD to include more biojava concepts;
> 	however, I thought I would start with a typical use with a rather nasty
> 	feature structure. Currently there is no read or write ability, but
> 	StAX looks like an obvious choice. I suspect there might be a need for
> 	a lot of reflection code in the handlers! I am no StAX expert, so if
> 	someone feels particularly inspired in the next 24 hours to knock out a
> 	quick handler that would be cool.
> 	
> 	Comments and Flames welcome.
> 	
> 	<?xml version="1.0" encoding="UTF-8"?>
> 	<!DOCTYPE seq_db SYSTEM "bjxb.dtd">
> 	
> 	<seq_db class="org.biojava.bio.seq.db.HashSequenceDB">
> 	  <sequence class="org.biojava.bio.seq.impl.SimpleSequence">
> 	    <id name="fooase_est" urn="embl:UA000933"/>
> 	    <symbol_list class="org.biojava.bio.seq.SimpleSymbolList"
> 	                 alphabet="DNA">
> 	      accggtatgaccagaggacccatatagggacaaaccaaaaaaaaagcccacagcgcgttgagacagg
> 	      gggacacacccatatttaagaggacaccaaccccccccaaagagagagatnaaaaanaaana
> 	    </symbol_list>
> 	    <annotation class="org.biojava.bio.SimpleAnnotation">
> 	      <entry key="organism" value="Homo Sapiens"/>
> 	      <entry key="seq_type" value="EST"/>
> 	      <entry key="date" value="19/11/2001"/>
> 	    </annotation>
> 	    <feature_holder>
> 	      <feature class="org.biojava.bio.seq.genomic.TranslatedRegion"
> 	               source="auto translation"
> 	               type="predicted peptide">
> 	        <annotation class="org.biojava.bio.Annotation.EmptyAnnotation"/>
> 	        <location value="[7..28]"/>
> 	        <sequence class="org.biojava.bio.seq.impl.SimpleSequence">
> 	          <id name="fooase"/>
> 	          <symbol_list class="org.biojava.bio.seq.SimpleSymbolList"
> 	                       alphabet="PROTEIN">
> 	            MTRGPI*
> 	          </symbol_list>
> 	          <annotation class="org.biojava.bio.Annotation.EmptyAnnotation"/>
> 	        </sequence>
> 	        <feature class="org.biojava.bio.seq.impl.SimpleFeature"
> 	                 source="experimental evidence"
> 	                 type="SNP">
> 	          <annotation class="org.biojava.bio.SimpleAnnotation">
> 	            <entry key="SNP_type" value="g:c"/>
> 	          </annotation>
> 	          <location value="14"/>
> 	        </feature>
> 	      </feature>
> 	      <feature class="org.biojava.bio.seq.impl.SimpleFeature"
> 	               source="experimental"
> 	               type="PolyA tail">
> 	        <annotation class="org.biojava.bio.Annotation.EmptyAnnotation"/>
> 	        <location value="[119..131]"/>
> 	      </feature>
> 	    </feature_holder>
> 	  </sequence>
> 	</seq_db>
> 	
> 	<?xml version="1.0" encoding="UTF-8" ?>
> 	<!ELEMENT id EMPTY >
> 	<!ATTLIST id urn NMTOKEN #IMPLIED >
> 	<!ATTLIST id name NMTOKEN #REQUIRED >
> 	
> 	<!ELEMENT feature_holder ( feature* ) >
> 	
> 	<!ELEMENT annotation ( entry* ) >
> 	<!ATTLIST annotation class NMTOKEN #REQUIRED >
> 	
> 	<!ELEMENT sequence ( id, symbol_list, annotation, feature_holder? ) >
> 	<!ATTLIST sequence class NMTOKEN #REQUIRED >
> 	
> 	<!ELEMENT seq_db ( sequence+ ) >
> 	<!ATTLIST seq_db class NMTOKEN #REQUIRED >
> 	
> 	<!ELEMENT symbol_list ( #PCDATA ) >
> 	<!ATTLIST symbol_list class NMTOKEN #REQUIRED >
> 	<!ATTLIST symbol_list alphabet NMTOKEN #REQUIRED >
> 	
> 	<!ELEMENT location EMPTY >
> 	<!ATTLIST location value CDATA #REQUIRED >
> 	
> 	<!ELEMENT entry EMPTY >
> 	<!ATTLIST entry key NMTOKEN #REQUIRED >
> 	<!ATTLIST entry value CDATA #REQUIRED >
> 	
> 	<!ELEMENT feature ( annotation, location, sequence?, feature? ) >
> 	<!ATTLIST feature type CDATA #REQUIRED >
> 	<!ATTLIST feature source CDATA #REQUIRED >
> 	<!ATTLIST feature class NMTOKEN #REQUIRED >
> 	
> 	
> 	Mark Schreiber
> 	Bioinformatics
> 	AgResearch Invermay
> 	PO Box 50034
> 	Mosgiel
> 	New Zealand
> 	
> 	PH:   +64 3 489 9175
> 	FAX:  +64 3 489 3739
> 	
> 	
> 	
> 	=======================================================================
> 	Attention: The information contained in this message and/or attachments
> 	from AgResearch Limited is intended only for the persons or entities
> 	to which it is addressed and may contain confidential and/or privileged
> 	material. Any review, retransmission, dissemination or other use of, or
> 	taking of any action in reliance upon, this information by persons or
> 	entities other than the intended recipients is prohibited by AgResearch
> 	Limited. If you have received this message in error, please notify the
> 	sender immediately.
> 	=======================================================================
> 	_______________________________________________
> 	Biojava-l mailing list  -  Biojava-l@biojava.org
> 	http://biojava.org/mailman/listinfo/biojava-l
> 	
> 
> 