Bioperl: XML/BioPerl

Wed, 20 Jan 1999 21:54:54 +0000

In response to David States' comments, I would be happy to serve as a 
facilitator for anyone interested in BIOML and as the keeper of the 
BIOML DTD. The "official" copy will be at:

http://www.proteometrics.com/BIOML/bioml.dtd

The "official" home page will be at

http://www.proteometrics.com/BIOML/index.html (in HTML)
    and
http://www.proteometrics.com/BIOML/home.bml (in BIOML)

If you would like my assistance in writing any type of application 
that uses BIOML, expanding the data model, or any other related 
question, feel free to contact me at

beavis@netdirect.net

 ****************

I'd also like to join the discussion of extending 
biochemistry-related XMLs. My personal philosophy (IMHO) regarding 
DTDs is that they should be as small as possible. Only the attributes 
and entities that are necessary to fulfil the basic data model 
design should be there. The DTD format is difficult for most humans 
to read, so simplicity is a great virtue. 

The length and complexity of the DTD will be affected by the data 
model that is used. I would hope that, in a field as littered with 
details as biology, the DTD designer will attempt to abstract the 
object that is being described to as high a degree as possible. An 
aid to the process is to use element and attribute names that are 
very descriptive, distinctive and slightly verbose.

Clearly, a DTD that contains a minimum set of elements will need 
extending to fit some particular needs. The conventional mechanism 
for extending an XML (adding <!ELEMENT>, <!ATTLIST> and <!ENTITY> 
declarations to an extended document) should be used. Otherwise it 
is simply not an XML: it is just another data representation.
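
As a rough sketch of that conventional mechanism, the Python snippet 
below holds a document whose DOCTYPE points at the official DTD and 
adds one local <!ATTLIST> declaration in the internal subset. The 
element and attribute names (protein, label, lab_color) are made up 
for illustration and are not taken from the real BIOML DTD. Note that 
the standard-library parser used here is non-validating: it reads the 
internal subset but never fetches or checks the external DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical extended document. The names protein, label and
# lab_color are illustrative assumptions, not the real BIOML DTD.
EXTENDED_DOC = """<?xml version="1.0"?>
<!DOCTYPE bioml SYSTEM "http://www.proteometrics.com/BIOML/bioml.dtd" [
  <!-- local extension: one extra optional attribute on a core element -->
  <!ATTLIST protein lab_color CDATA #IMPLIED>
]>
<bioml>
  <protein label="myoglobin" lab_color="red"/>
</bioml>"""


def read_protein_attributes(xml_text):
    """Parse the document and return the attributes of its first protein."""
    root = ET.fromstring(xml_text)
    return dict(root.find("protein").attrib)


if __name__ == "__main__":
    print(read_protein_attributes(EXTENDED_DOC))
```

The extension lives in the document that needs it, so the shared DTD 
stays small and other readers still see a declared attribute rather 
than an unexplained one.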

Such a strict interpretation has one main drawback: very few people 
are ever going to write a validating browser. Without one, the 
biological user (who couldn't care less about the elegance of a 
markup language) may end up with a bunch of files that cause errors, 
will not load, or will produce nonsense on the screen when they try 
to use the data. Even though HTML is not very strict, the current set 
of browsers is very good at recovering from syntax errors and 
displaying as much information as they can salvage from the HTML 
code. I think that XML browsers should be written in an analogous 
manner. If you would like to extend the XML for use with your 
application, you should try not to replace the core elements, 
although you can add as many attributes to them as you like, in 
addition to the core attributes. With the core set of elements and 
attributes, any browser should at least be able to produce a good 
"default" display, which may not take advantage of the extensions 
but which can still convey meaning to the user. 
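
To make the "default" display idea concrete, here is a minimal 
Python sketch of a reader that knows only a core vocabulary and 
quietly skips everything else. The CORE table and the names in it are 
hypothetical, chosen for illustration rather than copied from the 
BIOML DTD. The sketch covers only unknown elements and attributes in 
a well-formed document; recovering from actual syntax errors, as HTML 
browsers do, would need more work.

```python
import xml.etree.ElementTree as ET

# Hypothetical core vocabulary: element name -> core attribute names.
# Illustrative only; not taken from the real BIOML DTD.
CORE = {
    "bioml": {"label"},
    "protein": {"label", "id"},
}


def default_display(xml_text):
    """Return indented text lines built from core elements/attributes only."""
    lines = []

    def walk(elem, depth):
        core_attrs = CORE.get(elem.tag)
        if core_attrs is not None:
            # Keep core attributes, drop application-specific extensions.
            shown = {k: v for k, v in elem.attrib.items() if k in core_attrs}
            attrs = " ".join(f'{k}="{v}"' for k, v in sorted(shown.items()))
            lines.append("  " * depth + f"{elem.tag} {attrs}".rstrip())
            depth += 1
        # Unknown elements are not shown, but their children are visited.
        for child in elem:
            walk(child, depth)

    walk(ET.fromstring(xml_text), 0)
    return lines


SAMPLE = """<bioml label="demo">
  <protein label="myoglobin" id="P1" lab_color="red">
    <custom_note>ignored by a core-only reader</custom_note>
  </protein>
</bioml>"""

if __name__ == "__main__":
    print("\n".join(default_display(SAMPLE)))
```

A reader written this way still shows the protein and its label even 
though it has never heard of lab_color or custom_note.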

I think that there is room for a great number of XMLs to 
describe biologically-relevant things. BSML was written to describe 
graphics related to molecular biology; BIOML was written to annotate 
organisms and biopolymers; and the latest entry, BlastXML, was 
written to describe documents generated by Blast. I hope that they 
can all live happily together in the same document without too much 
fuss. I'd like to suggest using the convention that is currently used 
in HTML to include other programming languages, e.g., a document that 
includes HTML code, BIOML code and BlastXML code would look like:

<html>
  html code
  <script language="bioml">
    <bioml>
      bioml code
    </bioml>
  </script>
  <script language="blastxml">
    <blastxml>
      blastxml code
    </blastxml>
  </script>
  more html code
</html>

This simple format allows different browsers to read the same 
document in their own special way, ignoring what they don't 
understand, so the XMLs can easily "add value" to the wealth of 
HTML code already extant. I have experimented with this convention 
using the combination of my BIOML browser and Netscape (or IE), and 
it is possible to do some very nice things this way. The HTML version 
of the home page mentioned above has a copy of the BIOML version of 
the home page embedded using this mechanism.
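
For anyone who wants to try the convention from a script, the 
following Python sketch pulls the embedded blocks back out of such a 
page using only the standard library. The class and function names 
are hypothetical, and this is not how my BIOML browser does it; it is 
just one way a non-HTML tool could find the part of the page it 
understands and ignore the rest.

```python
from html.parser import HTMLParser


class EmbeddedXMLExtractor(HTMLParser):
    """Collect the raw text of every <script language="..."> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (language, raw text) pairs
        self._language = None   # language of the script being read, if any
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._language = (dict(attrs).get("language") or "").lower()
            self._chunks = []

    def handle_data(self, data):
        # Script content arrives as plain data, possibly in several chunks.
        if self._language is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._language is not None:
            text = "".join(self._chunks).strip()
            self.blocks.append((self._language, text))
            self._language = None


def extract_blocks(html_text):
    """Return [(language, text), ...] for each script block in the page."""
    parser = EmbeddedXMLExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.blocks


PAGE = (
    "<html>html code"
    '<script language="bioml"><bioml>bioml code</bioml></script>'
    '<script language="blastxml"><blastxml>blastxml code</blastxml></script>'
    "more html code</html>"
)

if __name__ == "__main__":
    for language, text in extract_blocks(PAGE):
        print(language, "->", text)
```

A BIOML tool would keep only the blocks whose language is "bioml" and 
hand that text to its own XML parser.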

Ronald Beavis
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================