Bioperl: XML/BioPerl
beavis@netdirect.net
Wed, 20 Jan 1999 21:54:54 +0000
In response to David States' comments, I would be happy to serve as a
facilitator for anyone interested in BIOML and as the keeper of the
BIOML DTD. The "official" copy will be at:
http://www.proteometrics.com/BIOML/bioml.dtd
The "official" home page will be at
http://www.proteometrics.com/BIOML/index.html (in HTML)
and
http://www.proteometrics.com/BIOML/home.bml (in BIOML)
If you would like my assistance in writing any type of application
that uses BIOML, expanding the data model, or any other related
question, feel free to contact me at
beavis@netdirect.net
****************
I'd also like to join in (IMHO) on the discussion of extending
biochemistry-related XMLs. My personal philosophy regarding DTDs is
that they should be as small as possible. Only the attributes and
entities that are necessary to fulfil the basic data model
design should be there. The DTD format is difficult for most humans
to read, so simplicity is a great virtue.
The length and complexity of the DTD will be affected by the data
model that is used. I would hope that, in a field as
littered with details as biology, the DTD designer will attempt to
abstract the object that is being described to as high a degree as
possible. An aid to the process is to use element and attribute names
that are very descriptive, distinctive and slightly verbose.
Clearly, a DTD that contains a minimum set of elements will need
extending to fit some particular needs. The conventional mechanism
for extending an XML (adding <!ELEMENT>, <!ATTLIST> and <!ENTITY>
declarations to the extended document) should be used. Otherwise it
is simply not an XML: it is just another data representation.
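
As a rough illustration of that mechanism, here is a minimal Python
sketch that lists the declarations a document carries in its own
DOCTYPE. Everything in the sample document is an assumption made for
illustration: the element "protein", the attributes "label" and
"lab_notebook", and the entity "lab" are hypothetical and are not
taken from the BIOML DTD. The core declarations are inlined only to
keep the example self-contained; a real document would normally point
at the external DTD and add just its extensions.

```python
import xml.parsers.expat

# Hypothetical document: the names below are illustrative only and are
# NOT taken from the real BIOML DTD.
EXTENDED_DOC = """<?xml version="1.0"?>
<!DOCTYPE bioml [
  <!ELEMENT bioml (protein*)>
  <!ELEMENT protein (#PCDATA)>
  <!ATTLIST protein label CDATA #IMPLIED>
  <!-- local extension added by the document author -->
  <!ATTLIST protein lab_notebook CDATA #IMPLIED>
  <!ENTITY lab "Hypothetical Lab">
]>
<bioml><protein label="p1" lab_notebook="nb-7">MKV</protein></bioml>
"""


def declared_names(xml_text):
    """Return the element, attribute and entity names declared in the DOCTYPE."""
    elements, attributes, entities = [], [], []
    parser = xml.parsers.expat.ParserCreate()
    # Each handler fires once per matching declaration in the internal subset.
    parser.ElementDeclHandler = lambda name, model: elements.append(name)
    parser.AttlistDeclHandler = (
        lambda elname, attname, type_, default, required:
            attributes.append((elname, attname)))
    parser.EntityDeclHandler = lambda name, *rest: entities.append(name)
    parser.Parse(xml_text, True)
    return elements, attributes, entities


if __name__ == "__main__":
    print(declared_names(EXTENDED_DOC))
```

Note that expat checks well-formedness only; it does not validate the
document against these declarations.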
Such a strict interpretation has one main drawback: very few people
are ever going to write a validating browser. Without one, the
biological user (who couldn't care less about the elegance of a markup
language) may end up with a bunch of files that cause errors, will
not load, or will produce nonsense on the screen when they try to use
the data. Even though HTML is not very strict, the current set of
browsers is very good at recovering from syntax errors and
displaying as much information as they can salvage from the HTML
code. I think that XML browsers should be written in an analogous
manner. If you would like to extend the XML for use with your
application, you should try not to replace the elements, although
you can add as many attributes to them as you like, in addition to
the core attributes. With the core set of
elements and attributes, any browser should at least be able to
produce a good "default" display, which may not take advantage of
the extensions but which can still convey meaning to the user.
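
A minimal Python sketch of such a "default" display follows. It
assumes a hypothetical core vocabulary (the CORE table and the names
"organism", "protein", "name" and "label" are made up for illustration,
not read from the BIOML DTD). It covers only the part about ignoring
extensions it does not understand; it does not attempt to recover from
malformed markup the way HTML browsers do.

```python
import xml.etree.ElementTree as ET

# Hypothetical core vocabulary: element name -> core attribute names.
# The real BIOML DTD defines its own names; these are placeholders.
CORE = {
    "bioml": set(),
    "organism": {"name"},
    "protein": {"label"},
}

SAMPLE = """<bioml>
  <organism name="Homo sapiens" habitat="lab">
    <protein label="insulin" color="red">MALWMRLLPL</protein>
    <widget>ignored</widget>
  </organism>
</bioml>"""


def default_display(xml_text):
    """Render core elements and attributes as indented text, skipping the rest."""
    lines = []

    def walk(elem, depth):
        if elem.tag in CORE:
            # Keep only attributes that belong to the core set.
            known = {k: v for k, v in elem.attrib.items() if k in CORE[elem.tag]}
            attrs = " ".join('%s="%s"' % (k, v) for k, v in sorted(known.items()))
            text = (elem.text or "").strip()
            parts = [elem.tag] + ([attrs] if attrs else []) + ([text] if text else [])
            lines.append("  " * depth + " ".join(parts))
            depth += 1
        # Unknown elements are not shown, but their children are still visited.
        for child in elem:
            walk(child, depth)

    walk(ET.fromstring(xml_text), 0)
    return "\n".join(lines)


if __name__ == "__main__":
    print(default_display(SAMPLE))
```

In the sample, the extension attributes "habitat" and "color" and the
extension element "widget" are dropped from the display, while the core
content still comes through.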
I think that there is room for a great number of XMLs to
describe biologically relevant things. BSML was written to describe
graphics related to molecular biology; BIOML was written to annotate
organisms and biopolymers; and the latest entry, BlastXML, was written
to describe documents generated by Blast. I hope that they can all
live happily together in the same document without too much fuss. I'd
like to suggest using the convention that is currently used in HTML
to include other programming languages, e.g., a document that
includes HTML code, BIOML code and BlastXML code would look like:
<html>
html code
<script language="bioml">
<bioml>
bioml code
</bioml>
</script>
<script language="blastxml">
<blastxml>
blastxml code
</blastxml>
</script>
more html code
</html>
This simple format allows different browsers to read the same
document in their own special way, ignoring what they don't
understand, allowing the XMLs to easily "add value" to the wealth of
HTML code already extant. I have experimented with this convention
using the combination of my BIOML browser and Netscape (or IE), and
it is possible to do some very nice things this way. The HTML version
of the home page mentioned above has a copy of the BIOML version of
the home page embedded using this mechanism.
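
For anyone who wants to try the convention from the application side,
here is a minimal Python sketch of how a reader might pull the embedded
blocks back out of such a page. It is not how my BIOML browser does it;
it simply uses the standard html.parser module, which hands the
contents of a <script> element over as raw text. The function name
extract_embedded and the elements inside the sample blocks are
assumptions made for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Sample page following the convention above; the elements inside the
# <bioml> and <blastxml> blocks are hypothetical placeholders.
PAGE = """<html>
<p>html code</p>
<script language="bioml">
<bioml><protein label="insulin">MALWMRLLPL</protein></bioml>
</script>
<script language="blastxml">
<blastxml><hit id="h1"/></blastxml>
</script>
<p>more html code</p>
</html>"""


class EmbeddedXMLExtractor(HTMLParser):
    """Collect the raw contents of <script language="..."> blocks."""

    def __init__(self):
        super().__init__()
        self.blocks = {}        # language name -> raw embedded text
        self._language = None   # language of the script being read, if any
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._language = dict(attrs).get("language")
            self._chunks = []

    def handle_data(self, data):
        # Script contents may arrive in several pieces, so accumulate them.
        if self._language is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._language is not None:
            self.blocks[self._language] = "".join(self._chunks).strip()
            self._language = None


def extract_embedded(html_text):
    """Return a dict mapping each script language to its embedded text."""
    parser = EmbeddedXMLExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.blocks


if __name__ == "__main__":
    for language, text in extract_embedded(PAGE).items():
        root = ET.fromstring(text)   # each block is parsed on its own
        print(language, "->", root.tag)
```

Each application can then take the block it understands and ignore
the others.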
Ronald Beavis
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================