[Bioperl-l] xml standard for sequences
Matthew Pocock
matthew_pocock@yahoo.co.uk
Tue, 16 Apr 2002 19:20:35 +0100
Paul Gordon wrote:
> it could be a
> discussion and DTD (much needed!) repository for the community, instead of
> trying to come up with a common syntax.
Couldn't agree more. By far the most useful thing we could do is
generate XML Schema, RDF/DAML or (heaven forbid) DTD fragments for
common non-contentious biological concepts (like standardising strand or
frame) and let people compose their documents from these common
elements/attributes plus their own glue. The next generation of XML
parsers and DOMs will be able to expose the grammar that validates
elements and attributes, so you will be able to do some really funky
binding of XML data to factories, objects and code. Small, non-contentious
definitions would be a good place to start.
Schema has the benefit that we can define data-types and validation
rules without naming elements or attributes, and in some cases without
distinguishing between rules for validating attributes and text content
of elements. This allows a concept like strand to be used in the way an
author feels most comfortable with. They could say <strand>+</strand> or
<feature str="+"/> or <hit strand1="+" strand2="+"/> and have all the
appropriate text validated against the same schema definition. This sort
of thing can't be done with DTDs. This isn't rocket science and it
doesn't force everyone into adopting a single world view. It just might
work. I'm all for letting everyone agree to disagree, and if a computer
can translate between these different formalisms, well then, great.
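As a rough illustration of the strand example, here is a minimal XML Schema sketch. The type name strandType and the choice of "+" and "-" as the only legal values are assumptions made for the example, not an agreed definition. The element and attribute names are the ones used above.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- One named simple type. It says nothing about where it is used.
       The name and the allowed values are assumptions for this sketch. -->
  <xs:simpleType name="strandType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="+"/>
      <xs:enumeration value="-"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Used as element text content: <strand>+</strand> -->
  <xs:element name="strand" type="strandType"/>

  <!-- Used as an attribute: <feature str="+"/> -->
  <xs:element name="feature">
    <xs:complexType>
      <xs:attribute name="str" type="strandType"/>
    </xs:complexType>
  </xs:element>

  <!-- Used twice on one element: <hit strand1="+" strand2="+"/> -->
  <xs:element name="hit">
    <xs:complexType>
      <xs:attribute name="strand1" type="strandType"/>
      <xs:attribute name="strand2" type="strandType"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

All three spellings are checked against the single strandType definition, so a value such as "x" would be rejected in any of them. A real shared fragment would presumably live in its own namespace so that other schemas could import it.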
Matthew