Bioperl: XML
Vicki Brown
vlb@deltagen.com
Thu, 6 May 1999 10:43:06 -0700
The BioPerl list hasn't mentioned XML since January... The message below
was forwarded to me. What is the current view/status in the BioPerl
community as regards XML? There was talk of a BoulderIO <-> XML converter
as well as a CGI <-> XML converter.
I can't agree with the assertion that XML will result in
"(No more perl-parsers for BLAST-output!!)", but I thought this was worth
bringing up on the BioPerl list.
With the permission of Mr. Loeffler:
-----Original Message-----
>From: Gerald Loeffler <Gerald.Loeffler@vienna.at>
>To: Computational Chemistry Mailing List <chemistry@infomeister.osc.edu>
>Date: Friday, April 30, 1999 4:00 AM
>Subject: CCL:XML for Bioinformatics Data
>Hi!
>
>Recently, I've been working a lot with XML (see http://www.w3c.org/xml/
>and e.g. http://www.ibm.com/xml/), which is a standard, human-readable,
>extensible markup-language that is rapidly becoming _the_ method of
>choice for exchange and storage of any kind of data and documents. It
>seems to me that XML would simply be _perfect_ for data exchange and
>maybe even data storage in bioinformatics (see end of message for a note
>on chemistry and CML).
>
>E.g. (off the top of my head), a DNA/protein sequence similarity search
>engine (e.g. NCBI's BLAST server) might return its search results as an
>XML document like this:
>
><seq-sim-search-results>
>  <query>
>    <type> protein </type>
>    <seq name="My stupid peptide"> GAVLIFYWSTQ </seq>
>    <algorithm> FASTA3 </algorithm>
>    <db> SwissProt </db>
>    <gap-open> -12 </gap-open>
>    <gap-extension> -2 </gap-extension>
>  </query>
>  <hits>
>    <hit>
>      <accession> HPS_HUMAN </accession>
>      <organism> homo sapiens </organism>
>      <overlap> 11 </overlap>
>      <overlaping-seq> GAEVLFYWTDQ </overlaping-seq>
>      <z-score> 129.3 </z-score>
>    </hit>
>    <hit>
>      <accession> PA24_MOUSE </accession>
>      <organism> mus musculus </organism>
>      <overlap> 8 </overlap>
>      <overlaping-seq> VFIFYWTT </overlaping-seq>
>      <z-score> 133.3 </z-score>
>    </hit>
>  </hits>
></seq-sim-search-results>
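A minimal sketch of reading the document above with a generic XML parser, with no BLAST-specific parsing code involved. It uses Python's standard-library xml.etree.ElementTree purely for illustration (the message itself mentions IBM's xml4j for Java). The helper names child_text and hits_from_results are hypothetical, and the sketch assumes the element names exactly as shown, including the spelling "overlaping-seq".

```python
import xml.etree.ElementTree as ET

# The example search result from the message, re-indented.
RESULTS_XML = """\
<seq-sim-search-results>
  <query>
    <type> protein </type>
    <seq name="My stupid peptide"> GAVLIFYWSTQ </seq>
    <algorithm> FASTA3 </algorithm>
    <db> SwissProt </db>
    <gap-open> -12 </gap-open>
    <gap-extension> -2 </gap-extension>
  </query>
  <hits>
    <hit>
      <accession> HPS_HUMAN </accession>
      <organism> homo sapiens </organism>
      <overlap> 11 </overlap>
      <overlaping-seq> GAEVLFYWTDQ </overlaping-seq>
      <z-score> 129.3 </z-score>
    </hit>
    <hit>
      <accession> PA24_MOUSE </accession>
      <organism> mus musculus </organism>
      <overlap> 8 </overlap>
      <overlaping-seq> VFIFYWTT </overlaping-seq>
      <z-score> 133.3 </z-score>
    </hit>
  </hits>
</seq-sim-search-results>
"""

def child_text(elem, tag):
    """Return the stripped text of a required child element (hypothetical helper)."""
    child = elem.find(tag)
    if child is None or child.text is None:
        raise ValueError(f"missing <{tag}> in <{elem.tag}>")
    return child.text.strip()

def hits_from_results(xml_text):
    """Parse a <seq-sim-search-results> document into a list of dicts, one per <hit>."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iterfind("hits/hit"):
        hits.append({
            "accession": child_text(hit, "accession"),
            "organism": child_text(hit, "organism"),
            "overlap": int(child_text(hit, "overlap")),
            "overlapping_seq": child_text(hit, "overlaping-seq"),
            "z_score": float(child_text(hit, "z-score")),
        })
    return hits

if __name__ == "__main__":
    query_seq = ET.fromstring(RESULTS_XML).find("query/seq")
    print(query_seq.get("name"), query_seq.text.strip())
    # Best hit first, by z-score.
    for h in sorted(hits_from_results(RESULTS_XML), key=lambda h: h["z_score"], reverse=True):
        print(h["accession"], h["z_score"])
```

Run as a script, this prints the query name and sequence, then PA24_MOUSE 133.3 before HPS_HUMAN 129.3. Because hits are located by element name rather than by line position, an extra element added to <hit> would simply be ignored by this code.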
>
>There are several important points here:
>
>1) Without knowing what this XML document is about, a program can verify
>that it is well-formed (every tag properly closed and nested, and so on)!
>These programs exist, are free and are applicable to all XML documents!
>
>2) The rules for the nesting and naming of the tags in XML documents of
>this type can be formally defined with the mechanism XML itself provides,
>a Document Type Definition (DTD). The above document would be of type
>"seq-sim-search-results", and you could easily write a DTD that says such
>a document must contain a "query" and a "hits" tag; the "query" tag in
>turn must contain exactly one each of "type", "seq", ... The "hits" tag in
>turn may contain 0 or more "hit" tags, which in turn ...
>
>3) Given a formal definition of documents of this type, a program can
>verify that the XML document above complies with the formal definition
>(i.e. that it is valid). These programs exist, are free and are applicable
>to all XML documents!
>
>4) Free utilities exist (e.g. IBM's xml4j) that can programmatically
>write and read (parse) any XML document and thus give a program access
>to the structure and content of the document!! (No more perl-parsers for
>BLAST-output!!)
>
>5) This file is human-readable! (in contrast to a CORBA struct or a
>serialized Java object!)
>
>6) Modern web browsers can display this XML document directly if a
>stylesheet is supplied. For older browsers, the XML document can easily
>be converted to HTML for display.
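Points 1 to 3 can be sketched with Python's standard library as well. Its parsers report well-formedness errors but do not validate against a DTD, so the validity check below is a hand-written stand-in for one; a validating parser (for example lxml's etree.DTD class) would read the rules from a DTD instead. The content model here is an assumption: point 2 leaves the full rules elided, so the required children are taken from the example document. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ASSUMED content model, inferred from the example document (point 2 elides
# the full rules). Each listed child must occur exactly once; order and any
# extra children are not checked (a real DTD would usually fix the order too).
QUERY_CHILDREN = ["type", "seq", "algorithm", "db", "gap-open", "gap-extension"]
HIT_CHILDREN = ["accession", "organism", "overlap", "overlaping-seq", "z-score"]

def is_well_formed(xml_text):
    """Point 1: syntax check that needs no knowledge of what the document means."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def exactly_once(parent, names, errors):
    """Append an error for each child name that does not occur exactly once."""
    for name in names:
        n = len(parent.findall(name))
        if n != 1:
            errors.append(f"<{parent.tag}> has {n} <{name}>, expected 1")

def validation_errors(xml_text):
    """Point 3 (sketch): return a list of violations of the assumed rules."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    if root.tag != "seq-sim-search-results":
        return [f"unexpected root <{root.tag}>"]
    errors = []
    exactly_once(root, ["query", "hits"], errors)
    query = root.find("query")
    if query is not None:
        exactly_once(query, QUERY_CHILDREN, errors)
    hits = root.find("hits")
    if hits is not None:
        for child in hits:  # <hits> may hold 0 or more <hit> elements
            if child.tag != "hit":
                errors.append(f"<hits> may only contain <hit>, found <{child.tag}>")
            else:
                exactly_once(child, HIT_CHILDREN, errors)
    return errors

if __name__ == "__main__":
    print(is_well_formed("<hits><hit></hits>"))  # False: <hit> is never closed
    doc = "<seq-sim-search-results><query/><hits/></seq-sim-search-results>"
    print(validation_errors(doc)[0])  # <query> has 0 <type>, expected 1
```

Keeping the two checks separate mirrors the distinction in the message: well-formedness applies to any XML document, while validity only makes sense relative to a particular set of rules.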
>
>I think you get the idea.
>
>Does such an XML-based approach sound reasonable?
>What does this approach leave to be desired?
>Are efforts underway in this direction?
>Wouldn't it be a better world if we all used XML (-:
>
>I know that XML is currently being used for chemistry-related data (CML,
>see http://www.xml-cml.org/), but I haven't heard of any efforts in the
>area of Bioinformatics. So please view this message as targeted towards
>the Bioinformatics community that is not served by CML. (CML has a
>DNA/protein sequence tag.)
>
> cheers,
> gerald
>--
> Gerald Loeffler
> Email: Gerald.Loeffler@vienna.at
> Smail: Apollo Imaging, Marchettigasse 7, A-1060 Vienna, Austria
> Phone: +43 676 3289588 (+43 1 5952333 27)
> Fax: +43 1 5952333 20
> Keywords: Java, CORBA, OOA&D, Databases, Bioinformatics,
> Computational Biology, Computational Biophysics
-----
//=\ Vicki Brown <vlb@deltagen.com>
\=// Journeyman Sourcerer: Scripts & Philtres
//=\ (Mac)Perl, awk, sed, *sh..., occasional C
\=// A little web-gardening on the weekends
//=\
\=// Deltagen, Inc.
//=\ 1031 Bing St, San Carlos, CA 94070
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================