Bioperl: Re: BioML/XML

Ronald Beavis beavis@proteometrics.com
Mon, 2 Aug 1999 16:21:41 -0500


The XML thread in this group has not been very active recently, but I think
that this community is still interested in the subject, so I am posting this
message here.

We have recently completed an open, freely available source of BIOML
(BIOpolymer Markup Language) content for members of the bioinformatics
community who work with proteins and proteomes. We are calling it the
Proteome Template Library, and it is available at

ftp://204.112.55.140, or at
http://204.112.55.140/PTL (MIME type text/xml).

It consists of approximately 4900 files. Each file holds, in BIOML, all of
the protein sequence information, database references and additional
information about one species' proteome found in the Protein Identification
Resource (PIR) and NRL3D databases. For example, the BIOML file for Homo
sapiens contains all of the protein sequences and database references for
human sequences (~15 MB). For any organism with more than twenty protein
sequences, the sequences in its file are arranged into categories describing
function, tissue specificity or subcellular location (where known).
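The categorisation rule can be illustrated with a small sketch. The record
fields, the `categorize` function and the precedence among function, tissue
and location are all hypothetical; the announcement only states the
twenty-sequence threshold and the three kinds of category, not how an entry
with several annotations is assigned.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

# Threshold stated in the announcement: files with more than twenty
# sequences have them arranged into categories.
CATEGORY_THRESHOLD = 20


@dataclass
class ProteinRecord:
    # Hypothetical record type; the real BIOML files define their own elements.
    name: str
    function: Optional[str] = None
    tissue: Optional[str] = None
    location: Optional[str] = None


def categorize(records):
    """Return {category: [records]}, or {None: records} for small files."""
    if len(records) <= CATEGORY_THRESHOLD:
        return {None: list(records)}
    groups = defaultdict(list)
    for rec in records:
        # Hypothetical precedence: function, then tissue, then location.
        key = rec.function or rec.tissue or rec.location or "unclassified"
        groups[key].append(rec)
    return dict(groups)
```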

The library is organised around the NCBI organism taxonomy, so individual
organisms can be found by following the directory structure down to the
appropriate classification. For example, to find "Canis familiaris.bml"
(dog), you look in "Eukaryota", then "Metazoa", then "Chordate", and finally
"Mammalia". If you are not trained in biology, you can instead search the
site through a CGI script, using either a GET or a POST request. For
example, to find all proteome entries matching dog and/or wolf, send a
request to http://204.112.55.140/cgi-bin/PtlIndex.exe?words=dog+wolf. The
search is case-insensitive and returns both complete-word and partial
matches: the query word "strep", for example, finds all of the
Streptococcus and Streptomyces species (as well as several other entries).
The reply to the query is in BIOML.
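Both access routes can be sketched in a few lines. This assumes the
taxonomy directories sit directly under http://204.112.55.140/PTL and that
each file is named "<organism>.bml", as in the dog example; `bml_url`,
`search_url` and `local_match` are hypothetical helpers. `local_match` is
only a rough local stand-in for the server's matching, whose exact
algorithm is not described here.

```python
from posixpath import join
from urllib.parse import quote, urlencode

PTL_BASE = "http://204.112.55.140/PTL"  # assumed library root
SEARCH_CGI = "http://204.112.55.140/cgi-bin/PtlIndex.exe"


def bml_url(taxonomy, organism):
    """Build a URL for one organism's file by walking the taxonomy directories.

    Assumes the directories hang directly off PTL_BASE (not confirmed).
    """
    path = join(*taxonomy, organism + ".bml")
    # quote() keeps '/' and turns the space in "Canis familiaris" into %20.
    return PTL_BASE + "/" + quote(path)


def search_url(*words):
    """Build a GET request for the CGI search; urlencode joins words with '+'."""
    return SEARCH_CGI + "?" + urlencode({"words": " ".join(words)})


def local_match(query, names):
    """Case-insensitive substring match, so "strep" hits "Streptomyces"."""
    q = query.lower()
    return [n for n in names if q in n.lower()]
```

For example, `search_url("dog", "wolf")` reproduces the query URL given
above.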

The complete library is approximately 152 MB. We have not yet packaged the
entire library for download; we will do so once we are sure that it is
completely accurate and free of XML bugs. This release should be considered
a beta release: please report all bugs or parsing difficulties to
bioml@proteometrics.com. These comments and reports will be posted on a
bulletin board that will be made available from the main BIOML site
(http://204.112.55.140/BIOML). We take compatibility and parsing
difficulties very seriously.

The BIOML files in the library can be displayed as-is using Internet
Explorer 5. Explorer does not handle large files (> 1 MB) very well, so not
all of the files can be viewed this way. The browser available at the
website can handle the large files, but it runs only on Windows 95, 98 or NT
systems that have IE 4.01 or later installed.
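Outside a browser, large files such as the ~15 MB human proteome can still
be processed by streaming the XML instead of loading it whole. The sketch
below uses the standard-library `iterparse` (a general technique, not part
of the BIOML tools described above) and assumes protein entries are
`<protein>` elements; that tag name is an assumption to check against the
BIOML DTD.

```python
import xml.etree.ElementTree as ET


def count_elements(source, tag="protein"):
    """Count <tag> elements while streaming through the document.

    `tag="protein"` is an assumed element name, not taken from the DTD.
    `source` may be a filename or a binary file object.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            count += 1
            elem.clear()  # drop the element's children and text once counted
    return count
```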

The standard version of the BIOML DTD has been moved to
http://204.112.55.140/BIOML/bioml.dtd,
because our corporate server was receiving too much BIOML traffic; this new
server is reserved mainly for BIOML. Revised documentation and information
are available at http://204.112.55.140/BIOML. The new and considerably
improved version of the BIOML browser and BIOML Visual Editor for Windows
95, 98 and NT can be obtained from this site, as well as an open source
version of all of the code used to create the browser.
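Because the DTD has moved, documents that reference the old location may
need their DOCTYPE updated. A minimal sketch that reports which DTD a
document declares, assuming files point at the DTD through a `SYSTEM`
identifier and a root element named `bioml` (neither is stated above);
`dtd_location` and `uses_current_dtd` are hypothetical helpers.

```python
from xml.dom import minidom

# New DTD location, from the announcement.
DTD_URL = "http://204.112.55.140/BIOML/bioml.dtd"


def dtd_location(xml_text):
    """Return the DOCTYPE system identifier, or None if there is no DOCTYPE.

    minidom records the identifier without fetching the DTD itself.
    """
    doc = minidom.parseString(xml_text)
    return doc.doctype.systemId if doc.doctype is not None else None


def uses_current_dtd(xml_text):
    """True if the document's DOCTYPE points at the relocated DTD."""
    return dtd_location(xml_text) == DTD_URL
```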

Ron Beavis
Proteometrics


=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================