[Bioperl-l] Fwd: [Biojava-l] new NCBI blast outputs XML

Chris Dagdigian cdagdigian@genetics.com
Fri, 11 Aug 2000 12:12:43 -0400



--=_EFB74F0C.4A2B4639
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline



Gerald on the biojava-l list is reporting that NCBI web-blast is now
able to output XML. Does anyone have any more info on this, especially
on any NCBI plans for the command-line blastall to gain the same feature?


-Chris





--=_EFB74F0C.4A2B4639
Content-Type: message/rfc822

Message-ID: <399413B4.D5836DEA@vienna.at>
Date: Fri, 11 Aug 2000 16:54:44 +0200
From: Gerald Loeffler <Gerald.Loeffler@vienna.at>
Reply-To: Gerald.Loeffler@vienna.at
X-Mailer: Mozilla 4.7 [en] (WinNT; I)
X-Accept-Language: en
To: BioJava Mailing List <Biojava-l@biojava.org>
Subject: [Biojava-l] new blast outputs XML
Sender: dag@fedayi.sonsorol.org
Errors-To: biojava-l-admin@biojava.org
X-BeenThere: biojava-l@biojava.org
X-Mailman-Version: 2.0beta2
Precedence: bulk
List-Id: Biojava discussion list <biojava-l.biojava.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="=_EFB74F0C.4829443B"

--=_EFB74F0C.4829443B
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

hi!

The new NCBI web-blast release (available since yesterday, for now as a
download only) can output XML according to the attached DTD. I expect
that this feature will be available in the command-line version of
blast soon. The benefits of this, I think, are obvious.

We should discuss what this means for our own blast-parsing efforts. I
will definitely build Java support (sooner rather than later) for this
type of XML output and would contribute it to biojava if there is
interest.

Another question (for Simon) is how similar the NCBI DTD is to the DTD
used by the SAX-type blast parser.

	cheers,
	gerald
-- 
   Gerald.Loeffler@vienna.at _________________ Software Architect
   http://www.imp.univie.ac.at ____ http://www.daemonstration.com
   OOA&D, Java, J2EE, JSP, Servlets, JavaBeans, ODBMS, RDBMS, XML

--=_EFB74F0C.4829443B
Content-Type: text/plain
Content-Disposition: attachment; filename="blstxml.dtd"

<!-- ============================================ -->
<!-- This section mapped from ASN.1 module NCBI-BlastOutput -->
 
<!-- ============================================ -->
<!-- Definition of BlastOutput -->
 
 
<!--
**********************************************************************
 
  ASN.1 for simplified BLAST output in XML
  by James Ostell and Yuri Wolf, 2000
 
**********************************************************************
 -->
<!ELEMENT BlastOutput ( 
               BlastOutput_program ,
               BlastOutput_version ,
               BlastOutput_reference ,
               BlastOutput_db ,
               BlastOutput_query-ID ,
               BlastOutput_query-def ,
               BlastOutput_query-len ,
               BlastOutput_query-seq? ,
               BlastOutput_iter-num? ,
               BlastOutput_hits ,
               BlastOutput_param ,
               BlastOutput_stat ,
               BlastOutput_message? )>
 
 
<!-- 
 BLAST program: blastp, tblastx etc.
 -->
<!ELEMENT BlastOutput_program ( #PCDATA )>
 
<!-- 
 Program version: 2.0.13 [May-26-2000]
 -->
<!ELEMENT BlastOutput_version ( #PCDATA )>
 
<!-- 
 Steven, David, Tom and others
 -->
<!ELEMENT BlastOutput_reference ( #PCDATA )>
 
<!-- 
 BLAST Database name
 -->
<!ELEMENT BlastOutput_db ( #PCDATA )>
 
<!-- 
 SeqId of query
 -->
<!ELEMENT BlastOutput_query-ID ( #PCDATA )>
 
<!-- 
 Definition line of query
 -->
<!ELEMENT BlastOutput_query-def ( #PCDATA )>
 
<!-- 
 length of query sequence
 -->
<!ELEMENT BlastOutput_query-len ( %INTEGER; )>
 
<!-- 
 query sequence itself
 -->
<!ELEMENT BlastOutput_query-seq ( #PCDATA )>
 
<!-- 
 iteration number
 -->
<!ELEMENT BlastOutput_iter-num ( %INTEGER; )>
 
<!-- 
 Hits, one for every db sequence
 -->
<!ELEMENT BlastOutput_hits ( Hit+ )>
 
<!-- 
 search parameters
 -->
<!ELEMENT BlastOutput_param ( Parameters )>
 
<!-- 
 search statistics            
 -->
<!ELEMENT BlastOutput_stat ( Statistics )>
 
<!-- 
 Some (error?) information
 -->
<!ELEMENT BlastOutput_message ( #PCDATA )>
 
 
 
 
 
<!-- Definition of Parameters -->
 
<!ELEMENT Parameters ( 
               Parameters_matrix? ,
               Parameters_expect ,
               Parameters_include? ,
               Parameters_sc-match? ,
               Parameters_sc-mismatch? ,
               Parameters_gap-open ,
               Parameters_gap-extend ,
               Parameters_filter? ,
               Parameters_pattern? ,
               Parameters_entrez-query? )>
 
 
<!-- 
 Matrix used (-M)
 -->
<!ELEMENT Parameters_matrix ( #PCDATA )>
 
<!-- 
 Expectation threshold (-e)
 -->
<!ELEMENT Parameters_expect ( %REAL; )>
 
<!-- 
 Inclusion threshold (-h)
 -->
<!ELEMENT Parameters_include ( %REAL; )>
 
<!-- 
 match score for NT (-r)
 -->
<!ELEMENT Parameters_sc-match ( %INTEGER; )>
 
<!-- 
 mismatch score for NT (-q)
 -->
<!ELEMENT Parameters_sc-mismatch ( %INTEGER; )>
 
<!-- 
 Gap opening cost (-G)
 -->
<!ELEMENT Parameters_gap-open ( %INTEGER; )>
 
<!-- 
 Gap extension cost (-E)
 -->
<!ELEMENT Parameters_gap-extend ( %INTEGER; )>
 
<!-- 
 Filtering options (-F)
 -->
<!ELEMENT Parameters_filter ( #PCDATA )>
 
<!-- 
 PHI-BLAST pattern
 -->
<!ELEMENT Parameters_pattern ( #PCDATA )>
 
<!-- 
 Limit of request to Entrez query
 -->
<!ELEMENT Parameters_entrez-query ( #PCDATA )>
 
 
<!-- Definition of Statistics -->
 
<!ELEMENT Statistics ( 
               Statistics_db-num ,
               Statistics_db-len ,
               Statistics_hsp-len ,
               Statistics_eff-space ,
               Statistics_kappa ,
               Statistics_lambda ,
               Statistics_enthropy )>
 
 
<!-- 
 Number of sequences in BLAST db
 -->
<!ELEMENT Statistics_db-num ( %INTEGER; )>
 
<!-- 
 Length of BLAST db
 -->
<!ELEMENT Statistics_db-len ( %INTEGER; )>
 
<!-- 
 Effective HSP length
 -->
<!ELEMENT Statistics_hsp-len ( %INTEGER; )>
 
<!-- 
 Effective search space
 -->
<!ELEMENT Statistics_eff-space ( %REAL; )>
 
<!-- 
 Karlin-Altschul parameter K
 -->
<!ELEMENT Statistics_kappa ( %REAL; )>
 
<!-- 
 Karlin-Altschul parameter Lambda
 -->
<!ELEMENT Statistics_lambda ( %REAL; )>
 
<!-- 
 Karlin-Altschul parameter H
 -->
<!ELEMENT Statistics_enthropy ( %REAL; )>
 
 
<!-- Definition of Hit -->
 
<!ELEMENT Hit ( 
               Hit_num ,
               Hit_id ,
               Hit_def ,
               Hit_accession ,
               Hit_len ,
               Hit_hsps? )>
 
 
<!-- 
 hit number
 -->
<!ELEMENT Hit_num ( %INTEGER; )>
 
<!-- 
 SeqId of subject
 -->
<!ELEMENT Hit_id ( #PCDATA )>
 
<!-- 
 definition line of subject
 -->
<!ELEMENT Hit_def ( #PCDATA )>
 
<!-- 
 accession
 -->
<!ELEMENT Hit_accession ( #PCDATA )>
 
<!-- 
 length of subject
 -->
<!ELEMENT Hit_len ( %INTEGER; )>
 
<!-- 
 all HSP regions for the given subject
 -->
<!ELEMENT Hit_hsps ( Hsp* )>
 
 
 
<!-- Definition of Hsp -->
 
<!ELEMENT Hsp ( 
               Hsp_num ,
               Hsp_score ,
               Hsp_evalue ,
               Hsp_query-from ,
               Hsp_query-to ,
               Hsp_hit-from ,
               Hsp_hit-to ,
               Hsp_pattern-from? ,
               Hsp_pattern-to? ,
               Hsp_query-frame? ,
               Hsp_hit-frame? ,
               Hsp_identity? ,
               Hsp_positive? ,
               Hsp_gaps? ,
               Hsp_density? ,
               Hsp_qseq ,
               Hsp_hseq ,
               Hsp_midline? )>
 
 
<!-- 
 HSP number
 -->
<!ELEMENT Hsp_num ( %INTEGER; )>
 
<!-- 
 score (in bits) of HSP
 -->
<!ELEMENT Hsp_score ( %REAL; )>
 
<!-- 
 e-value of HSP
 -->
<!ELEMENT Hsp_evalue ( %REAL; )>
 
<!-- 
 start of HSP in query
 -->
<!ELEMENT Hsp_query-from ( %INTEGER; )>
 
<!-- 
 end of HSP in query
 -->
<!ELEMENT Hsp_query-to ( %INTEGER; )>
 
<!-- 
 start of HSP in subject
 -->
<!ELEMENT Hsp_hit-from ( %INTEGER; )>
 
<!-- 
 end of HSP in subject
 -->
<!ELEMENT Hsp_hit-to ( %INTEGER; )>
 
<!-- 
 start of PHI-BLAST pattern
 -->
<!ELEMENT Hsp_pattern-from ( %INTEGER; )>
 
<!-- 
 end of PHI-BLAST pattern
 -->
<!ELEMENT Hsp_pattern-to ( %INTEGER; )>
 
<!-- 
 translation frame of query
 -->
<!ELEMENT Hsp_query-frame ( %INTEGER; )>
 
<!-- 
 translation frame of subject
 -->
<!ELEMENT Hsp_hit-frame ( %INTEGER; )>
 
<!-- 
 number of identities in HSP
 -->
<!ELEMENT Hsp_identity ( %INTEGER; )>
 
<!-- 
 number of positives in HSP
 -->
<!ELEMENT Hsp_positive ( %INTEGER; )>
 
<!-- 
 number of gaps in HSP
 -->
<!ELEMENT Hsp_gaps ( %INTEGER; )>
 
<!-- 
 score density
 -->
<!ELEMENT Hsp_density ( %INTEGER; )>
 
<!-- 
 alignment string for the query (with gaps)
 -->
<!ELEMENT Hsp_qseq ( #PCDATA )>
 
<!-- 
 alignment string for subject (with gaps)
 -->
<!ELEMENT Hsp_hseq ( #PCDATA )>
 
<!-- 
 formatting middle line
 -->
<!ELEMENT Hsp_midline ( #PCDATA )>
 
 
 
 

--=_EFB74F0C.4829443B--

--=_EFB74F0C.4A2B4639--
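
To make the discussion of SAX-type parsing of this format more concrete,
here is a minimal sketch in Python of how a report following the attached
DTD might be read with a SAX handler. It is only an illustration, not the
biojava parser or any NCBI tool. The sample report, all of its values, and
the names HspCollector and parse_hsps are hypothetical. The sketch also
assumes that the %INTEGER; and %REAL; entities (not defined in the attached
file) stand for plain text holding an integer or a real number.

```python
import xml.sax

# Hypothetical report: element names and order follow the attached DTD,
# but every value below is invented for illustration.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>2.0.13 [May-26-2000]</BlastOutput_version>
  <BlastOutput_reference>placeholder citation</BlastOutput_reference>
  <BlastOutput_db>exampledb</BlastOutput_db>
  <BlastOutput_query-ID>query1</BlastOutput_query-ID>
  <BlastOutput_query-def>example query</BlastOutput_query-def>
  <BlastOutput_query-len>12</BlastOutput_query-len>
  <BlastOutput_hits>
    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>subject1</Hit_id>
      <Hit_def>example subject</Hit_def>
      <Hit_accession>X00001</Hit_accession>
      <Hit_len>40</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_score>25.5</Hsp_score>
          <Hsp_evalue>0.001</Hsp_evalue>
          <Hsp_query-from>1</Hsp_query-from>
          <Hsp_query-to>12</Hsp_query-to>
          <Hsp_hit-from>5</Hsp_hit-from>
          <Hsp_hit-to>16</Hsp_hit-to>
          <Hsp_qseq>MKTAYIAKQRQI</Hsp_qseq>
          <Hsp_hseq>MKTAYLAKQRQI</Hsp_hseq>
        </Hsp>
      </Hit_hsps>
    </Hit>
  </BlastOutput_hits>
  <BlastOutput_param>
    <Parameters>
      <Parameters_expect>10</Parameters_expect>
      <Parameters_gap-open>11</Parameters_gap-open>
      <Parameters_gap-extend>1</Parameters_gap-extend>
    </Parameters>
  </BlastOutput_param>
  <BlastOutput_stat>
    <Statistics>
      <Statistics_db-num>1000</Statistics_db-num>
      <Statistics_db-len>350000</Statistics_db-len>
      <Statistics_hsp-len>5</Statistics_hsp-len>
      <Statistics_eff-space>1000000</Statistics_eff-space>
      <Statistics_kappa>0.041</Statistics_kappa>
      <Statistics_lambda>0.267</Statistics_lambda>
      <Statistics_enthropy>0.14</Statistics_enthropy>
    </Statistics>
  </BlastOutput_stat>
</BlastOutput>"""

# Hsp children declared as %INTEGER; or %REAL; in the DTD that this
# sketch converts; everything else under Hsp is kept as a string.
INT_FIELDS = {"Hsp_num", "Hsp_query-from", "Hsp_query-to",
              "Hsp_hit-from", "Hsp_hit-to"}
REAL_FIELDS = {"Hsp_score", "Hsp_evalue"}


class HspCollector(xml.sax.ContentHandler):
    """Collects one dict per Hsp, tagged with the enclosing Hit_id."""

    def __init__(self):
        super().__init__()
        self.hsps = []
        self._text = []       # character data of the current element
        self._hit_id = None   # Hit_id of the Hit being read
        self._hsp = None      # dict for the Hsp being read, if any

    def startElement(self, name, attrs):
        self._text = []
        if name == "Hsp":
            self._hsp = {"Hit_id": self._hit_id}

    def characters(self, content):
        # SAX may deliver text in several chunks, so accumulate them.
        self._text.append(content)

    def endElement(self, name):
        value = "".join(self._text).strip()
        if name == "Hit_id":
            self._hit_id = value
        elif name == "Hsp":
            self.hsps.append(self._hsp)
            self._hsp = None
        elif self._hsp is not None and name.startswith("Hsp_"):
            if name in INT_FIELDS:
                value = int(value)
            elif name in REAL_FIELDS:
                value = float(value)
            self._hsp[name] = value


def parse_hsps(xml_text):
    """Return a list of HSP dicts found in a BlastOutput document."""
    handler = HspCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.hsps
```

Calling parse_hsps(SAMPLE) returns a list with one dict per Hsp element.
Because the DTD is a flat mapping from ASN.1 with no attributes, a handler
like this only needs to track element names and character data, which is
part of what makes the XML output easier to consume than the text report.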