BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit -Strand information)

Jan Würthner jan.wuerthner at uni-duesseldorf.de
Wed Dec 3 08:36:38 EST 2003


Hi folks,

I have now received an answer from blast-help at ncbi.nlm.nih.gov (see below).

For my purposes, I conclude that I don't need the "frame" value, especially 
since I use "blastn" as the program. It seems safe to construct the 
(SeqSimilaritySearchSubHit's) query- and subject-strand values from the way 
the from- and to-values are ordered (ascending or descending).
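
A minimal sketch of that rule, assuming 1-based coordinates where ascending
order means the plus strand and descending order means the minus strand. The
class and method names are hypothetical and not part of BioJava, and the
coordinates in main are made up for illustration:

```java
/** Sketch: infer the strand of a blastn sub-hit from the order of its from/to values. */
final class StrandFromCoordinates {

    /**
     * Returns +1 when the coordinates ascend (plus strand) and -1 when they
     * descend (minus strand). A single-base span (from == to) carries no
     * orientation, so it is treated as plus here by assumption.
     */
    static int strandOf(int from, int to) {
        return from <= to ? 1 : -1;
    }

    public static void main(String[] args) {
        // Hypothetical blastn sub-hit: query 1..250, subject 980..731.
        System.out.println("query strand:   " + strandOf(1, 250));
        System.out.println("subject strand: " + strandOf(980, 731));
    }
}
```

The result could then be mapped onto whatever strand representation the
SeqSimilaritySearchSubHit implementation expects.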

Jan

answer from blast-help:
-------8<----------------------------------------------

In our blast result, "Frame" refers to the translation orientation and frame,
since there are 6 possible ones, three from each strand.  Their assigned
values are +1, +2, +3, -1, -2, and -3.  This is only relevant if query/db
translation is involved (blastx, tblastn, tblastx).

Since blast only reports local alignments, one may see the same Frame value
reported several times, for alignments that may or may not cover the same
area of the query or subject.

One may be able to derive this using additional calculation from the from
and to fields along with the sequence length.  However, BLAST calculates this
out and presents it in a more straightforward manner.  It is up to the user
whether to use it or not.
-------------------------------->8---------------------
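
For anyone who does need the frame for translated searches, the "additional
calculation" mentioned above might look roughly like the sketch below. It
rests on assumptions the answer does not spell out: coordinates are 1-based,
a descending from/to pair marks the minus strand, and a minus-strand frame is
counted from the 3' end of the sequence as given. The names are hypothetical,
and the result should be checked against the Frame value BLAST itself reports:

```java
/** Sketch: derive a BLAST-style frame (+1..+3, -1..-3) from from/to and sequence length. */
final class FrameFromCoordinates {

    /**
     * Assumes 1-based nucleotide coordinates. When from <= to the alignment is
     * taken to be on the plus strand and the frame depends on where it starts;
     * otherwise it is taken to be on the minus strand and the frame is counted
     * from the end of the sequence.
     */
    static int frameOf(int from, int to, int seqLength) {
        if (from <= to) {
            return (from - 1) % 3 + 1;
        }
        return -((seqLength - from) % 3 + 1);
    }

    public static void main(String[] args) {
        // Hypothetical values for a 1000 bp sequence.
        System.out.println(frameOf(2, 301, 1000));   // plus strand, starts at base 2
        System.out.println(frameOf(999, 700, 1000)); // minus strand, starts one base from the end
    }
}
```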

On Tuesday 02 December 2003 23:57, Bobick, Stephen wrote:
> Interesting read.  There are two sections worthy of comment:
>   >NCBI is not proposing a new data model, but is simply transliterating
>   >the data model we have used for the last decade into a different
>   >language for the convenience of our users. ASN.1 has a number of
>   >specific data types such as INTEGER or REAL numbers while XML has only
>   >strings, so our DTD automatically adds some ENTITY definitions at the
>   >top which maps these numbers to strings. This mapping only allows
>   >humans that read the DTD to see where numbers are expected; an XML
>   >validator will not care what is there.
>
> Use of an XML Schema would allow the enforcement of data types.
>
>   >Summary:
>   >While the effect of Roles, Scope, and Alternate Forms results in
>   >extensive tags in the XML, it does accurately reflect the structure
>   >and use of the data. It allows XML programs to capture as little or as
>   >much of the full data structure as they wish.
>
> I guess I fail to see the point of all this.  How would a structure
> resulting from the suggestions that I propose be "lossy" in any way?
>
> Stephen Bobick
>
>
> -----Original Message-----
> From: Michael E. Smoot [mailto:mes5k at cs.virginia.edu]
> Sent: Tuesday, December 02, 2003 2:37 PM
> To: Bobick, Stephen
> Cc: biojava-l at biojava.org
> Subject: Re: BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit -
> Strand information)
>
>
>
> This page explains how the DTD's were created:
>
> 	http://www.ncbi.nlm.nih.gov/IEB/ToolBox/XML/ncbixml.txt
>
> The short version is that the DTD's are transliterations of their ASN.1
> data models.
>
>
> Mike
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l

-- 
Jan Würthner
Institute for Medical Microbiology
Building 22.21
Heinrich-Heine-University
Universitätsstraße 1
40225 Duesseldorf

Tel. +49 (0) 211 81 12461
URL: www.medmikro.uni-duesseldorf.de



