BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit - Strand information)

Michael E. Smoot mes5k at cs.virginia.edu
Tue Dec 2 17:37:02 EST 2003


This page explains how the DTDs were created:

	http://www.ncbi.nlm.nih.gov/IEB/ToolBox/XML/ncbixml.txt

The short version is that the DTDs are transliterations of their ASN.1
data models.
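
To make that concrete, here is a minimal sketch. It assumes that each
element name is the ASN.1 type name, an underscore, and then the field name
(so Hsp_query-from would be the query-from field of Hsp); that is my reading
of the naming, so check it against the page above. The helper name
hsp_fields is made up for illustration, and the fragment reuses the values
quoted below, wrapped in an <Hsp> parent so it parses on its own. It uses
only Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Fragment built from the element names and values quoted in this thread,
# wrapped in an <Hsp> parent element so that it parses on its own.
HSP_XML = """
<Hsp>
  <Hsp_query-from>576</Hsp_query-from>
  <Hsp_query-to>229</Hsp_query-to>
  <Hsp_query-frame>1</Hsp_query-frame>
  <Hsp_hit-from>12374053</Hsp_hit-from>
  <Hsp_hit-to>12374401</Hsp_hit-to>
  <Hsp_hit-frame> -1</Hsp_hit-frame>
</Hsp>
"""


def hsp_fields(hsp):
    """Map each child element 'Hsp_<field>' to an entry '<field>': int.

    Hypothetical helper: assumes the 'Type_field' naming described above
    and that every field in the fragment holds an integer.
    """
    prefix = hsp.tag + "_"
    fields = {}
    for child in hsp:
        if child.tag.startswith(prefix):
            # int() ignores surrounding whitespace, so " -1" parses as -1.
            fields[child.tag[len(prefix):]] = int(child.text)
    return fields


fields = hsp_fields(ET.fromstring(HSP_XML))
print(fields)
```

Because the field name is recovered by stripping the type prefix, the same
loop should work for the other element-per-field records in the DTD, as
long as the values are converted according to their actual types.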


Mike


On Tue, 2 Dec 2003, Bobick, Stephen wrote:

>
> Greetings,
>
> I'm afraid I will not be answering the poster here, but the message caught
> my curiosity and prompted me to take a peek at the BLAST DTD, and
> subsequently post this commentary.  My question is how was the BLAST DTD
> designed and under what standards?  I find the choice of element names to be
> unfortunate.  In comparing to standard XML naming and DTD design I would
> expect something like:
>
>   <hsp_query from="576" to="229" frame="1"/>
>
> Rather than the following:
>
>   <Hsp_query-from>576</Hsp_query-from>
>   <Hsp_query-to>229</Hsp_query-to>
>   <Hsp_query-frame>1</Hsp_query-frame>
>
> The two primary differences are in capitalization, and the choice of attributes
> rather than separate elements for each datum in this excerpt.  As a
> consequence, the "expected" form is more succinct.  From the DTD I see the
> latter naming and element/attribute choice is repeated many times.
>
> I will add an admission that I have not worked with BLAST results in several
> years, as my focus has been on data management software (LIMS) and, more
> recently, analysis software.  Still, as a professional in the greater
> bioinformatics community, who works daily with XML, I do like to see an
> incorporation of good practices from the "pure" software development
> community.
>
> Comments?
>
> Stephen Bobick
>
>
> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of Jan Würthner
> Sent: Tuesday, December 02, 2003 12:52 AM
> To: biojava-l at biojava.org
> Subject: [Biojava-l] SeqSimilaritySearchSubHit - Strand information
>
>
>
> Hi folks,
>
> I'm constructing SeqSimilaritySearchSubHit instances from XML-formatted NCBI
> BLAST results, and I'm getting steadily confused by the query's and
> subject's from and to information on one hand and the query's and subject's
> strand on the other hand.
>
> The NCBI returns for example:
>
>        <Hsp_query-from>576</Hsp_query-from>
>        <Hsp_query-to>229</Hsp_query-to>
>        <Hsp_query-frame>1</Hsp_query-frame>
>
>        <Hsp_hit-from>12374053</Hsp_hit-from>
>        <Hsp_hit-to>12374401</Hsp_hit-to>
>        <Hsp_hit-frame> -1</Hsp_hit-frame>
>
> I'd think that the possibility to assign the from- and to-values in
> different orders (like descending in this query) already includes the
> information about the direction (POSITIVE/NEGATIVE). Why is there an
> additional "frame" value, and why is the query's frame value set to +1, and
> the subject's (=hit's) value set to -1? I assumed it to be assigned vice
> versa.
>
> My question is: How shall I set the SeqSimilaritySearchSubHit instance's
> query/subject values from these data?
>
> An answer to this would be a great help!
>
> Thank you
> Jan
>
> --
> Jan Würthner
> Institute for Medical Microbiology
> Building 22.21
> Heinrich-Heine-University
> Universitätsstraße 1
> 40225 Duesseldorf
>
> Tel. +49 (0) 211 81 12461
> URL: www.medmikro.uni-duesseldorf.de
>
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>


