BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit -Strand information)

Schreiber, Mark mark.schreiber at agresearch.co.nz
Tue Dec 2 17:09:49 EST 2003


Hi -

I suspect you hit the nail on the head when you asked how it was designed and under what standards. My guess at the answers would be: it wasn't, and none, respectively. It's pretty funny if you put the DTD into a tool that automatically generates JAXB-style bindings. You end up with millions of objects, each of which contains a single piece of data. It would have been better to do it the way you suggested.
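
To make the contrast concrete, here is a minimal Java sketch of collapsing the three per-datum elements back into one object, which is roughly what the attribute-based form would give you for free. It is only an illustration under stated assumptions: HspRange and its methods are hypothetical names (not BioJava or NCBI classes), the input is a bare <Hsp> fragment rather than a full BLAST report, and the parsing uses the JDK's own DOM parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical holder (not a BioJava or NCBI class): one object for the
// from/to/frame triple that the DTD spreads over three elements.
final class HspRange {
    final int from;
    final int to;
    final int frame;

    HspRange(int from, int to, int frame) {
        this.from = from;
        this.to = to;
        this.frame = frame;
    }

    // Reads the first descendant element with the given tag name as an int.
    // trim() is needed because the quoted output contains " -1".
    private static int intChild(Element parent, String tag) {
        String text = parent.getElementsByTagName(tag).item(0).getTextContent();
        return Integer.parseInt(text.trim());
    }

    // prefix is "Hsp_query" or "Hsp_hit", matching the BLAST element names.
    static HspRange of(Element hsp, String prefix) {
        return new HspRange(
                intChild(hsp, prefix + "-from"),
                intChild(hsp, prefix + "-to"),
                intChild(hsp, prefix + "-frame"));
    }

    // Convenience for this sketch: parse an <Hsp> fragment held in a string.
    static HspRange parse(String xml, String prefix) {
        try {
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(bytes))
                    .getDocumentElement();
            return of(root, prefix);
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable Hsp fragment", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Hsp>"
                + "<Hsp_hit-from>12374053</Hsp_hit-from>"
                + "<Hsp_hit-to>12374401</Hsp_hit-to>"
                + "<Hsp_hit-frame> -1</Hsp_hit-frame>"
                + "</Hsp>";
        HspRange hit = parse(xml, "Hsp_hit");
        // prints: 12374053 12374401 -1
        System.out.println(hit.from + " " + hit.to + " " + hit.frame);
    }
}
```

The sketch deliberately keeps the values exactly as reported. It makes no attempt to interpret the frame or the from/to order as a strand.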

For a long time the DTD didn't actually validate what was being produced either, so I guess we should be glad it works now.

Anyhow, that's enough ranting from me.

- Mark


> -----Original Message-----
> From: Bobick, Stephen [mailto:Stephen_Bobick at rosettabio.com] 
> Sent: Wednesday, 3 December 2003 10:44 a.m.
> To: biojava-l at biojava.org
> Subject: BLAST DTD (was RE: [Biojava-l] 
> SeqSimilaritySearchSubHit -Strand information)
> 
> 
> 
> Greetings,
> 
> I'm afraid I will not be answering the poster here, but the 
> message caught my curiosity and prompted me to take a peek 
> at the BLAST DTD, and subsequently post this commentary.  My 
> question is how was the BLAST DTD designed and under what 
> standards?  I find the choice of element names to be 
> unfortunate.  In comparing to standard XML naming and DTD 
> design I would expect something like:
> 
>   <hsp_query from="576" to="229" frame="1"/>
> 
> Rather than the following:
> 
>   <Hsp_query-from>576</Hsp_query-from>
>   <Hsp_query-to>229</Hsp_query-to>
>   <Hsp_query-frame>1</Hsp_query-frame>
> 
> The two primary differences are in capitalization, and the 
> choice of attributes rather than separate elements for each 
> datum in this excerpt.  As a consequence, the "expected" form 
> is more succinct.  From the DTD I see the latter naming and 
> element/attribute choice is repeated many times.
> 
> I will add an admission that I have not worked with BLAST 
> results in several years, as my focus has been on data 
> management software (LIMS) and, more recently, analysis 
> software.  Still, as a professional in the greater 
> bioinformatics community, who works daily with XML, I do like 
> to see an incorporation of good practices from the "pure" 
> software development community.
> 
> Comments?
> 
> Stephen Bobick  
> 
> 
> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of 
> Jan Würthner
> Sent: Tuesday, December 02, 2003 12:52 AM
> To: biojava-l at biojava.org
> Subject: [Biojava-l] SeqSimilaritySearchSubHit - Strand information
> 
> 
> 
> Hi folks,
> 
> I'm constructing SeqSimilaritySearchSubHit instances from XML-formatted
> NCBI BLAST results, and I'm getting steadily confused by the query's and
> subject's from and to information on one hand and the query's and
> subject's strand on the other hand.
> 
> The NCBI returns for example:
> 
>        <Hsp_query-from>576</Hsp_query-from>
>        <Hsp_query-to>229</Hsp_query-to>
>        <Hsp_query-frame>1</Hsp_query-frame>
> 
>        <Hsp_hit-from>12374053</Hsp_hit-from>
>        <Hsp_hit-to>12374401</Hsp_hit-to>
>        <Hsp_hit-frame> -1</Hsp_hit-frame>
> 
> I'd think that the possibility to assign the from- and to-values in
> different orders (like descending in this query) already includes the
> information about the direction (POSITIVE/NEGATIVE). Why is there an
> additional "frame" value, and why is the query's frame value set to +1,
> and the subject's (=hit's) value set to -1? I assumed it to be assigned
> vice versa.
> 
> My question is: How shall I set the SeqSimilaritySearchSubHit instance's
> query/subject values from these data?
> 
> An answer to this would be of much help!
> 
> Thank you
> Jan
> 
> -- 
> Jan Würthner
> Institute for Medical Microbiology
> Building 22.21
> Heinrich-Heine-University
> Universitätsstraße 1
> 40225 Duesseldorf
> 
> Tel. +49 (0) 211 81 12461
> URL: www.medmikro.uni-duesseldorf.de
> 
> 
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org 
> http://biojava.org/mailman/listinfo/biojava-l
> 
> 
> 