BLAST DTD (was RE: [Biojava-l] SeqSimilaritySearchSubHit - Strand information)

Bobick, Stephen Stephen_Bobick at rosettabio.com
Tue Dec 2 16:43:55 EST 2003


Greetings,

I'm afraid I will not be answering the poster here, but the message caught
my curiosity and prompted me to take a peek at the BLAST DTD, and
subsequently to post this commentary.  My question is: how was the BLAST DTD
designed, and under what standards?  I find the choice of element names to be
unfortunate.  Going by common XML naming and DTD design conventions, I would
expect something like:

  <hsp_query from="576" to="229" frame="1"/>

Rather than the following:

  <Hsp_query-from>576</Hsp_query-from>
  <Hsp_query-to>229</Hsp_query-to>
  <Hsp_query-frame>1</Hsp_query-frame>

The two primary differences are in capitalization and in the choice of
attributes rather than a separate element for each datum in this excerpt.  As
a consequence, the "expected" form is more succinct.  From the DTD I see that
the latter naming and element/attribute choice is repeated many times.
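
To make the comparison concrete, here is a minimal Python sketch that reads
the same three values from both forms using the standard library's
xml.etree.ElementTree.  The attribute form is only the hypothetical
alternative suggested above and is not part of any BLAST DTD; the wrapping
<Hsp> element and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Element-per-datum form, as in the BLAST XML excerpt (wrapped in <Hsp> here
# so the fragment is a well-formed document).
ELEMENT_FORM = """<Hsp>
  <Hsp_query-from>576</Hsp_query-from>
  <Hsp_query-to>229</Hsp_query-to>
  <Hsp_query-frame>1</Hsp_query-frame>
</Hsp>"""

# Hypothetical attribute form proposed in this message (not in the BLAST DTD).
ATTRIBUTE_FORM = '<hsp_query from="576" to="229" frame="1"/>'


def read_element_form(xml_text):
    """Collect from/to/frame from three separate child elements."""
    root = ET.fromstring(xml_text)
    return {
        "from": int(root.findtext("Hsp_query-from")),
        "to": int(root.findtext("Hsp_query-to")),
        "frame": int(root.findtext("Hsp_query-frame")),
    }


def read_attribute_form(xml_text):
    """Collect from/to/frame from attributes of a single element."""
    node = ET.fromstring(xml_text)
    return {key: int(node.get(key)) for key in ("from", "to", "frame")}


if __name__ == "__main__":
    print(read_element_form(ELEMENT_FORM))
    print(read_attribute_form(ATTRIBUTE_FORM))
```

Both readers yield the same data; the difference is in the verbosity of the
markup and in how many lookups the consumer has to spell out.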

I will add an admission that I have not worked with BLAST results in several
years, as my focus has been on data management software (LIMS) and, more
recently, analysis software.  Still, as a professional in the greater
bioinformatics community, who works daily with XML, I do like to see an
incorporation of good practices from the "pure" software development
community.

Comments?

Stephen Bobick  


-----Original Message-----
From: biojava-l-bounces at portal.open-bio.org
[mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of Jan Würthner
Sent: Tuesday, December 02, 2003 12:52 AM
To: biojava-l at biojava.org
Subject: [Biojava-l] SeqSimilaritySearchSubHit - Strand information



Hi folks,

I'm constructing SeqSimilaritySearchSubHit instances from XML-formatted NCBI
BLAST results, and I'm getting increasingly confused by the query's and
subject's from and to information on the one hand and the query's and
subject's strand on the other.

The NCBI returns for example:

       <Hsp_query-from>576</Hsp_query-from>
       <Hsp_query-to>229</Hsp_query-to>
       <Hsp_query-frame>1</Hsp_query-frame>

       <Hsp_hit-from>12374053</Hsp_hit-from>
       <Hsp_hit-to>12374401</Hsp_hit-to>
       <Hsp_hit-frame> -1</Hsp_hit-frame>

I'd think that the possibility of giving the from- and to-values in either
order (descending, as in this query) already conveys the direction
(POSITIVE/NEGATIVE). Why is there an additional "frame" value, and why is
the query's frame value set to +1 and the subject's (= hit's) value set to
-1? I assumed it would be the other way round.

My question is: How shall I set the SeqSimilaritySearchSubHit instance's 
query/subject values from these data?
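
As an illustration of the discrepancy only, the following Python sketch reads
the values above and reports the direction implied by each signal
separately: the coordinate order and the sign of the frame. It does not
decide which signal is authoritative and does not touch the BioJava API; the
wrapping <Hsp> element, the function name and the returned dictionary keys
are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The HSP values quoted above, wrapped in <Hsp> to form a parseable fragment.
HSP_XML = """<Hsp>
  <Hsp_query-from>576</Hsp_query-from>
  <Hsp_query-to>229</Hsp_query-to>
  <Hsp_query-frame>1</Hsp_query-frame>
  <Hsp_hit-from>12374053</Hsp_hit-from>
  <Hsp_hit-to>12374401</Hsp_hit-to>
  <Hsp_hit-frame> -1</Hsp_hit-frame>
</Hsp>"""


def describe_side(hsp, side):
    """Report both direction signals for side 'query' or 'hit'."""
    start = int(hsp.findtext(f"Hsp_{side}-from"))
    end = int(hsp.findtext(f"Hsp_{side}-to"))
    # int() ignores surrounding whitespace, so " -1" parses as -1.
    frame = int(hsp.findtext(f"Hsp_{side}-frame"))
    return {
        "min": min(start, end),
        "max": max(start, end),
        "by_order": "POSITIVE" if start <= end else "NEGATIVE",
        "by_frame": "NEGATIVE" if frame < 0 else "POSITIVE",
    }


if __name__ == "__main__":
    hsp = ET.fromstring(HSP_XML)
    for side in ("query", "hit"):
        print(side, describe_side(hsp, side))
```

For these data the two signals disagree on both sides, which is exactly the
source of the confusion described above.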

An answer to this would be a great help!

Thank you
Jan

-- 
Jan Würthner
Institute for Medical Microbiology
Building 22.21
Heinrich-Heine-University
Universitätsstraße 1
40225 Duesseldorf

Tel. +49 (0) 211 81 12461
URL: www.medmikro.uni-duesseldorf.de


_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l



