[Biopython-dev] SearchIO HSP indexing

Colin Archer colin.aibn at gmail.com
Sat Feb 9 13:06:13 UTC 2013


Hi everyone,
I have a question about the implementation of high-scoring segment pairs
(HSPs) in SearchIO. I am currently parsing a BLAST output file in XML
format, and this is one of the hits (alignment details removed to save
space):

        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
          <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
          <Hit_accession>111</Hit_accession>
          <Hit_len>1893</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>3352.79</Hsp_bit-score>
              <Hsp_score>1815</Hsp_score>
              <Hsp_evalue>0</Hsp_evalue>
              <Hsp_query-from>1</Hsp_query-from>
              <Hsp_query-to>1893</Hsp_query-to>
              <Hsp_hit-from>1</Hsp_hit-from>
              <Hsp_hit-to>1893</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>1</Hsp_hit-frame>
              <Hsp_identity>1867</Hsp_identity>
              <Hsp_positive>1867</Hsp_positive>
              <Hsp_gaps>0</Hsp_gaps>
            </Hsp>
            <Hsp>
              <Hsp_num>2</Hsp_num>
              <Hsp_bit-score>399.997</Hsp_bit-score>
              <Hsp_score>216</Hsp_score>
              <Hsp_evalue>2.88061e-111</Hsp_evalue>
              <Hsp_query-from>331</Hsp_query-from>
              <Hsp_query-to>881</Hsp_query-to>
              <Hsp_hit-from>22</Hsp_hit-from>
              <Hsp_hit-to>581</Hsp_hit-to>
              <Hsp_query-frame>1</Hsp_query-frame>
              <Hsp_hit-frame>1</Hsp_hit-frame>
              <Hsp_identity>452</Hsp_identity>
              <Hsp_positive>452</Hsp_positive>
              <Hsp_gaps>19</Hsp_gaps>
              <Hsp_align-len>565</Hsp_align-len>
            </Hsp>

Using Hsp 1 as an example, the query and hit starts ("Hsp_query-from" and
"Hsp_hit-from") are both 1 in the XML, but when I access the HSP objects
from the BLAST result, both values are equal to 0:

>>> blast_record[0][0].query_start
0
>>> blast_record[0][0].hit_start
0

However, when I access the end attributes for the query and hit, the result
is 1893 rather than 1892 (the zero-based equivalent of 1893):

>>> blast_record[0][0].query_end
1893
>>> blast_record[0][0].hit_end
1893

Is this correct? I find it a little confusing that the start values look
zero-based while the end values look one-based.
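
One reading that would explain both numbers is that the coordinates are
stored Python slice-style, i.e. zero-based with an exclusive end. Under that
assumption, the one-based inclusive values from the XML map to
(start - 1, end). The sketch below is only an illustration of that
arithmetic; to_half_open is a hypothetical helper, not part of SearchIO:

```python
def to_half_open(start_1based, end_1based):
    """Convert 1-based inclusive coordinates to 0-based half-open ones.

    Hypothetical helper for illustration; not a SearchIO function.
    The start shifts down by one, the end stays the same because it
    becomes exclusive.
    """
    return start_1based - 1, end_1based


# Hsp 1 from the XML above: query-from=1, query-to=1893
hsp1_query = to_half_open(1, 1893)

# Hsp 2 from the XML above: hit-from=22, hit-to=581
hsp2_hit = to_half_open(22, 581)

# With half-open coordinates, end - start is the number of residues covered
hsp1_span = hsp1_query[1] - hsp1_query[0]

print(hsp1_query, hsp2_hit, hsp1_span)
```

If that is the convention, then seq[start:end] would pull out the aligned
region directly and end - start would give its length, without any +1 or -1
adjustments.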

Thanks
Colin


