[Biopython-dev] SearchIO HSP indexing

Colin Archer colin.aibn at gmail.com
Sat Feb 9 13:54:42 UTC 2013


Hi Peter,
Thanks for getting back to me so quickly.

I'm curious about the benefits of having these values in Python string
slicing format. I haven't come across this very often; I'm used to seeing
values that are consistently either zero-based or one-based.

Would it be easier to keep the range variables hit_range and hit_range_all
in slicing format, and the start and end variables in sequence-position
format, so that they match the values in the BLAST output?
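The proposal amounts to exposing the same coordinates in two conventions. Here is a minimal sketch of the conversion in plain Python. The helpers blast_to_python and python_to_blast are hypothetical names, not part of Biopython. The sketch assumes forward coordinates where "from" <= "to"; reverse-strand ranges are out of scope.

```python
from typing import Tuple


def blast_to_python(blast_from: int, blast_to: int) -> Tuple[int, int]:
    """Convert a 1-based, inclusive range (e.g. Hsp_hit-from..Hsp_hit-to)
    into a 0-based, half-open range suitable for Python slicing."""
    # Assumption of this sketch: forward coordinates only (from <= to).
    if not 1 <= blast_from <= blast_to:
        raise ValueError("expected 1 <= from <= to")
    # Only the start shifts: an inclusive 1-based end equals an exclusive 0-based end.
    return blast_from - 1, blast_to


def python_to_blast(start: int, end: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open range back to 1-based, inclusive positions."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    return start + 1, end


# HSP 1 from the XML in this thread: Hsp_query-from=1, Hsp_query-to=1893
print(blast_to_python(1, 1893))   # (0, 1893)
# HSP 2: Hsp_query-from=331, Hsp_query-to=881
print(blast_to_python(331, 881))  # (330, 881)
# And back again, recovering the numbers BLAST reported
print(python_to_blast(0, 1893))   # (1, 1893)
```

Because the end value is the same in both conventions, the only arithmetic needed is a +1 or -1 on the start.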

I had a look at some of the code and I can't see the slicing format
mentioned anywhere (Hsp.py, Hit.py, or blast_xml.py). It would probably be
helpful to explain the values in Hsp.py with a note (marked **) on
hit_start, hit_end, query_start and query_end, so that anyone interested
can look at the files and see what they mean.
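As a rough illustration of the kind of note suggested here, this is a hypothetical HSP-like class (DocumentedHSPSketch is an invented name, not Biopython's actual HSP class or Hsp.py) whose docstrings spell out the convention, so that help() or reading the source makes it clear.

```python
class DocumentedHSPSketch:
    """Hypothetical HSP-like container (NOT Biopython's HSP class).

    ** Coordinates follow Python slicing: starts are 0-based and ends
    are exclusive. BLAST's 1-based, inclusive Hsp_query-from=1 and
    Hsp_query-to=1893 are stored as query_start=0 and query_end=1893,
    so seq[query_start:query_end] is exactly the aligned region.
    """

    def __init__(self, query_start: int, query_end: int,
                 hit_start: int, hit_end: int) -> None:
        self._query = (query_start, query_end)
        self._hit = (hit_start, hit_end)

    @property
    def query_start(self) -> int:
        """** 0-based start: BLAST's Hsp_query-from minus one."""
        return self._query[0]

    @property
    def query_end(self) -> int:
        """** Exclusive end: numerically equal to BLAST's Hsp_query-to."""
        return self._query[1]

    @property
    def hit_start(self) -> int:
        """** 0-based start: BLAST's Hsp_hit-from minus one."""
        return self._hit[0]

    @property
    def hit_end(self) -> int:
        """** Exclusive end: numerically equal to BLAST's Hsp_hit-to."""
        return self._hit[1]


hsp = DocumentedHSPSketch(0, 1893, 0, 1893)
print(hsp.query_start, hsp.query_end)  # 0 1893
```

In an interactive session, help(DocumentedHSPSketch.query_start) would then display the note next to the attribute.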

Thanks
Colin


On Sat, Feb 9, 2013 at 11:16 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:

> On Sat, Feb 9, 2013 at 1:06 PM, Colin Archer <colin.aibn at gmail.com> wrote:
> > Hi everyone,
> > I have a question about the implementation of
> > high-scoring segment pairs (HSPs) in SearchIO. I currently have a BLAST
> > output file in XML format that I am parsing, and this is one of the hits
> > (with the alignment details removed to save space):
> >
> >         <Hit>
> >           <Hit_num>1</Hit_num>
> >           <Hit_id>gnl|BL_ORD_ID|111</Hit_id>
> >           <Hit_def>ref|NC_007779|:125695-127587</Hit_def>
> >           <Hit_accession>111</Hit_accession>
> >           <Hit_len>1893</Hit_len>
> >           <Hit_hsps>
> >             <Hsp>
> >               <Hsp_num>1</Hsp_num>
> >               <Hsp_bit-score>3352.79</Hsp_bit-score>
> >               <Hsp_score>1815</Hsp_score>
> >               <Hsp_evalue>0</Hsp_evalue>
> >               <Hsp_query-from>1</Hsp_query-from>
> >               <Hsp_query-to>1893</Hsp_query-to>
> >               <Hsp_hit-from>1</Hsp_hit-from>
> >               <Hsp_hit-to>1893</Hsp_hit-to>
> >               <Hsp_query-frame>1</Hsp_query-frame>
> >               <Hsp_hit-frame>1</Hsp_hit-frame>
> >               <Hsp_identity>1867</Hsp_identity>
> >               <Hsp_positive>1867</Hsp_positive>
> >               <Hsp_gaps>0</Hsp_gaps>
> >             </Hsp>
> >             <Hsp>
> >               <Hsp_num>2</Hsp_num>
> >               <Hsp_bit-score>399.997</Hsp_bit-score>
> >               <Hsp_score>216</Hsp_score>
> >               <Hsp_evalue>2.88061e-111</Hsp_evalue>
> >               <Hsp_query-from>331</Hsp_query-from>
> >               <Hsp_query-to>881</Hsp_query-to>
> >               <Hsp_hit-from>22</Hsp_hit-from>
> >               <Hsp_hit-to>581</Hsp_hit-to>
> >               <Hsp_query-frame>1</Hsp_query-frame>
> >               <Hsp_hit-frame>1</Hsp_hit-frame>
> >               <Hsp_identity>452</Hsp_identity>
> >               <Hsp_positive>452</Hsp_positive>
> >               <Hsp_gaps>19</Hsp_gaps>
> >               <Hsp_align-len>565</Hsp_align-len>
> >             </Hsp>
> >
> > Using HSP 1 as an example, the query and hit starts ("Hsp_query-from" and
> > "Hsp_hit-from") are both 1 in the XML, but when I access the HSP objects
> > from the BlastResult, both values are equal to 0:
> >
> >>>> blast_record[0][0].query_start
> > 0
> >>>> blast_record[0][0].hit_start
> > 0
> >
> > However, when I access the end attributes for the query and hit, the result
> > isn't 1892 (the zero-based equivalent of 1893) but 1893:
> >
> >>>> blast_record[0][0].query_end
> > 1893
> >>>> blast_record[0][0].hit_end
> > 1893
> >
> > Is this correct? I find it a little confusing that one result is
> > zero-based and the other one-based.
> >
> > Thanks
> > Colin
>
> Hi Colin,
>
> The SearchIO positions, like elsewhere in Biopython, should be
> using Python-style counting. Looking at this one:
>
>                <Hsp_hit-from>1</Hsp_hit-from>
>                <Hsp_hit-to>1893</Hsp_hit-to>
>
> That is like a GenBank/EMBL location 1..1893, which in Python string
> slicing is [0:1893], so the start is reduced by one but the end is
> unchanged. The nice thing is that the length, 1893, is simply the
> difference between the Python-style end and start.
>
> Perhaps we need to work on the help text? Any suggestions?
>
> Thanks,
>
> Peter
>
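
Peter's point, that only the start shifts and that end minus start gives the length, can be checked with plain Python slicing. The toy protein sequence below is invented for illustration; the HSP numbers are copied from the XML in Colin's message. The final check assumes Hsp_gaps counts gap columns in both the query and hit rows, which the reported figures are consistent with.

```python
# Toy sequence, invented for illustration (not from the BLAST run).
seq = "MKTAYIAKQR"

# A GenBank/EMBL-style location 3..5 (1-based, inclusive) covers "TAY".
blast_from, blast_to = 3, 5
start, end = blast_from - 1, blast_to  # Python slice [2:5]
print(seq[start:end], end - start)     # TAY 3

# The same arithmetic on HSP 2 from the XML above.
q_start, q_end = 331 - 1, 881   # query [330:881]
h_start, h_end = 22 - 1, 581    # hit   [21:581]
q_len = q_end - q_start         # 551 query residues in the alignment
h_len = h_end - h_start         # 560 hit residues in the alignment
align_len, total_gaps = 565, 19  # Hsp_align-len and Hsp_gaps

# Gap columns in each row = alignment length minus that row's residues.
q_gaps = align_len - q_len      # 14
h_gaps = align_len - h_len      # 5
print(q_gaps, h_gaps, q_gaps + h_gaps == total_gaps)  # 14 5 True
```

With half-open ranges no "+1" correction is needed when computing lengths, which is the main practical argument for the slicing convention.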
