[Bioperl-l] Bio::Search::HSP::GenericHSP::seq_inds

Thu Mar 27 15:05:59 UTC 2008

According to the GenericHSP::seq_inds() POD, seq_inds() reports  
residue positions (indices) for the query/subject based on identity/ 
conservation, i.e. these are fro the original sequence positions as  
determined by the HSP data, not alignment column positions.  'gaps'  
should be reported at the position prior to where a gap is inserted.   
However I think something is getting borked when the gap length is  
longer than one, so I would partially qualify this as a bug.

Example: When I ran this using bioperl-live it gives a different set  
of gaps indices which appear to be correct.  I reran the BLASTP using  
the web form using your query against swissprot and parsed it.  I got  
slightly different results for the BLAST report (probably differences  
in the query sequence):

 >gi|74746888|sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
Length=1461

  Score = 35.8 bits (81),  Expect = 0.47, Method: Composition-based  
stats.
  Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)

Query  394  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL  445
             +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL  193

.....

seq_inds('query' => 'gaps') reports 409,423, and 424, which is  
partially correct, e.g. there is a gap inserted after position 409 and  
423 in the query.  However, no gap is present after 424; I think this  
occurs b/c the gap length is 2.  The other HSPs report similar problems.

chris

P.S. Just saw than Sendu posted; I agree, seq. positions with gap  
lengths > 1 should be repeated.  Should be easy to fix that.

On Mar 27, 2008, at 8:26 AM, Marc Logghe wrote:

> Hi all,
>
> I am a little bit confused about the above mentioned seq_inds()  
> method.
> At first, I had the impression that the method returns an array of
> positions in the hsp (hit or query) sequence.
>
> At least that is what one would expect looking at the example usage in
> the HOWTOs (http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> second code block).
>
> Am I correct in believing you can only do this if your hsp query  
> stretch
> starts at position 1 of the query sequence?
>
> I think seq_inds() returns a list of positions relative to the query/ 
> hit
> sequence. So, the code shown in the HOWTO is a kind of special case.
>
> However, I do not understand how seq_inds() is dealing with gaps.
>
> An example. If you blast the worm protein ZK822.4 against swissprot
> using blastp at ncbi you get this hsp as top:
>
>
>
>> sp|Q5VT52|K0460_HUMAN Uncharacterized protein KIAA0460
> Length=1461
>
> Score = 35.8 bits (81),  Expect = 0.48, Method: Composition-based
> stats.
> Identities = 22/55 (40%), Positives = 32/55 (58%), Gaps = 3/55 (5%)
>
> Query  402  IAVEEETKTTKKNKKQ-QQQANKNKNKNKKK--TTIAPEAAIDANIAAEVHTQVL   
> 453
>            +A+ E   TT K +KQ ++  NK  NK  KK  T+  P+AA+ + I AE  +Q L
> Sbjct  139  VALREALSTTFKTQKQLKENLNKQPNKQWKKSQTSTNPKAALKSKIVAEFRSQAL   
> 193
>
>
>
> Now, if you call seq_inds(query => 'gap') on that particular hsp  
> object,
> you get these positions: 417, 431, 432. Obviously, there is no gap in
> the original query sequence at these positions.
> How do you have to read these numbers ? Remark also that for instance
> 417 is the res just in front of the gap.
>
> Regards,
>
> Marc
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign