[Bioperl-l] Add a kind of hspsepQmax/hspsepSmax (like WuBlast has) in Bio::Search::Tiling::MapTiling

Frederic.SAPET at biogemma.com
Fri Apr 23 15:16:55 UTC 2010


Hello,

Based on the bp_search2gff.pl script and the Bio::Search::Tiling::MapTiling
documentation (http://www.bioperl.org/wiki/HOWTO:Tiling), I'm trying to
write a generic BLAST-to-GFF3 parser.

My idea is to filter hits on their frac_aligned and percent_identity values.

I'm facing a problem with a BlastX result and the corresponding TBlastN
result.

Please find my script and the two example files attached.

The example is a piece of a maize chromosome where a protein seems to be
duplicated.

When I parse the BlastX file and retrieve the data from the query view
( >tiling.pl BlastX query), I get:

Chr6:159690000-159718000	BLASTX	match_set	23971	25620	121.6	+	.	ID=Os03g17980.2:1.1.1;alignLength=576;eValue=4.6e-137;fractionAligned=97.0530451866405;gapNumber=16;Name=Os03g17980.2;percentageIdentity=69.1552062868369
Chr6:159690000-159718000	BLASTX	match_part	23971	24186	331	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 120 191
Chr6:159690000-159718000	BLASTX	match_part	24820	24915	100	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 291 322
Chr6:159690000-159718000	BLASTX	match_part	25195	25308	89	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 358 395
Chr6:159690000-159718000	BLASTX	match_part	25390	25620	192	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 395 472

Chr6:159690000-159718000	BLASTX	match_set	918	2567	121.6	+	.	ID=Os03g17980.2:1.2.1;alignLength=576;eValue=4.6e-137;fractionAligned=97.0530451866405;gapNumber=16;Name=Os03g17980.2;percentageIdentity=69.1552062868369
Chr6:159690000-159718000	BLASTX	match_part	918	1148	192	-	0	Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 395 472
Chr6:159690000-159718000	BLASTX	match_part	1230	1343	89	-	0	Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 358 395
Chr6:159690000-159718000	BLASTX	match_part	1623	1718	100	-	0	Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 291 322
Chr6:159690000-159718000	BLASTX	match_part	2352	2567	331	-	0	Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 120 191

This is perfect: I retrieve two nice hits, each with perfectly tiled HSPs.

But with the TBlastN report (using the hit view: >tiling.pl TBlastN hit),
I get:
Chr6:159690000-159718000	TBLASTN	match_set	7666	25620	121.6	+	.	ID=Os03g17980.2:1.1.1;alignLength=303;eValue=4.9e-137;fractionAligned=98.8212180746562;gapNumber=18;Name=Os03g17980.2;percentageIdentity=66.0052390307793
Chr6:159690000-159718000	TBLASTN	match_part	7666	7917	44	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 332 416
Chr6:159690000-159718000	TBLASTN	match_part	23971	24186	331	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 120 191
Chr6:159690000-159718000	TBLASTN	match_part	24820	24915	100	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 291 322
Chr6:159690000-159718000	TBLASTN	match_part	25195	25308	89	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 358 395
Chr6:159690000-159718000	TBLASTN	match_part	25390	25620	192	+	0	Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 395 472

I lose one of my hits because another HSP is tiled into it, so I discard
it when I filter the context on identity values (lines 42 to 54 of my
script).
This HSP lies far away on the 5' side, so I would like to know whether it
would be possible to add (or to get help developing) a kind of
hspsepQmax/hspsepSmax parameter (the maximum allowed separation along the
query or subject sequence between two HSPs) for the tiling phase.
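
To make the idea concrete, here is a minimal sketch of such a separation
cutoff, written in Python only to keep it independent of any library. It is
not BioPerl code and not an existing option of
Bio::Search::Tiling::MapTiling. The function name split_by_max_separation
and the cutoff of 5000 bases are hypothetical, chosen purely for
illustration. The coordinates are the subject ranges of the five TBLASTN
match_part lines above. Strand and frame are ignored here, and a real
implementation would have to take them into account.

```python
def split_by_max_separation(hsps, max_sep):
    """Group (start, end) ranges so that, within a group, each range starts
    at most max_sep positions after the end of the ranges before it."""
    groups = []
    group_end = None  # rightmost end seen in the current group
    for start, end in sorted(hsps):
        # Number of positions strictly between the current group and this HSP
        if groups and start - group_end - 1 <= max_sep:
            groups[-1].append((start, end))
            group_end = max(group_end, end)
        else:
            # Too far from the previous HSPs (or first HSP): start a new group
            groups.append([(start, end)])
            group_end = end
    return groups


# Subject coordinates taken from the TBLASTN match_part lines
tblastn_hsps = [
    (7666, 7917),
    (23971, 24186),
    (24820, 24915),
    (25195, 25308),
    (25390, 25620),
]

# 5000 is an arbitrary illustrative cutoff, not a recommended value
groups = split_by_max_separation(tblastn_hsps, max_sep=5000)
for group in groups:
    print(group)
```

With this cutoff the HSP at 7666-7917 ends up in a group of its own, since
16053 bases separate it from the next HSP, while the four remaining HSPs
stay together and could then be tiled and filtered as a separate context.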



Thank you.

Fred


-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlastX
Type: application/octet-stream
Size: 12781 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100423/02c50e8e/attachment-0012.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TBlastN
Type: application/octet-stream
Size: 11308 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100423/02c50e8e/attachment-0013.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tiling.pl
Type: application/octet-stream
Size: 6152 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100423/02c50e8e/attachment-0014.obj>

