[Bioperl-l] Add a kind of hspsepQmax/hspsepSmax (like WuBlast has) in Bio::Search::Tiling::MapTiling
Frederic.SAPET at biogemma.com
Fri Apr 23 15:16:55 UTC 2010
Hello
Based on the bp_search2gff.pl script and the Bio::Search::Tiling::MapTiling
documentation (http://www.bioperl.org/wiki/HOWTO:Tiling), I'm trying to
write a generic BLAST-to-GFF3 parser.
My idea is to filter hits on their frac_aligned and percent_identity values.
I'm facing a problem with a BlastX result and the corresponding TBlastN result.
Please find my script and the two example files attached.
The example is a piece of a maize chromosome where a protein seems to be
duplicated.
When I parse the BlastX file and retrieve data from the query view
(>tiling.pl BlastX query), I get:
Chr6:159690000-159718000 BLASTX match_set 23971 25620 121.6 + . ID=Os03g17980.2:1.1.1;alignLength=576;eValue=4.6e-137;fractionAligned=97.0530451866405;gapNumber=16;Name=Os03g17980.2;percentageIdentity=69.1552062868369
Chr6:159690000-159718000 BLASTX match_part 23971 24186 331 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 120 191
Chr6:159690000-159718000 BLASTX match_part 24820 24915 100 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 291 322
Chr6:159690000-159718000 BLASTX match_part 25195 25308 89 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 358 395
Chr6:159690000-159718000 BLASTX match_part 25390 25620 192 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 395 472
Chr6:159690000-159718000 BLASTX match_set 918 2567 121.6 + . ID=Os03g17980.2:1.2.1;alignLength=576;eValue=4.6e-137;fractionAligned=97.0530451866405;gapNumber=16;Name=Os03g17980.2;percentageIdentity=69.1552062868369
Chr6:159690000-159718000 BLASTX match_part 918 1148 192 - 0 Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 395 472
Chr6:159690000-159718000 BLASTX match_part 1230 1343 89 - 0 Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 358 395
Chr6:159690000-159718000 BLASTX match_part 1623 1718 100 - 0 Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 291 322
Chr6:159690000-159718000 BLASTX match_part 2352 2567 331 - 0 Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 120 191
This is perfect: I retrieve two nice hits, each with perfectly tiled HSPs.
But with the TBlastN report (using the hit view: >tiling.pl TBlastN hit),
I get:
Chr6:159690000-159718000 TBLASTN match_set 7666 25620 121.6 + . ID=Os03g17980.2:1.1.1;alignLength=303;eValue=4.9e-137;fractionAligned=98.8212180746562;gapNumber=18;Name=Os03g17980.2;percentageIdentity=66.0052390307793
Chr6:159690000-159718000 TBLASTN match_part 7666 7917 44 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 332 416
Chr6:159690000-159718000 TBLASTN match_part 23971 24186 331 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 120 191
Chr6:159690000-159718000 TBLASTN match_part 24820 24915 100 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 291 322
Chr6:159690000-159718000 TBLASTN match_part 25195 25308 89 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 358 395
Chr6:159690000-159718000 TBLASTN match_part 25390 25620 192 + 0 Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 395 472
I lose one of my hits because an additional HSP is tiled into it, so the
hit is discarded when I filter the context on identity values (lines 42 to
54 of my script).
This extra HSP lies far away on the 5' side, so I would like to know whether
it would be possible to add (or whether someone could help me develop) a
kind of hspsepQmax/hspsepSmax parameter (the maximum allowed separation
along the query or subject sequence between two HSPs) that is applied
during the tiling phase.
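
A minimal sketch of the kind of grouping such a parameter could perform, using the subject coordinates of the five TBLASTN HSPs listed above. It is written in plain Python for illustration only and is not BioPerl code: the function name `split_by_max_separation`, the way the gap is measured (from the furthest end reached so far to the next start), and the 5000 bp threshold are all assumptions, not existing MapTiling or WU-BLAST behaviour.

```python
from typing import List, Tuple

# One HSP as (start, end) on the subject, 1-based inclusive, single strand.
HSP = Tuple[int, int]


def split_by_max_separation(hsps: List[HSP], max_sep: int) -> List[List[HSP]]:
    """Split HSPs into clusters wherever the gap to the next HSP exceeds max_sep.

    Hypothetical helper: the gap is counted as the number of bases between the
    furthest end seen in the current cluster and the start of the next HSP.
    """
    clusters: List[List[HSP]] = []
    reach = 0  # furthest end coordinate in the current cluster
    for start, end in sorted(hsps):
        if clusters and start - reach - 1 <= max_sep:
            # Close enough: keep extending the current cluster.
            clusters[-1].append((start, end))
            reach = max(reach, end)
        else:
            # Too far away (or first HSP): open a new cluster.
            clusters.append([(start, end)])
            reach = end
    return clusters


# Subject coordinates taken from the TBLASTN match_part lines.
tblastn_hsps = [
    (7666, 7917),
    (23971, 24186),
    (24820, 24915),
    (25195, 25308),
    (25390, 25620),
]

# 5000 is an arbitrary illustrative threshold, not a recommended value.
clusters = split_by_max_separation(tblastn_hsps, max_sep=5000)
for cluster in clusters:
    print(cluster[0][0], cluster[-1][1], len(cluster))
```

With this threshold the distant HSP at 7666-7917 falls into its own cluster, and the four HSPs between 23971 and 25620 stay together; each cluster could then be tiled and filtered separately.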
Thank you.
Fred
-------------- next part --------------
A non-text attachment was scrubbed...
Name: BlastX
Type: application/octet-stream
Size: 12781 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100423/02c50e8e/attachment-0012.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: TBlastN
Type: application/octet-stream
Size: 11308 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100423/02c50e8e/attachment-0013.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tiling.pl
Type: application/octet-stream
Size: 6152 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100423/02c50e8e/attachment-0014.obj>