[Bioperl-l] Add a kind of hspsepQmax/hspsepSmax (like WuBlast has) in Bio::Search::Tiling::MapTiling

Mark A. Jensen maj at fortinbras.us
Mon Apr 26 13:17:51 UTC 2010


Hi Fred,

I'll tell you how you can write a kludge; maybe you can expand it into
a more general method.

For your tblastn data, get the coverage map array

 my @map = $tiling->coverage_map('hit', 'p0');

Each element of the map is a ref to a pair [$int, $hsp], where $int is
itself a reference to a two-element array containing the coordinates of the
hsp in context and $hsp is the hsp object itself. You can use these to
filter the @map array.
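
As a rough sketch of that kind of filtering, the helper below groups map
elements into clusters wherever the gap between neighbouring intervals
exceeds a maximum separation, then keeps the largest cluster. The name
largest_cluster, the 5000 bp cutoff, and the placeholder strings standing
in for hsp objects are all assumptions made for illustration; none of them
is part of BioPerl. The coordinates are borrowed from the match_part lines
of the quoted TBLASTN output, and the intervals in a real coverage map may
differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): cluster map elements by
# a maximum allowed gap and return the cluster with the most elements.
# Each element is assumed to be [ [$start, $end], $hsp ].
sub largest_cluster {
    my ($max_sep, @map) = @_;
    return () unless @map;

    # Order elements by interval start.
    my @sorted = sort { $a->[0][0] <=> $b->[0][0] } @map;

    my $first       = shift @sorted;
    my @clusters    = ([$first]);
    my $cluster_end = $first->[0][1];

    for my $elt (@sorted) {
        my ($start, $end) = @{ $elt->[0] };
        if ($start - $cluster_end > $max_sep) {
            # Gap too large: start a new cluster.
            push @clusters, [$elt];
            $cluster_end = $end;
        }
        else {
            # Close enough: extend the current cluster.
            push @{ $clusters[-1] }, $elt;
            $cluster_end = $end if $end > $cluster_end;
        }
    }

    # Keep the cluster with the most elements.
    my ($best) = sort { scalar(@$b) <=> scalar(@$a) } @clusters;
    return @$best;
}

# Illustrative data; the strings stand in for real hsp objects.
my @map = (
    [ [  7666,  7917 ], 'hsp_a' ],
    [ [ 23971, 24186 ], 'hsp_b' ],
    [ [ 24820, 24915 ], 'hsp_c' ],
    [ [ 25195, 25308 ], 'hsp_d' ],
    [ [ 25390, 25620 ], 'hsp_e' ],
);

my @kept = largest_cluster(5000, @map);
print join(', ', map { "$_->[0][0]-$_->[0][1]" } @kept), "\n";
```

With these numbers the element starting at 7666 sits more than 16 kb
upstream of its neighbour, so it ends up alone in its own cluster and is
dropped.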

For your example, you can just get rid of the first @map element:

 shift @map;

Replace the internal map for this type and context, so that
the methods work on the modified map:

 $tiling->{'coverage_map_hit_p0'} = \@map;

Then $tiling->identities('hit', 'exact', 'p0'), etc. give you the
new values.
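
Put together, the steps might look like the sketch below. It uses only
the coverage_map call and the 'coverage_map_hit_p0' hash key quoted in
this message; the helper name restrict_hit_map and the idea of filtering
by a coordinate window are assumptions made for illustration, not part
of BioPerl. Because it pokes at the object's internals, it could break
with a different version of the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): keep only the map
# elements whose interval lies inside [$lo, $hi], then install the
# filtered map on the tiling object. Each element is assumed to be
# [ [$start, $end], $hsp ], as described above.
sub restrict_hit_map {
    my ($tiling, $lo, $hi) = @_;

    # Fetch the coverage map for the 'hit' type in context 'p0'.
    my @map = $tiling->coverage_map('hit', 'p0');

    # Keep elements whose coordinates fall inside the window.
    my @kept = grep { $_->[0][0] >= $lo && $_->[0][1] <= $hi } @map;

    # Replace the internal map so later method calls see the
    # filtered version.
    $tiling->{'coverage_map_hit_p0'} = \@kept;

    return scalar @kept;
}

# Usage, given an existing tiling object (window values are made up):
#   restrict_hit_map($tiling, 20000, 26000);
#   my $ident = $tiling->identities('hit', 'exact', 'p0');
```

It is probably safest to call this before requesting any statistics, in
case earlier results are cached on the object (an assumption, not
verified here).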

HTH-
MAJ
----- Original Message ----- 
From: <Frederic.SAPET at biogemma.com>
To: <bioperl-l at bioperl.org>
Sent: Friday, April 23, 2010 11:16 AM
Subject: [Bioperl-l] Add a kind of hspsepQmax/hspsepSmax (like WuBlast has) in 
Bio::Search::Tiling::MapTiling


> Hello
>
> Based on bp_search2gff.pl script and Bio::Search::Tiling::MapTiling
> documentation (http://www.bioperl.org/wiki/HOWTO:Tiling), I'm trying to
> write a generic blast to gff3 parser.
>
> My idea is to filter hits on frac_aligned and percent_identity values.
>
> I'm facing a problem with a BlastX result and the corresponding TBlastN.
>
> Please find my script and the two example files attached.
>
> The example is a piece of Maize Chromosome where a protein seems to be
> duplicated.
>
> When I parse the BlastX file and retrieve data from a Query View
> ( >tiling.pl BlastX query), I get:
>
> Chr6:159690000-159718000    BLASTX    match_set    23971    25620    121.6    +    .    ID=Os03g17980.2:1.1.1;alignLength=576;eValue=4.6e-137;fractionAligned=97.0530451866405;gapNumber=16;Name=Os03g17980.2;percentageIdentity=69.1552062868369
> Chr6:159690000-159718000    BLASTX    match_part    23971    24186    331    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 120 191
> Chr6:159690000-159718000    BLASTX    match_part    24820    24915    100    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 291 322
> Chr6:159690000-159718000    BLASTX    match_part    25195    25308    89    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 358 395
> Chr6:159690000-159718000    BLASTX    match_part    25390    25620    192    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 395 472
>
> Chr6:159690000-159718000    BLASTX    match_set    918    2567    121.6    +    .    ID=Os03g17980.2:1.2.1;alignLength=576;eValue=4.6e-137;fractionAligned=97.0530451866405;gapNumber=16;Name=Os03g17980.2;percentageIdentity=69.1552062868369
> Chr6:159690000-159718000    BLASTX    match_part    918    1148    192    -    0    Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 395 472
> Chr6:159690000-159718000    BLASTX    match_part    1230    1343    89    -    0    Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 358 395
> Chr6:159690000-159718000    BLASTX    match_part    1623    1718    100    -    0    Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 291 322
> Chr6:159690000-159718000    BLASTX    match_part    2352    2567    331    -    0    Parent=Os03g17980.2:1.2.1;Target=Os03g17980.2 120 191
>
> This is perfect: I retrieve two nice hits, with perfectly tiled HSPs.
>
> But with the TBlastN report (using a Hit View: >tiling.pl TBlastN hit),
> I get:
> Chr6:159690000-159718000    TBLASTN    match_set    7666    25620    121.6    +    .    ID=Os03g17980.2:1.1.1;alignLength=303;eValue=4.9e-137;fractionAligned=98.8212180746562;gapNumber=18;Name=Os03g17980.2;percentageIdentity=66.0052390307793
> Chr6:159690000-159718000    TBLASTN    match_part    7666    7917    44    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 332 416
> Chr6:159690000-159718000    TBLASTN    match_part    23971    24186    331    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 120 191
> Chr6:159690000-159718000    TBLASTN    match_part    24820    24915    100    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 291 322
> Chr6:159690000-159718000    TBLASTN    match_part    25195    25308    89    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 358 395
> Chr6:159690000-159718000    TBLASTN    match_part    25390    25620    192    +    0    Parent=Os03g17980.2:1.1.1;Target=Os03g17980.2 395 472
>
> I lose one of my hits, because another HSP is tiled onto my hit, so I
> discard it when I filter the context using identity values (lines 42 to
> 54 of my script).
> This HSP is far away in 5', so I would like to know whether it would be
> possible to add (or to help me develop) a sort of hspsepQmax/hspsepSmax
> (maximum allowed separation along the query (or subject) sequence
> between two HSPs) as a new parameter during the tiling phase?
>
>
>
> Thank you.
>
> Fred
>
>
>

