[Bioperl-l] Finding mutations in Blast files

Chris Fields cjfields at illinois.edu
Wed Jul 21 22:04:48 UTC 2010


On Jul 20, 2010, at 1:36 PM, Gabriel Ab wrote:

> 
> I'm trying to write a script to find mutations in a Blast output file.  
> The result should look something like [A/G] (query/subject) for SNPs and
> [--/AT] for In/Dels.  I also need 50 bases both upstream and downstream if
> possible. 

In order to get upstream and downstream you will need access to the original sequence (this information is not reported in the BLAST file).  Use the HSP coords to pull that information.

> I was able to get both the query and subject sequence extracted from blast
> files, but parsing through that for mutations without the help of modules
> and writing them like above is harder than it sounds.  Deletions and other
> things in the alignment can make it very tricky.  I tried looking for
> modules that would make this task easier but I couldn't find anything.  

...
> Can someone point me in the right direction on how to do this or let me know
> about modules that could help?
> 
> Thanks

Bio::Search::HSP::GenericHSP::seq_inds() does something along these lines, e.g. iterates through the two strings by index position and stores relevant information.  In this case, you would just print out [X/Y] where there is no match, or [--/XY] when gaps are present.  Look at the code for that method to get the general idea.

chris





More information about the Bioperl-l mailing list