[Biopython] translating 454 data with frameshifts

Fri Dec 10 19:30:44 UTC 2010

Hello Jessica,

I am not a programmer and can't help you with a Python equivalent to  
"tile_hsp", but as far as I can tell, the GeneWise tool might be  
helpful to you. http://www.ebi.ac.uk/Tools/Wise2/index.html - There is  
a standalone version available for download which can be used for a  
whole batch of sequences. It aligns DNA sequences to protein queries  
(for example, the best blastx hit you already have for each sequence)  
and also accounts for frameshifts!
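As a rough sketch of the batch idea, assuming the standalone `genewise` binary from Wise2 is installed and on the PATH, a Python loop could write each nucleotide sequence and its best-hit protein to their own FASTA files and run genewise once per pair. The function names are hypothetical, and the `-pep` and `-both` flags are assumptions to verify against `genewise -help` for your build. Parsing genewise's text output is not shown.

```python
import subprocess
from pathlib import Path
from typing import Dict, Iterable, List, Tuple


def genewise_command(protein_fasta: str, dna_fasta: str) -> List[str]:
    """Build one genewise command line (assumes `genewise` is on PATH).

    Flags are assumptions; check them against `genewise -help`:
      -pep   print the predicted protein translation
      -both  search both strands of the DNA sequence
    """
    return ["genewise", protein_fasta, dna_fasta, "-pep", "-both"]


def run_genewise_batch(pairs: Iterable[Tuple[str, str, str]],
                       workdir: Path) -> Dict[str, str]:
    """Run genewise once per (name, hit_protein, nucleotide) triple.

    Each protein/DNA pair is written to its own two FASTA files in
    workdir; returns a dict mapping name to genewise's raw text output.
    """
    results = {}
    for name, protein, dna in pairs:
        prot_path = workdir / f"{name}_hit.fasta"
        dna_path = workdir / f"{name}_dna.fasta"
        prot_path.write_text(f">{name}_hit\n{protein}\n")
        dna_path.write_text(f">{name}\n{dna}\n")
        # check=True raises CalledProcessError if genewise exits non-zero
        proc = subprocess.run(genewise_command(str(prot_path), str(dna_path)),
                              capture_output=True, text=True, check=True)
        results[name] = proc.stdout
    return results
```

Running one process per pair keeps each result tied to its own sequence name, which makes it easy to spot the contigs where genewise reports a frameshift.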

Best of luck,

Tony

> Date: Fri, 10 Dec 2010 09:59:38 -0500
> From: Jessica Grant <jgrant at smith.edu>
> Subject: [Biopython] translating 454 data with frameshifts
> To: biopython at biopython.org
>
> We have some transcriptome 454 data and, quite simply, we are trying to
> build a protein database from the nucleotide sequences. The problem
> is that there are quite a lot of frameshifts in our contig
> assemblies, and in the original sequences as well.
>
> We have a list of the best blastx hit for each sequence, and I have tried:
>
> 1 - blasting each sequence against its best hit
> 2 - taking the hsp_qseqs from the blast output
> 3 - sticking them together, in order, if there is more than one hsp.
>
>
> This has worked for many of the sequences but sometimes there are
> overlapping "best hsp_qseqs" and when I stick them together I get a
> long made-up protein.  Also, for some sequences, the qseq goes past
> the point where the alignment should stop and then when I stick them
> together I get a few extra amino acids in my protein that ought not
> to be there.
>
> Frank Kauff told me that BioPerl has a "tile_hsp" function, but
> before I try understanding how that works in a language I am not
> familiar with, I thought I would ask here to see if anyone knows of a
> way to do this in Python.
>
> Is there a smart way to concatenate hsps in Biopython?  Does anyone
> have a better idea about how to build a protein database from 454
> data?
>
> Thank you!
>
> Jessica
>
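For the overlapping-HSP problem described in the quoted message, here is a minimal Python sketch of one way to tile HSPs before joining them. It is not BioPerl's tile_hsp algorithm but a simpler coverage-based variant. It assumes all HSPs come from the same blastx hit, orders them by subject (hit) coordinates instead of query position, and drops query residues aligned to subject positions an earlier HSP already covered. The small HSP class is a hypothetical stand-in whose attribute names (query, sbjct, sbjct_start, sbjct_end) mirror the HSP objects from Biopython's Bio.Blast.NCBIXML parser. The sketch does not address HSPs that run past where the alignment should really stop.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class HSP:
    """Hypothetical stand-in for one BLAST HSP.

    Attribute names mirror the HSP objects produced by Biopython's
    Bio.Blast.NCBIXML parser.
    """
    query: str        # aligned (translated) query, may contain '-' gaps
    sbjct: str        # aligned subject protein, may contain '-' gaps
    sbjct_start: int  # 1-based first subject position in the alignment
    sbjct_end: int    # 1-based last subject position in the alignment


def tile_hsps(hsps: Iterable[HSP]) -> str:
    """Join HSPs from one hit in subject order without repeating residues.

    Query residues aligned to subject positions already covered by an
    earlier HSP are dropped; HSPs lying entirely inside covered positions
    are skipped. Gap characters are never copied into the result.
    """
    pieces = []
    covered_to = 0  # highest subject position represented so far
    for hsp in sorted(hsps, key=lambda h: h.sbjct_start):
        if hsp.sbjct_end <= covered_to:
            continue  # this HSP adds no new subject positions
        pos = hsp.sbjct_start  # subject position of the next subject residue
        kept = []
        for q, s in zip(hsp.query, hsp.sbjct):
            if s == "-":
                # Query insertion: keep it only if the subject residue just
                # before it (pos - 1) lies in the newly covered region.
                if pos - 1 > covered_to:
                    kept.append(q)
                continue
            if pos > covered_to and q != "-":
                kept.append(q)
            pos += 1
        pieces.append("".join(kept))
        covered_to = hsp.sbjct_end
    return "".join(pieces)
```

With two HSPs covering subject positions 1-8 (MKTAYIAK) and 6-13 (IAKQRQIS), simply joining the query strings repeats IAK and gives MKTAYIAKIAKQRQIS, while tile_hsps returns MKTAYIAKQRQIS. Ordering by subject coordinates rather than query position is a design choice: the subject protein is the common reference both HSPs are measured against, so overlaps show up directly as shared subject positions.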