[Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses?

Tue Oct 27 16:33:01 UTC 2009

All,

I am looking for solutions to a database loading problem we are running
into with viruses. It is multifold:

Some viruses churn out a polyprotein rather than individual peptides.
They also slip the ribosome, so a source nucleotide is used more than
once in translation (the ribosome halts, backs up one nucleotide, and
continues in a new frame). Finally, there is post-translational
processing into mature peptides. The main thing is that each mature
peptide is contained as a subset of the whole parent polyprotein, but
its sequence is not provided on its own in the GBK file for each
mat_peptide feature. We need that sequence in order to run algorithms
on the relevant processed proteins. Therefore we cannot load directly
into GUS, but instead have to choose how to derive the mat_peptide
sequence. Actually, I think the viruses know that, and are just messing
with us out of spite, since we have iPods and they don't. Anyway...
from anyone who has encountered this, I seek guidance.

Our choices are:

1. Get the locations of mature peptide children in /Protein/
carve the mat_peptide sequence out of the whole polyprotein translation
check that the mat_peptide is in fact an identical subset of the
translated protein
load that
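A minimal Python sketch of choice 1, assuming the mat_peptide locations
have already been pulled from the protein record as 1-based, inclusive
amino-acid coordinates. The function name carve_mat_peptides and its
input layout are hypothetical, not BioPerl API. It slices each mature
peptide out of the polyprotein and, where an annotated sequence is
available, confirms the slice is an identical subset before loading.

```python
def carve_mat_peptides(polyprotein, mat_peptides):
    """Cut mature peptides out of a polyprotein translation.

    polyprotein  -- amino-acid string of the whole translated polyprotein
    mat_peptides -- hypothetical layout: dict name -> (start, end, expected),
                    where start/end are 1-based inclusive protein
                    coordinates and expected is an annotated peptide
                    sequence to check against, or None.
    """
    polyprotein = polyprotein.rstrip("*")  # drop a trailing stop, if any
    carved = {}
    for name, (start, end, expected) in mat_peptides.items():
        if not 1 <= start <= end <= len(polyprotein):
            raise ValueError(f"{name}: {start}..{end} is outside the polyprotein")
        peptide = polyprotein[start - 1:end]
        # The check from choice 1: the carved piece must be an identical subset.
        if expected is not None and peptide != expected.rstrip("*"):
            raise ValueError(f"{name}: carved {peptide!r} != annotated {expected!r}")
        carved[name] = peptide
    return carved


result = carve_mat_peptides(
    "MAAAKLLLLGGGG",
    {"p1": (1, 5, None), "p2": (6, 9, "LLLL"), "p3": (10, 13, None)},
)
print(result)  # {'p1': 'MAAAK', 'p2': 'LLLL', 'p3': 'GGGG'}
```

Raising on a mismatch, rather than skipping the peptide, means a bad
annotation stops the load instead of quietly putting a wrong peptide
into GUS.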

OR

2. Use the locations of starts and stops in /Nucleotide/
translate that, using the slippage information
get mature peptides that line up exactly with the parent polyprotein
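A minimal Python sketch of choice 2. It assumes the CDS and mat_peptide
locations have already been parsed into lists of 1-based, inclusive
(start, end) nucleotide segments, and that a -1 ribosomal slip shows up
as a join whose two segments share one base, e.g. join(1..9,9..17).
All names here (splice, translate, line_up_mat_peptides) are
hypothetical, and only the standard genetic code is handled.

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def splice(nuc, segments):
    """Concatenate 1-based inclusive segments; a base shared by two
    adjacent segments (the slippage site) is deliberately read twice."""
    return "".join(nuc[start - 1:end] for start, end in segments)


def translate(cds):
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError(f"spliced length {len(cds)} is not a multiple of 3")
    # Unknown or ambiguous codons become 'X'.
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds), 3))


def line_up_mat_peptides(nuc, cds_segments, mat_peptides):
    """Translate the slipped CDS, translate each mat_peptide from its own
    nucleotide segments, and require that it sits inside the polyprotein.
    Returns (polyprotein, {name: (aa_start, aa_end, peptide)})."""
    polyprotein = translate(splice(nuc, cds_segments)).rstrip("*")
    placed = {}
    for name, segments in mat_peptides.items():
        peptide = translate(splice(nuc, segments)).rstrip("*")
        offset = polyprotein.find(peptide)  # first match only
        if not peptide or offset < 0:
            raise ValueError(f"{name}: {peptide!r} does not line up with the polyprotein")
        placed[name] = (offset + 1, offset + len(peptide), peptide)
    return polyprotein, placed


seq = "ATGGCTAAAGGCTGTAA"
polyprotein, peptides = line_up_mat_peptides(
    seq, [(1, 9), (9, 17)], {"p1": [(1, 6)], "p2": [(7, 9), (9, 14)]}
)
print(polyprotein)  # MAKRL
print(peptides)     # {'p1': (1, 2, 'MA'), 'p2': (3, 5, 'KRL')}
```

On the toy sequence, reading ATGGCTAAAGGCTGTAA straight through would
give MAKGC; reading base 9 twice gives MAKRL, and the second mature
peptide, which spans the slip, lands at residues 3..5. Using
polyprotein.find() is a simplification: a peptide that occurs more
than once in the polyprotein would need its offset computed from the
nucleotide coordinates instead.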

If you know of BioPerl sequence handling support for this, I would  
love to hear more. Clearly this is a nonstandard thingamabob.

Stupid viruses

Chris

-- 

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525