[Bioperl-l] Polyproteins, ribo slippage, and mat_peptide in viruses?
Chris Larsen
clarsen at vecna.com
Thu Nov 12 17:22:26 UTC 2009
All,
This is a short follow-up to the earlier thread on
computing mature peptide sequences for viruses. The topic has gone
underwater for the time being while we solve some problems with the
source data. The biopython effort and contributors on this board have
given good guidance, and we now have scripts that function (thanks
mostly to pcock). However, the source data on which everything relies
is suspect:
mat_peptide 15118..16914 <===
/product="nsp13"
/note="helicase"
I can tell you the virus community does not want to rely heavily on
those position numbers. Furthermore, we have found fewer complete source
genomes for viruses than for bacteria, and more virus-to-virus variation
in the data fields annotated in the GBK file (Gene, CDS, ORF, Protein,
Polyprotein, mat_peptide, db_xref). In fact, the community will have
to come together significantly on how these molecules are defined in
public repositories before a mature scripting effort becomes
reliable, public, and well received. Because of the variation in
viruses, it's not even clear at this point what a 'gene' is. I will
let you know how we proceed once more sequence data has been fully
analyzed and we can think about making any Perl-based solution a new
viral protein module.
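To make the concern about position numbers concrete, here is a minimal
sketch (not our actual scripts) of the kind of sanity check one could run
on a mat_peptide span before trusting it. It uses only the Python standard
library; the function names and the genome length in the usage note are
hypothetical, and it assumes a simple start..end location with 1-based
inclusive coordinates, so join() and complement() locations are not
handled.

```python
import re

# Hypothetical helpers for illustration; not part of BioPerl or Biopython.
# Matches a simple feature line such as "mat_peptide 15118..16914".
# Partial-end markers ("<" and ">") are tolerated and ignored.
LOCATION_RE = re.compile(r"^\s*mat_peptide\s+<?(\d+)\.\.>?(\d+)\s*$")


def parse_mat_peptide(line):
    """Return (start, end) as 1-based inclusive integers."""
    match = LOCATION_RE.match(line)
    if match is None:
        raise ValueError(f"unsupported mat_peptide location: {line!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    return start, end


def check_span(start, end, genome_length):
    """Return a list of warnings; an empty list means no problem was found."""
    problems = []
    if end > genome_length:
        problems.append("span runs past the end of the genome")
    if (end - start + 1) % 3 != 0:
        problems.append("span length is not a multiple of 3")
    return problems


def extract(genome, start, end):
    """Slice a 1-based inclusive span out of a 0-based Python string."""
    return genome[start - 1:end]
```

For the nsp13 feature above, parse_mat_peptide("mat_peptide 15118..16914")
gives a span of 1797 nt, i.e. 599 codons, so check_span(15118, 16914, 30000)
reports nothing (30000 is a placeholder genome length, not a value from the
record). Passing only shows the span is in frame and inside the genome; it
says nothing about whether the cleavage sites are annotated correctly.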
Thanks,
Chris
--
Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525