[Bioperl-l] Gene critical region analysis -- visual display

Robert Bradbury robert.bradbury at gmail.com
Fri Dec 4 18:27:38 UTC 2009


Background:
I have been involved in aging research off and on for ~16 years.  My initial
focus was on the eventual decline of the "program" (because DNA has no ECC
and only limited redundancy), so my initial work (in the early 1990s) was
focused on DNA repair genes (of which there are about 150 in the human
genome) [1,2].  Most recently I have focused on the DNA double-strand
break repair process (NHEJ) as a fundamental cause of aging, because it may
corrupt the genomes of individual cells.  (And as most programmers would
agree -- break the code and you break the program.)  Michael Lieber at UCLA
has estimated that by the time a human is ~70, on the order of several
hundred genes in one's cells have been corrupted (which may have an
indeterminate effect on the cells' functioning).

Problem:
Just looking at the GenBank output for the human Artemis (DCLRE1C) gene,
there are on the order of 18 SNPs and 8 possible phosphorylation sites (not
to mention other potential modification sites).  Add to this the fact that
Methionine and Tryptophan, and to a lesser extent Cysteine, are more
susceptible to single-base mutations (with so few codons, even a single-base
mutation or misrepair alters the codon->amino acid coding).  There are
various programs to analyze such proteins for the critical sites, e.g. SIFT
and the various programs pointed to by their sites.  Now it seems to me that
one could attack this problem by integrating SNPs, mutations, etc. at the
critical sites.  Here "critical" may or may not coincide with normal SNPs
(which presumably fall primarily at non-critical sites); it means those
positions where changing the coding sequence to a non-synonymous amino acid
potentially breaks the protein (the real interpretation of which will not be
understood until population studies are done).
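
To make the Met/Trp/Cys point concrete, here is a minimal sketch (in Python
rather than BioPerl, purely for illustration) that counts, for a given amino
acid, what fraction of all possible single-base substitutions in its codons
change the encoded residue.  It assumes the standard genetic code and treats
every substitution as equally likely, which is a simplification; the function
name is my own invention, not part of any library.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def nonsynonymous_fraction(amino_acid):
    """Fraction of single-base substitutions that change this amino acid.

    Every codon for the amino acid is mutated at each of its three
    positions to each of the three other bases; a change to a stop
    codon ('*') counts as a change.
    """
    changed = 0
    total = 0
    for codon, aa in CODON_TABLE.items():
        if aa != amino_acid:
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = codon[:pos] + base + codon[pos + 1:]
                total += 1
                if CODON_TABLE[mutant] != amino_acid:
                    changed += 1
    return changed / total


if __name__ == "__main__":
    for aa in "MWCL":
        print(aa, round(nonsynonymous_fraction(aa), 3))
```

Under these assumptions Met and Trp (one codon each) come out at 1.0, Cys
(two codons) at 8/9, and a six-codon residue such as Leu noticeably lower.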

So, in the process of looking at the DCLRE1C protein I asked myself, "Why is
there not a BioPerl function which simply enables a visual interpretation of
the critical sites of the protein?"  That is, some color-coded representation
of the protein (presumably with some augmented functionality to report
things like probability or statistical information).  You would hand the
function a .fasta file and it would give you a visual (colored) analysis of
the critical nature of specific amino acids, something which could be used
in genomic or SNP analysis (such as I presume is being done by 23andMe, as
well as other organizations) to begin to separate the variations in the
human genome (e.g. SNPs) from the mutations which may affect individuals.
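
As a rough illustration of the kind of output I have in mind, the sketch
below (again Python, not BioPerl) reads the first record from FASTA-formatted
lines and prints each residue on a colored terminal background according to
a per-position "criticality" score.  The scores, the 0.33/0.66 cutoffs and
all function names are hypothetical; real scores would have to come from a
tool such as SIFT or from population data.

```python
# ANSI background colors: red = high, yellow = mid, green = low criticality.
RESET = "\033[0m"
COLORS = {"high": "\033[41m", "mid": "\033[43m", "low": "\033[42m"}


def read_fasta_sequence(lines):
    """Return the first sequence found in FASTA-formatted lines."""
    parts = []
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here; stop reading
            seen_header = True
        elif line:
            parts.append(line)
    return "".join(parts)


def bucket(score):
    """Map a score in [0, 1] to a color class (cutoffs are arbitrary)."""
    if score >= 0.66:
        return "high"
    if score >= 0.33:
        return "mid"
    return "low"


def colorize(sequence, scores):
    """Wrap each residue in an ANSI color chosen from its score.

    `scores` maps 1-based residue positions to values in [0, 1];
    positions without a score are treated as 0.0 (low).
    """
    return "".join(
        COLORS[bucket(scores.get(pos, 0.0))] + aa + RESET
        for pos, aa in enumerate(sequence, start=1)
    )


if __name__ == "__main__":
    # Toy input, not real DCLRE1C data.
    toy = [">toy_protein", "MWC", "AL"]
    print(colorize(read_fasta_sequence(toy), {1: 0.9, 2: 0.8, 3: 0.5}))
```

A terminal is the simplest target; the same per-position buckets could just
as well drive HTML spans or a graphical track.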

I have the C programming and to a lesser extent Perl experience to
contribute to this -- I lack the BioPerl wisdom to make it generally
available.

If anyone has some suggestions as to what functions/modules might be of use
(in providing a "single-look" view of the amino acids in a gene whose
mutations may be more or less detrimental) I would appreciate hearing from
them.

Robert Bradbury

1. "DNA Repair and Mutagenesis", E.C. Friedberg et al., 2nd Ed., ASM Press
   (2006)
2. "Aging of the Genome", J. Vijg, Oxford University Press (2007)


