[Bioperl-l] how to get the protein sequences from DNA sequences around novel SNPs?
Chris Fields
cjfields at illinois.edu
Tue Nov 10 04:58:32 UTC 2009
On Nov 9, 2009, at 3:15 PM, Robert Bradbury wrote:
> On Mon, Nov 9, 2009 at 1:08 PM, Guangchun Song <gc11song at gmail.com> wrote:
>>
>> I'm a new BioPerl user. I'm working on a project to determine the
>> status of all putative SNPs (synonymous vs. non-synonymous) and to
>> predict whether the translational effect of each non-synonymous
>> mutation is benign or deleterious. I'm trying to use BioPerl to get the
>> DNA sequence and translate it into protein sequence for the SNPs that
>> are in a gene's coding region. Could someone tell me how to do it?
>>
>>
> I too would like to know if this information is available. I've recently
> been working with the dbSNP results from NCBI, but they display the results
> in a graphical format rather than as data that one can play with and ask
> questions of, like "What is the most disease-causing gene in the human
> genome?" or "What are the critical proteins damaged by gene defects in the
> human genome?" ... "In terms of premature deaths, extended health care
> requirements, loss of quality of life, etc.?"
>
> The same types of questions can be applied to the dog and cat genomes,
> where there is emotional value, or to the cow, horse, pig, etc. genomes,
> where there is economic value.
>
> The value of BioPerl would increase significantly if there were
> functionality that allowed easy access to "these mutations may have a
> negative/positive impact" (which means you need a function that qualifies
> mutations by degree) and allowed the impact to be subjectively determined
> (implying there must be some callback function to provide a user
> quality/impact rating).
>
> For example:
>     my @differences = protein_compare($mygene, $refseq_gene,
>         \@critical_aa, \@critical_domain, $callback);
> (or a scalar $differences holding an array reference), where $callback
> could "rate" differences based on the protein, the position, and the "type
> of interest" (e.g. metal-binding amino acids, structure-changing amino
> acids, critical catalytic amino acids, etc.).
>
> A default callback would be based on some evolving definition of
> "critical" changes, for example those that result in human disease.
>
> This is a "required" capability for determining things like the
> "adaptability" of a species: those with the fewest critical mutation points
> may adapt better to circumstances that increase mutation rates.
>
> Please pardon any errors in Perl syntax/usage; it's been a while since
> I've written Perl and I'd really rather be coding in C.
>
> Robert
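The protein_compare() function proposed above is hypothetical; it is not part of BioPerl. As a rough sketch of how that interface might look in plain Perl, the code below assumes both proteins are already aligned strings of equal length, treats @critical_aa as a list of 1-based residue positions and @critical_domain as a list of [start, end] ranges (both are assumptions about what Robert meant), and uses wantarray to return either a list or an array reference, matching the "$/@differences" idea. The default_rating() weights are arbitrary toy values, not a validated scoring scheme.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

# Hypothetical sketch of the proposed protein_compare() interface.
# $mygene, $refseq_gene : protein sequences as plain strings (assumed aligned, same length)
# $critical_aa          : arrayref of 1-based positions considered critical
# $critical_domain      : arrayref of [start, end] ranges (1-based, inclusive)
# $callback             : optional code ref that receives each difference and returns a rating
sub protein_compare {
    my ($mygene, $refseq_gene, $critical_aa, $critical_domain, $callback) = @_;
    die "Sequences must be the same length (align them first)\n"
        unless length($mygene) == length($refseq_gene);
    $callback //= \&default_rating;

    my %is_critical = map { $_ => 1 } @$critical_aa;
    my @differences;
    for my $pos (1 .. length $refseq_gene) {
        my $ref = substr($refseq_gene, $pos - 1, 1);
        my $alt = substr($mygene,      $pos - 1, 1);
        next if $ref eq $alt;

        # scalar-context grep counts the domains containing this position
        my $in_domain = grep { $pos >= $_->[0] && $pos <= $_->[1] } @$critical_domain;
        my %diff = (
            position        => $pos,
            ref             => $ref,
            alt             => $alt,
            critical_aa     => $is_critical{$pos} ? 1 : 0,
            critical_domain => $in_domain ? 1 : 0,
        );
        $diff{rating} = $callback->(\%diff);
        push @differences, \%diff;
    }
    # list in list context, array reference in scalar context
    return wantarray ? @differences : \@differences;
}

# Hypothetical default rating with arbitrary toy weights.
sub default_rating {
    my ($diff) = @_;
    my $score = 1;
    $score += 5 if $diff->{alt} eq '*';      # premature stop
    $score += 3 if $diff->{critical_aa};
    $score += 2 if $diff->{critical_domain};
    return $score;
}

# Toy usage with made-up sequences: residue 3 is flagged as critical.
my @diffs = protein_compare('MKWVA', 'MKHVA', [3], [[1, 2]]);
for my $d (@diffs) {
    say "$d->{ref}$d->{position}$d->{alt} rating $d->{rating}";
}
```

Passing a custom code ref as the last argument replaces the default rating entirely, which is the "subjectively determined" impact Robert describes.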
I will say that most of the information from the SNP database is
available in various formats (see the 'Retrieval Types' section at the
following link):
http://www.ncbi.nlm.nih.gov/corehtml/query/static/efetchseq_help.html
You can access this information, as well as the full XML, using
something like the following script.
chris
------------------------------------------------
#!/usr/bin/perl
use 5.010;   # for say()
use strict;
use warnings;
use Bio::DB::EUtilities;

# search term (e.g. a gene name or rs ID) from the command line
my $term = shift or die "Usage: $0 <search term>\n";

# esearch against dbSNP, keeping the results on the NCBI history server
my $eutil = Bio::DB::EUtilities->new(-eutil      => 'esearch',
                                     -db         => 'snp',
                                     -term       => $term,
                                     -usehistory => 'y',
                                     -retmax     => 100);

my $hist = $eutil->next_History || die "No history returned";

# fetch the records via the history; for SNP XML, change retmode to 'xml'
$eutil->set_parameters(-eutil   => 'efetch',
                       -history => $hist,
                       -retmode => 'text',
                       -rettype => 'flt');

# dumps to STDOUT
say $eutil->get_Response->content;
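The script above retrieves dbSNP records but does not itself decide whether a coding SNP changes the protein, which was the original question. A minimal sketch of that translation step follows, assuming you already have the in-frame coding sequence (CDS, starting at the first base of the start codon) and the SNP's 1-based position within it. The classify_snp() name and its return values are hypothetical. It uses a hard-coded standard genetic code (NCBI table 1) so it runs without BioPerl; within BioPerl you could instead translate codons with Bio::Tools::CodonTable or call translate() on a Bio::Seq object.

```perl
#!/usr/bin/perl
use 5.010;
use strict;
use warnings;

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G
my @bases     = qw(T C A G);
my $aa_string = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_table;
my $i = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = substr($aa_string, $i++, 1);
        }
    }
}

# Hypothetical helper: classify one base substitution in a CDS.
# Returns (class, ref_aa, alt_aa, codon_number).
sub classify_snp {
    my ($cds, $pos, $alt) = @_;
    $cds = uc $cds;
    $alt = uc $alt;
    die "Position $pos is outside the CDS\n" if $pos < 1 || $pos > length($cds);

    my $codon_index = int(($pos - 1) / 3);   # 0-based codon number
    my $offset      = ($pos - 1) % 3;        # position within the codon
    my $ref_codon   = substr($cds, $codon_index * 3, 3);
    die "Incomplete codon at position $pos\n" if length($ref_codon) < 3;

    my $alt_codon = $ref_codon;
    substr($alt_codon, $offset, 1) = $alt;

    my $ref_aa = $codon_table{$ref_codon} // 'X';   # X = codon with ambiguous bases
    my $alt_aa = $codon_table{$alt_codon} // 'X';

    my $class = $ref_aa eq $alt_aa ? 'synonymous'
              : $alt_aa eq '*'     ? 'nonsense'
              : $ref_aa eq '*'     ? 'stop-loss'
              :                      'missense';
    return ($class, $ref_aa, $alt_aa, $codon_index + 1);
}

# Toy example with a made-up CDS: ATG GCT TGG TAA (M A W *)
my $cds = 'ATGGCTTGGTAA';
for my $snp ([6, 'C'], [5, 'A'], [9, 'A']) {
    my ($pos, $alt) = @$snp;
    my ($class, $ref_aa, $alt_aa, $codon) = classify_snp($cds, $pos, $alt);
    say "position $pos ($alt): codon $codon, $ref_aa to $alt_aa ($class)";
}
```

This only separates synonymous from non-synonymous changes; judging whether a missense change is benign or deleterious needs additional information beyond the codon itself.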