[Bioperl-l] Classifying SNPs

Jason Stajich jason at bioperl.org
Mon Jul 13 16:54:00 UTC 2009


Ensembl would be best place to go if you are working with human SNPs  
but for those who aren't so data lucky...

Aspects of this also relates to the dn/dS code in the  
Bio::Align::DNAStatistics -- thought it does the classification and  
comparison all at once so you'd have to dig code out.

And the mcdonald_kreitman code in Bio::PopGen::Statistics which  
computes a synonymous or nonsynonymous via lookup table that is stored  
in Bio::MolEvol::CodonModel which compares the edit path which is  
encoded as the two codons concatenated together -- i.ee

use Bio::MolEvol::CodonModel;
  my $codon_path = Bio::MolEvol::CodonModel->codon_path;
  my ($ns, $syn) = $codon_path->{'AATAAC'};
  print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";


It all kind of depends on how you have the data organized, if it is  
just SNPs and you are trying to figure out if they are syn or non-syn  
then you kind of need a good database to do this since you'll have to  
know what gene they are in, CDS of the gene, etc.  It is possible to  
do with something as basic as GFF3 for your genome and the SNP  
locations and Bio::DB::SeqFeature::Store.  While I can think of a way  
to code it up from those bare-bones - maybe you should report back if  
you can just use the Ensembl classification of the SNPs?

-jason

On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:

> Bio::Coordinate might help with coordinate conversion.  However,  
> much of this sounds very Ensembl-like.  Have you looked at the  
> Ensembl perl API?  It can do #1 (coordinate conversion), and I'm  
> sure something could be written up to do the second.
>
> chris
>
> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
>
>> Thanks Abhi-- I had a feeling there was more (or "less") to it--  
>> this would be a nice feature to have, don't think it exists. Will  
>> think about it-- cheers
>> ----- Original Message ----- From: "Abhishek Pratap" <abhishek.vit at gmail.com 
>> >
>> To: "Mark A. Jensen" <maj at fortinbras.us>
>> Cc: <bioperl-l at lists.open-bio.org>
>> Sent: Monday, July 13, 2009 11:10 AM
>> Subject: Re: [Bioperl-l] Classifying SNPs
>>
>>
>>> Dear Mark
>>> Sorry I was not able to reply earlier. Many Thanks for your detailed
>>> explanation. However this is not exactly what I am looking for.  
>>> May be my
>>> initial mail was not well articulated or I am not able to infer  
>>> your reply
>>> fully. My bad.
>>>
>>> Well as an input what we have is the just the genomic coordinates  
>>> for SNP's
>>> predicted by Illumina propriety software CASAVA. What we would  
>>> like to do is
>>> to further classify these predicted SNP's  . If they fall into  
>>> Coding region
>>> then whether they are synonymous/non-syn SNPs.
>>>
>>> So I guess something which translates
>>> 1. SNP genomic coordinate into mRNA offset
>>> 2. Then identify the ORF and target codon and check whether the SNP
>>> substitution will be syn/non-syn.
>>>
>>> Thanks,
>>> -Abhi
>>>
>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen  
>>> <maj at fortinbras.us> wrote:
>>>
>>>> Hey Abhishek-
>>>> You might root around in Bio::PopGen. Here's a script to get  
>>>> stuff from
>>>> raw fasta data--see comments within.
>>>> cheers
>>>> Mark
>>>>
>>>> use Bio::AlignIO;
>>>> use Bio::PopGen::Utilities;
>>>>
>>>> $file = "your_raw_file.fas";
>>>>
>>>>
>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)- 
>>>> >next_aln;
>>>> # get the alignment into a Bio::PopGen::Population format, with  
>>>> codons
>>>> # as the marker sites
>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=> 
>>>> $aln,
>>>> -site_model=>'cod');
>>>> # here are your variable codons...
>>>> my @cdnpos = $pop->get_marker_names;
>>>> # here are your individuals represented in the alignment
>>>> my @inds = $pop->get_Individuals;
>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
>>>> foreach my $cdn (@cdnpos) {
>>>> # calculate the unique codons represented at this codon position
>>>> my (%ucdns, @ucdns);
>>>> @genos = $pop->get_Genotypes(-marker=>$cdn);
>>>> $ucdns{$_->get_Alleles}++ for @genos;
>>>> @ucdns = sort keys %ucdns;
>>>> #
>>>> # here, use translate or something faster to identify syn/non-syn
>>>> # check out code in Bio::Align::DNAStatistics for various methods
>>>>
>>>> }
>>>> # relate back to individuals with this
>>>> foreach my $ind (@inds) {
>>>> print "Individual ".$ind->unique_id."\n";
>>>> print "Site\tAllele\n";
>>>> foreach my $cdn (@cdnpos) {
>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>>> }
>>>> }
>>>>
>>>>
>>>> 1;
>>>>
>>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>>> abhishek.vit at gmail.com>
>>>> To: <bioperl-l at lists.open-bio.org>
>>>> Sent: Wednesday, July 08, 2009 10:24 AM
>>>> Subject: [Bioperl-l] Classifying SNPs
>>>>
>>>>
>>>>
>>>> Hi All
>>>>
>>>> This might seem to be an old track question. However I was not  
>>>> able to
>>>> find a good answer in the many diff mailing list archives.
>>>>
>>>> For all our SNP predictions we would like to know whether they are
>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
>>>> position on the gene where amino acid is getting changed and to  
>>>> what
>>>> ...Also info about indels will help.
>>>>
>>>> I am not sure if something like this already exists. If not even  
>>>> some
>>>> pointers on how to move forward will help.
>>>>
>>>> Thanks,
>>>> -Abhi
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org




More information about the Bioperl-l mailing list