[Bioperl-l] Classifying SNPs

Mon Jul 13 15:43:13 UTC 2009

Thanks Abhi-- I had a feeling there was more (or "less") to it-- this would be a 
nice feature to have, don't think it exists. Will think about it-- cheers
----- Original Message ----- 
From: "Abhishek Pratap" <abhishek.vit at gmail.com>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Monday, July 13, 2009 11:10 AM
Subject: Re: [Bioperl-l] Classifying SNPs


> Dear Mark
> Sorry I was not able to reply earlier. Many Thanks for your detailed
> explanation. However this is not exactly what I am looking for. May be my
> initial mail was not well articulated or I am not able to infer your reply
> fully. My bad.
>
> Well as an input what we have is the just the genomic coordinates for SNP's
> predicted by Illumina propriety software CASAVA. What we would like to do is
> to further classify these predicted SNP's  . If they fall into Coding region
> then whether they are synonymous/non-syn SNPs.
>
> So I guess something which translates
> 1. SNP genomic coordinate into mRNA offset
> 2. Then identify the ORF and target codon and check whether the SNP
> substitution will be syn/non-syn.
>
> Thanks,
> -Abhi
>
> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>> Hey Abhishek-
>> You might root around in Bio::PopGen. Here's a script to get stuff from
>> raw fasta data--see comments within.
>> cheers
>> Mark
>>
>> use Bio::AlignIO;
>> use Bio::PopGen::Utilities;
>>
>> $file = "your_raw_file.fas";
>>
>>
>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)->next_aln;
>> # get the alignment into a Bio::PopGen::Population format, with codons
>> # as the marker sites
>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=>$aln,
>> -site_model=>'cod');
>> # here are your variable codons...
>> my @cdnpos = $pop->get_marker_names;
>> # here are your individuals represented in the alignment
>> my @inds = $pop->get_Individuals;
>> # which have names like "Codon-3-9", "Codon-4-12", etc
>> foreach my $cdn (@cdnpos) {
>>   # calculate the unique codons represented at this codon position
>>   my (%ucdns, @ucdns);
>>   @genos = $pop->get_Genotypes(-marker=>$cdn);
>>   $ucdns{$_->get_Alleles}++ for @genos;
>>   @ucdns = sort keys %ucdns;
>>   #
>>   # here, use translate or something faster to identify syn/non-syn
>>   # check out code in Bio::Align::DNAStatistics for various methods
>>
>> }
>> # relate back to individuals with this
>> foreach my $ind (@inds) {
>>   print "Individual ".$ind->unique_id."\n";
>>   print "Site\tAllele\n";
>>   foreach my $cdn (@cdnpos) {
>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>   }
>> }
>>
>>
>> 1;
>>
>> ----- Original Message ----- From: "Abhishek Pratap" <
>> abhishek.vit at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, July 08, 2009 10:24 AM
>> Subject: [Bioperl-l] Classifying SNPs
>>
>>
>>
>> Hi All
>>
>> This might seem to be an old track question. However I was not able to
>> find a good answer in the many diff mailing list archives.
>>
>> For all our SNP predictions we would like to know whether they are
>> synonymous / non-synonymous. If Non-synonymous/Exonic then find the
>> position on the gene where amino acid is getting changed and to what
>> ...Also info about indels will help.
>>
>> I am not sure if something like this already exists. If not even some
>> pointers on how to move forward will help.
>>
>> Thanks,
>> -Abhi
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>