[Bioperl-l] Classifying SNPs

Jason Stajich jason at bioperl.org
Tue Jul 14 00:02:39 UTC 2009


Having a lightweight system for this could be helpful if it isn't  
replicated somewhere else.  I don't think the NGS is really changing  
per se -- really it is just that more people have SNPs called in more  
research systems.

It would be a simple little project I think coding up some basics if  
you think you'd actually take it over and run with it and/or push it  
into a semi-organized set of scripts?

-jason
On Jul 13, 2009, at 10:13 AM, Abhishek Pratap wrote:

> Hi Jason
>
> Thanks for a detailed insight. I would definitely go the ensembl way  
> first
> and try to see if it can do exactly what we want.
>
> In case it does/'nt I will report back on this same thread. I think  
> having
> something like this in the Bioperl will help the NGS community. Lot of
> people are predicting SNPs from NGS.oops(next generation  
> sequencing ) data
> and looking for ways to better annotate/classify their predictions.
>
> Thanks guys .. It is a pleasure to interact with you all. Just  
> overwhelmed
> to see the responses.
>
> best,
> -Abhi
>
>
>
> On Mon, Jul 13, 2009 at 12:54 PM, Jason Stajich <jason at bioperl.org>  
> wrote:
>
>> Ensembl would be best place to go if you are working with human  
>> SNPs but
>> for those who aren't so data lucky...
>>
>> Aspects of this also relates to the dn/dS code in the
>> Bio::Align::DNAStatistics -- thought it does the classification and
>> comparison all at once so you'd have to dig code out.
>>
>> And the mcdonald_kreitman code in Bio::PopGen::Statistics which  
>> computes a
>> synonymous or nonsynonymous via lookup table that is stored in
>> Bio::MolEvol::CodonModel which compares the edit path which is  
>> encoded as
>> the two codons concatenated together -- i.ee
>>
>> use Bio::MolEvol::CodonModel;
>> my $codon_path = Bio::MolEvol::CodonModel->codon_path;
>> my ($ns, $syn) = $codon_path->{'AATAAC'};
>> print "AAT -> AAC: $ns ns mutations, $syn syn mutations\n";
>>
>>
>> It all kind of depends on how you have the data organized, if it is  
>> just
>> SNPs and you are trying to figure out if they are syn or non-syn  
>> then you
>> kind of need a good database to do this since you'll have to know  
>> what gene
>> they are in, CDS of the gene, etc.  It is possible to do with  
>> something as
>> basic as GFF3 for your genome and the SNP locations and
>> Bio::DB::SeqFeature::Store.  While I can think of a way to code it  
>> up from
>> those bare-bones - maybe you should report back if you can just use  
>> the
>> Ensembl classification of the SNPs?
>>
>> -jason
>>
>>
>> On Jul 13, 2009, at 9:33 AM, Chris Fields wrote:
>>
>> Bio::Coordinate might help with coordinate conversion.  However,  
>> much of
>>> this sounds very Ensembl-like.  Have you looked at the Ensembl  
>>> perl API?  It
>>> can do #1 (coordinate conversion), and I'm sure something could be  
>>> written
>>> up to do the second.
>>>
>>> chris
>>>
>>> On Jul 13, 2009, at 10:43 AM, Mark A. Jensen wrote:
>>>
>>> Thanks Abhi-- I had a feeling there was more (or "less") to it--  
>>> this
>>>> would be a nice feature to have, don't think it exists. Will  
>>>> think about
>>>> it-- cheers
>>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>>> abhishek.vit at gmail.com>
>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>> Sent: Monday, July 13, 2009 11:10 AM
>>>> Subject: Re: [Bioperl-l] Classifying SNPs
>>>>
>>>>
>>>> Dear Mark
>>>>> Sorry I was not able to reply earlier. Many Thanks for your  
>>>>> detailed
>>>>> explanation. However this is not exactly what I am looking for.  
>>>>> May be
>>>>> my
>>>>> initial mail was not well articulated or I am not able to infer  
>>>>> your
>>>>> reply
>>>>> fully. My bad.
>>>>>
>>>>> Well as an input what we have is the just the genomic  
>>>>> coordinates for
>>>>> SNP's
>>>>> predicted by Illumina propriety software CASAVA. What we would  
>>>>> like to
>>>>> do is
>>>>> to further classify these predicted SNP's  . If they fall into  
>>>>> Coding
>>>>> region
>>>>> then whether they are synonymous/non-syn SNPs.
>>>>>
>>>>> So I guess something which translates
>>>>> 1. SNP genomic coordinate into mRNA offset
>>>>> 2. Then identify the ORF and target codon and check whether the  
>>>>> SNP
>>>>> substitution will be syn/non-syn.
>>>>>
>>>>> Thanks,
>>>>> -Abhi
>>>>>
>>>>> On Wed, Jul 8, 2009 at 11:23 AM, Mark A. Jensen  
>>>>> <maj at fortinbras.us>
>>>>> wrote:
>>>>>
>>>>> Hey Abhishek-
>>>>>> You might root around in Bio::PopGen. Here's a script to get  
>>>>>> stuff from
>>>>>> raw fasta data--see comments within.
>>>>>> cheers
>>>>>> Mark
>>>>>>
>>>>>> use Bio::AlignIO;
>>>>>> use Bio::PopGen::Utilities;
>>>>>>
>>>>>> $file = "your_raw_file.fas";
>>>>>>
>>>>>>
>>>>>> my $aln = Bio::AlignIO->new(-format=>'fasta', -file=>$file)- 
>>>>>> >next_aln;
>>>>>> # get the alignment into a Bio::PopGen::Population format, with  
>>>>>> codons
>>>>>> # as the marker sites
>>>>>> my $pop = Bio::PopGen::Utilities->aln_to_population(-alignment=> 
>>>>>> $aln,
>>>>>> -site_model=>'cod');
>>>>>> # here are your variable codons...
>>>>>> my @cdnpos = $pop->get_marker_names;
>>>>>> # here are your individuals represented in the alignment
>>>>>> my @inds = $pop->get_Individuals;
>>>>>> # which have names like "Codon-3-9", "Codon-4-12", etc
>>>>>> foreach my $cdn (@cdnpos) {
>>>>>> # calculate the unique codons represented at this codon position
>>>>>> my (%ucdns, @ucdns);
>>>>>> @genos = $pop->get_Genotypes(-marker=>$cdn);
>>>>>> $ucdns{$_->get_Alleles}++ for @genos;
>>>>>> @ucdns = sort keys %ucdns;
>>>>>> #
>>>>>> # here, use translate or something faster to identify syn/non-syn
>>>>>> # check out code in Bio::Align::DNAStatistics for various methods
>>>>>>
>>>>>> }
>>>>>> # relate back to individuals with this
>>>>>> foreach my $ind (@inds) {
>>>>>> print "Individual ".$ind->unique_id."\n";
>>>>>> print "Site\tAllele\n";
>>>>>> foreach my $cdn (@cdnpos) {
>>>>>> print $cdn, "\t", $ind->get_Genotypes($cdn)->get_Alleles, "\n";
>>>>>> }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> 1;
>>>>>>
>>>>>> ----- Original Message ----- From: "Abhishek Pratap" <
>>>>>> abhishek.vit at gmail.com>
>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>> Sent: Wednesday, July 08, 2009 10:24 AM
>>>>>> Subject: [Bioperl-l] Classifying SNPs
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi All
>>>>>>
>>>>>> This might seem to be an old track question. However I was not  
>>>>>> able to
>>>>>> find a good answer in the many diff mailing list archives.
>>>>>>
>>>>>> For all our SNP predictions we would like to know whether they  
>>>>>> are
>>>>>> synonymous / non-synonymous. If Non-synonymous/Exonic then find  
>>>>>> the
>>>>>> position on the gene where amino acid is getting changed and to  
>>>>>> what
>>>>>> ...Also info about indels will help.
>>>>>>
>>>>>> I am not sure if something like this already exists. If not  
>>>>>> even some
>>>>>> pointers on how to move forward will help.
>>>>>>
>>>>>> Thanks,
>>>>>> -Abhi
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>>
>>

--
Jason Stajich
jason at bioperl.org




More information about the Bioperl-l mailing list