[Bioperl-l] Calculating a bunch of SNPs

Albert Vilella avilella at ub.edu
Thu Jan 19 18:31:01 UTC 2006


On Thu, 19 Jan 2006 at 12:15 -0500, Amir Karger wrote:
> I have 96 files. The first is a reference sequence. The other 95 are
> sequences from different genotypes, with minor SNPs compared to the first
> one. I want to generate a list of all the SNPs for each sequence compared to
> the reference sequence. Output format doesn't really matter.

Dear Amir,

If the sequences are simply instances of genotypes/haplotypes, so that
each position already corresponds across all 96 sequences, then one
possibility would be to create a Bio::SimpleAlign object and add each
of them to it.
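
A minimal sketch of that step, assuming the sequences are already the same length and position-matched. The ids and sequence strings below are hypothetical toy data; in practice each string would come from one of the 96 files (for example read with Bio::SeqIO), which is not shown here.

```perl
use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;

# Hypothetical toy data: one reference plus two genotype sequences.
my @samples = (
    [ 'reference', 'ACGTACGT' ],
    [ 'sample1',   'ACGTACCT' ],
    [ 'sample2',   'ATGTACGT' ],
);

# Add every sequence as one row of the alignment.
my $aln = Bio::SimpleAlign->new();
for my $sample (@samples) {
    my ($id, $string) = @$sample;
    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $string,
            -start => 1,
            -end   => length($string),
        )
    );
}

# is_flush is true only when all rows have the same length.
die "Sequences differ in length; align them first\n" unless $aln->is_flush;

my @rows = $aln->each_seq;
printf "%d sequences, %d columns\n", scalar(@rows), $aln->length;
```

If the sequences are not the same length, they would need a real alignment first; this sketch only covers the case where no gaps are required.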

Once you have your alignment, you can get the marker information with
the aln_to_population method of Bio::PopGen::Utilities.

 Usage   : my $pop = Bio::PopGen::Utilities->aln_to_population($aln);
 Function: Turn an alignment into a set of L<Bio::PopGen::Individual>
           objects grouped in a L<Bio::PopGen::Population> object
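
As a rough sketch of how the resulting population could be turned into a per-sequence SNP list: the toy ids and sequences are hypothetical, and the comparison loop is just one way to read the genotypes back out, not a built-in BioPerl report. It assumes each individual's unique_id is the id of the corresponding alignment row and that each site carries a single allele per individual.

```perl
use strict;
use warnings;
use Bio::SimpleAlign;
use Bio::LocatableSeq;
use Bio::PopGen::Utilities;

# Hypothetical toy data: one reference plus two genotype sequences.
my @samples = (
    [ 'reference', 'ACGTACGT' ],
    [ 'sample1',   'ACGTACCT' ],
    [ 'sample2',   'ATGTACGT' ],
);

my $aln = Bio::SimpleAlign->new();
for my $sample (@samples) {
    my ($id, $string) = @$sample;
    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id => $id, -seq => $string,
            -start => 1, -end => length($string),
        )
    );
}

# One individual per alignment row, one marker per reported site.
my $pop = Bio::PopGen::Utilities->aln_to_population($aln);

# Index individuals by id (assumed to match the alignment row ids).
my %by_id = map { $_->unique_id => $_ } $pop->get_Individuals;
my $ref = $by_id{'reference'} or die "Reference individual not found\n";

# Compare every other individual to the reference, marker by marker.
my @diffs;
for my $ref_gt ($ref->get_Genotypes) {
    my $marker = $ref_gt->marker_name;
    my ($ref_allele) = $ref_gt->get_Alleles;
    for my $id (sort grep { $_ ne 'reference' } keys %by_id) {
        my $gt = $by_id{$id}->get_Genotypes(-marker => $marker);
        next unless defined $gt;
        my ($allele) = $gt->get_Alleles;
        push @diffs, [ $id, $marker, $ref_allele, $allele ]
            if $allele ne $ref_allele;
    }
}

# Columns: sequence id, marker name, reference allele, observed allele.
print join("\t", @$_), "\n"
    for sort { $a->[0] cmp $b->[0] or $a->[1] cmp $b->[1] } @diffs;
```

This keeps everything in memory, so there are no intermediate output files to parse for each of the 95 comparisons.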

You will see some example output files in t/data/.

There may be other (better or different) ways to do what you need with
Bioperl,

    Albert.

> I was told I could run EMBOSS diffseq on each of the 95 pairs, and parse the
> output to get my list. I'm wondering if there's a Bioperl tool that will do
> what diffseq does, though - presumably outputting Bio::Align objects of some
> kind, or is it Bio::Variation? - rather than parsing 95*N output files.



