[Biopython] API support for finding polymorphisms?
Alex Lancaster
alexl at users.sourceforge.net
Tue Oct 19 03:04:00 UTC 2010
----- Original Message -----
> Hi, Alex. Are you working from short read data? If so, what platform?
> In what format are the aligned data?
Hi Sean,
I'm actually working from yeast literature data released by Sanger:
http://www.sanger.ac.uk/research/projects/genomeinformatics/sgrp.html
The raw data is available via FTP in several formats, including FASTQ;
see the PDF manual for more info:
http://www.sanger.ac.uk/research/projects/genomeinformatics/sgrp_manual.pdf
The original data is a mixture of Solexa/Illumina and ABI reads, with
different platforms used for different yeast strains.
They include a Perl script (alicat.pl) that can parse some of the
alignments that they had performed already (including both the sequence
alignments, which still contain errors, and the imputed sequences, in which
errors and missing data have been corrected). I have been working with the imputed
alignments as I didn't want to go all the way back and re-align from
scratch all the raw data.
I could probably hack the Perl script to do some of what I need (it
already has a facility to print out only polymorphic positions from
the imputed alignments), but I would like a more robust Python-based
solution. My first thought was to use the alicat.pl script to output
the alignments and the imputed sequences, convert them into full sequences
and then use a Python-based solution from there to identify and classify
the individual polymorphisms.
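
To make that last step concrete, here is a rough sketch of what I have in mind. It assumes the alignment has already been exported as equal-length strings keyed by strain name; the function names, the strain labels and the toy sequences are all made up for illustration, and nothing here comes from alicat.pl or from an existing Biopython API.

```python
# Sketch only: find variable columns in a set of pre-aligned sequences
# and label biallelic sites as transitions or transversions.
# All names and the toy data below are hypothetical.

GAP_OR_MISSING = set("-N?")
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def polymorphic_sites(aligned):
    """Return {column: {strain: base}} for columns with more than one allele.

    `aligned` maps strain name -> aligned sequence (all the same length).
    Gaps and missing calls are ignored when deciding whether a column varies.
    """
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    sites = {}
    for col in range(lengths.pop()):
        column = {name: seq[col].upper() for name, seq in aligned.items()}
        alleles = {base for base in column.values() if base not in GAP_OR_MISSING}
        if len(alleles) > 1:
            sites[col] = column
    return sites


def classify_snp(alleles):
    """Label a set of observed bases as transition, transversion or multiallelic."""
    alleles = set(alleles)
    if len(alleles) != 2:
        return "multiallelic"
    if alleles <= PURINES or alleles <= PYRIMIDINES:
        return "transition"
    return "transversion"


def summarize(aligned):
    """Return a sorted list of (column, sorted alleles, class) tuples."""
    summary = []
    for col, column in sorted(polymorphic_sites(aligned).items()):
        alleles = sorted({b for b in column.values() if b not in GAP_OR_MISSING})
        summary.append((col, alleles, classify_snp(alleles)))
    return summary


# Toy alignment (0-based columns); not real yeast data.
toy = {
    "strain_A": "ATGCCGTA",
    "strain_B": "ATGTCGTA",
    "strain_C": "ATGCCGNA",
    "strain_D": "ATGCCGAA",
}

for col, alleles, kind in summarize(toy):
    print(col, "/".join(alleles), kind)
```

Indels and missing calls are simply skipped in this sketch; handling them properly is part of what I would hope a library-level solution could do.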
At the moment, I'm only interested in looking at a couple of specific
genes, so it's not a genome-wide survey (i.e. I only need to keep
one or two genes and alignments in memory at once), but I'd like the
solution to be generalizable, so I could specify any yeast gene in the SGD
and include polymorphisms in promoters as well as coding regions.
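
For the promoter versus coding distinction, something along these lines would be enough for my purposes. It is only a sketch: the coordinates, the fixed promoter length and the function names are assumptions of mine, it treats the gene as a single forward-strand block with no introns, and it expects 0-based alignment columns with a half-open coding interval.

```python
# Sketch only: label alignment columns by region relative to one gene.
# cds_start/cds_end are hypothetical 0-based, half-open column coordinates;
# promoter_len is an arbitrary assumed upstream window, not a biological constant.


def region_of(col, cds_start, cds_end, promoter_len=500):
    """Return 'coding', 'promoter' or 'other' for a single alignment column."""
    if cds_start <= col < cds_end:
        return "coding"
    if cds_start - promoter_len <= col < cds_start:
        return "promoter"
    return "other"


def group_by_region(columns, cds_start, cds_end, promoter_len=500):
    """Group polymorphic column indices by the region they fall in."""
    grouped = {"promoter": [], "coding": [], "other": []}
    for col in sorted(columns):
        grouped[region_of(col, cds_start, cds_end, promoter_len)].append(col)
    return grouped


# Toy example: a gene occupying columns 100..399 with a 50-column promoter window.
print(group_by_region([10, 60, 120, 399, 400], cds_start=100, cds_end=400, promoter_len=50))
```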
Alex