[BioPython] Performing sequence alignments, etc.
Peter
biopython at maubp.freeserve.co.uk
Sun Oct 14 17:38:32 UTC 2007
Caitlin wrote:
> Hi all.
>
> I'm relatively new to the field of bioinformatics and I'm trying to
> perform a multiple sequence alignment on 5-6 sequences (fasta format -
> dna sequences). I'd like the output to be formatted in the following
> manner (clustalw standalone output):
For reading and writing Clustalw alignment files, you could either use
Bio.SeqIO (format name "clustal") or the Bio.Clustalw module.
http://biopython.org/wiki/SeqIO
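As a rough sketch of the Bio.SeqIO route, assuming a reasonably current Biopython install: the helper name load_clustal and the file name in the usage comment are hypothetical, only SeqIO.parse and the "clustal" format name come from Biopython itself.

```python
def load_clustal(path):
    """Return the aligned records from a ClustalW alignment file as a list."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where Biopython is not installed yet.
    from Bio import SeqIO

    with open(path) as handle:
        # "clustal" is the Bio.SeqIO format name for ClustalW alignments.
        return list(SeqIO.parse(handle, "clustal"))


# Usage (hypothetical file name, replace with your own ClustalW output):
#     for record in load_clustal("my_alignment.aln"):
#         print(record.id, len(record.seq))
```

Each record keeps its gap characters, so all sequences read this way should have the same length.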
> When one or more nucleotide columns are identical, clustalw displays
> an asterisk. If not, a blank space is displayed. Is this a standard
> feature of BioPython?
There is an example of Clustalw output online here - note there can also
be a column of numbers on the right hand side (not shown here):
http://www.bioperl.org/wiki/ClustalW_multiple_alignment_format
It sounds like you are describing the simple consensus string which
clustalw outputs under the alignment (using "*", ":", "." and space).
Biopython has a SummaryInfo object which can calculate simple consensus
sequences (see the tutorial). Perhaps this would be close to what you
want to do.
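If all you need is the asterisk line for DNA, it can also be sketched in plain Python without any Biopython calls. This is not what SummaryInfo does internally, just an illustration; the function name conservation_line is made up, and treating an all-gap column as "not conserved" is my own assumption.

```python
def conservation_line(aligned_seqs):
    """Return '*' for columns where every sequence agrees, ' ' otherwise."""
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    marks = []
    # zip(*...) walks the alignment column by column.
    for column in zip(*(seq.upper() for seq in aligned_seqs)):
        first = column[0]
        # Assumption: a column made only of gaps is not marked as conserved.
        conserved = first != "-" and all(base == first for base in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


alignment = ["ACGT-ACGT", "ACGTTACGA", "ACCT-ACGT"]
for seq in alignment:
    print(seq)
print(conservation_line(alignment))
```

The ":" and "." symbols are left out here, since they mark groups of similar amino acids in protein alignments.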
> Also, I'm evaluating several sequences but I'd like to obtain the most
> recent complete genomes possible from various countries. Is there a
> convenient source to use (GenBank?) if I don't know the accession
> numbers?
What sort of genomes? Bacteria? Vertebrates? You could start by having
a look at any of EMBL, NCBI/GenBank or the Japanese DDBJ (these
three are kept in sync with each other).
Biopython has quite a nice interface for searching and downloading
sequences from GenBank (again, see the tutorial) so that would be my
first suggestion.
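As a hedged sketch of what such a search might look like with the Bio.Entrez module of a current Biopython (the tutorial remains the authoritative guide): the helper names and the exact query wording are my own assumptions, and the calls need network access plus a real contact email address for NCBI.

```python
def build_genome_query(organism):
    """Build an Entrez search term for complete genomes of one organism."""
    # Assumption: restricting the title to "complete genome" is a
    # reasonable first filter; you may need to adjust it for your case.
    return '"%s"[Organism] AND "complete genome"[Title]' % organism


def search_genbank(term, email, retmax=5):
    """Return up to retmax GenBank nucleotide record IDs matching term."""
    from Bio import Entrez  # imported here so the sketch loads without Biopython

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="nucleotide", term=term, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return result["IdList"]


def fetch_genbank_text(ids, email):
    """Download the given records as GenBank flat-file text."""
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="nucleotide", id=",".join(ids),
                           rettype="gb", retmode="text")
    text = handle.read()
    handle.close()
    return text


# Usage (needs network access; fill in your own organism and address):
#     ids = search_genbank(build_genome_query("your organism"), "you@example.org")
#     print(fetch_genbank_text(ids, "you@example.org"))
```

The search returns record IDs, so you do not need to know any accession numbers in advance.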
Peter