[BioPython] writing clustalw alignment to file
Brad Chapman
chapmanb@arches.uga.edu
Thu, 6 Dec 2001 08:31:36 -0500
Hi Scott;
Gack! Discovered this while I was cleaning out my immense mail backlog.
Sorry for the very late reply. Hope this still helps.
> (1) I am trying to convert a clustalw alignment to a fasta alignment (pg.
> 38-39 of Cookbook). I can do this just like in the book, but then I want to
> write the fasta alignment to a file and I don't know how to do this.
You should just be able to do:
handle = open("yourfile.fasta", "w")
handle.write(str(fasta_align))
handle.close()
It looks like I wrote a custom __str__ representation, so this should
write a sensible FASTA format to the file.
> Also, I
> would really like to search through the fasta or clustalw file for
> particular sequence names (titles) so that I can retrieve the particular
> sequence but I do not know how this is done.
>
> For instance, I would like to find the following sequence from a file:
>
> >APE2174
> ------------------------------------------------------------
> --------------------------MVKGSQVKPSTTELLLKAVSAKAPSGD-------
> ---PIHQK---IGDLPFEKIVEIAIEKKPD--LLAKTLKAAVKTILGSARSIGVTVDGKD
> PKEVTRQVDEGVYDAVLAKYEEKWEEAEG
For this, I would just load the file in with a fasta parser, and then
search for the title you are looking for. Something like:
handle = open("my_file", "r")
parser = Fasta.RecordParser()
iterator = Fasta.Iterator(handle, parser)
while 1:
cur_record = iterator.next()
if cur_record is None:
break
if cur_record.title.find("your_title_of_interest") >= 0:
print "Here's the record:\n %s" % cur_record
Does this do what you need?
> (2) The other question has to do with parsing Genbank files. I can use the
> Genbank parser to retrieve exon and intron sequences just fine and I give a
> (partial) example of a function I wrote to do this below from the "mRNA"
> feature (very similar to the example in the cookbook). However, when I try
> to get the positions from the "gene" feature I run into problems. For
> example:
>
> gene 2..1021
> /gene="HI0001"
>
> All I want is the positions of the gene and the gene name but I keep running
> into problems.
Okay dokee. It looks like you are almost there. I think the following
would work:
def get_introns(genbank_file_name):
gb_handle = open(genbank_file_name, "r")
parser = GenBank.FeatureParser()
iterator = GenBank.Iterator(gb_handle, parser)
while 1:
cur_record = iterator.next()
if cur_record is None:
break
gene_info = find_genes(record)
# do whatever you want to do with the gene information
def find_genes(gb_record):
"""Find features reporting gene information in a GenBank Record.
Returns a list of three items, where each item has:
(gene_start, gene_end, gene_name)
"""
mrna_info = []
for feature in gb_record.features:
if feature.type == "mRNA":
mrna_start = feature.location.nofuzzy_start
mrna_end = feature.location.nofuzzy_end
if feature.qualifiers.has_key("gene"):
mrna_name = feature.qualifiers["gene"]
else:
mrna_name = "not given"
mrna_info.append((mrna_start, mrna_end, mrna_name))
return mrna_info
Does this do the right thing? Or at least given you a hint?
Hope this helps. Sorry again for the long delay in responding.
Careful-with-the-untested-code-I-wrote-in-my-mailer-ly yr's,
Brad