[Biopython] Generator expression for SeqIO
Eric Talevich
eric.talevich at gmail.com
Wed Dec 7 16:07:57 UTC 2011
Mic,
You don't really need a generator expression here, but I recommend that you
read the Python Tutorial to learn how to use them anyway.
To solve your problem, here's one solution using Biopython and a list
comprehension (like a generator expression, but it builds the whole list
up front, which is easier to follow):
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

def row_to_seqrecord(row):
    r"""Convert a tab-delimited row to a SeqRecord.

    Row looks like:
        test1\t0001\ta1\tAATTCC
    Record looks like (conceptually):
        >test1_a1
        AATTCC
    """
    cells = [cell.strip() for cell in row.split('\t')]
    # description="" keeps "<unknown description>" out of the FASTA header
    return SeqRecord(Seq(cells[3]), id=cells[0] + '_' + cells[2],
                     description="")

with open('input.txt') as infile:
    records = [row_to_seqrecord(line) for line in infile]
SeqIO.write(records, 'output.txt', 'fasta')
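If you do want the generator expression, the only syntactic change to the list comprehension above is swapping the square brackets for parentheses, and SeqIO.write accepts an iterator of records as well as a list. The catch is that a generator is lazy, so it has to be consumed while the input file is still open. Here is a minimal sketch of that difference using only the standard library; parse_row and the sample rows are made up for illustration, and io.StringIO stands in for the real input file:

```python
import io

def parse_row(row):
    """Split a tab-delimited row into a (record_id, sequence) pair."""
    cells = [cell.strip() for cell in row.split('\t')]
    return cells[0] + '_' + cells[2], cells[3]

# Hypothetical sample data in the same four-column layout
data = "test1\t0001\ta1\tAATTCC\ntest2\t0002\tb7\tGGCCAA\n"

# List comprehension: every row is parsed immediately and kept in memory,
# so the result can safely be used after the file is closed.
with io.StringIO(data) as infile:
    as_list = [parse_row(line) for line in infile]

# Generator expression: rows are parsed one at a time, on demand,
# so it must be consumed inside the 'with' block.
with io.StringIO(data) as infile:
    lazy = (parse_row(line) for line in infile)
    from_generator = list(lazy)

# Consuming the generator after the file has been closed fails.
with io.StringIO(data) as infile:
    too_late = (parse_row(line) for line in infile)
try:
    next(too_late)
    error = None
except ValueError as exc:
    error = exc
```

In the Biopython version, that means a generator of SeqRecord objects would need the SeqIO.write call indented under the 'with' statement.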
But the nice thing about FASTA format is that there's almost no structure
to it. Here's a simpler way to do it that doesn't use Biopython:
with open('input.txt') as infile:
    with open('output.fasta', 'w') as outfile:
        for line in infile:
            parts = [part.strip() for part in line.split('\t')]
            if len(parts) != 4:
                # Skip blank or malformed rows
                continue
            # Header
            outfile.write(">%s_%s\n" % (parts[0], parts[2]))
            # Sequence
            outfile.write(parts[3] + '\n')
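The same loop can also be packaged as a generator function, which makes it easy to try out without touching real files. This is only a sketch: rows_to_fasta is a name invented here, and io.StringIO objects stand in for the input and output files.

```python
import io

def rows_to_fasta(rows):
    """Yield one FASTA entry (header line plus sequence line) per valid row."""
    for row in rows:
        parts = [part.strip() for part in row.split('\t')]
        if len(parts) != 4:
            continue  # skip blank or malformed rows
        yield ">%s_%s\n%s\n" % (parts[0], parts[2], parts[3])

# Hypothetical input with a blank line in the middle
infile = io.StringIO("test1\t0001\ta1\tAATTCC\n\ntest2\t0002\tb7\tGGCCAA\n")
outfile = io.StringIO()

# writelines() pulls entries from the generator one at a time
outfile.writelines(rows_to_fasta(infile))
fasta_text = outfile.getvalue()
```

With real files, the two StringIO objects would be replaced by the open() calls from the loop above.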
On Tue, Dec 6, 2011 at 11:41 PM, Mic <mictadlo at gmail.com> wrote:
> No worries, it was perfect.
>
> I have the following code and I do not know how to combine the *header* and
> *seq* variables from the '*with*' statement with generator expression?
>
> from Bio import SeqIO
> from Bio.SeqRecord import SeqRecord
> from Bio.Seq import Seq
> from pprint import pprint
>
> if __name__ == '__main__':
>
>     *with* open('input.txt') as f:
>         for line in f:
>             try:
>                 splited_line = line.split('\t')
>
>                 *header* = splited_line[0] +'_'+ splited_line[2]
>                 *seq* = splited_line[3]
>             except IndexError:
>                 continue
>
>     fasta_file = open('output.fasta', 'w')
>     records = (SeqRecord(???), id=????, description="") for i in ???)
>
>     SeqIO.write(records, fasta_file, "fasta")
>
> Thank you in advance.
>
> On Thu, Dec 1, 2011 at 6:52 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>
> >
> >
> > On Wednesday, November 30, 2011, Mic <mictadlo at gmail.com> wrote:
> > > Thank you it is working.
> > >
> >
> > Excellent - sorry I couldn't think of a nice way to explain the syntax.
> >
> > Peter