[Biopython] Parsing FASTA records based on headers
Dorota Matelska
surykartka at gmail.com
Mon Jul 11 17:02:51 UTC 2011
Hi Fabio,
You also forgot to change the format name passed to SeqIO.parse(). Your input file is in FASTA format, so replace "genbank" with "fasta" and it should work.
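Something along these lines might do it. This is only a rough sketch: the function names select_records and filter_fasta are made up for illustration and are not part of Biopython. With the "fasta" parser, record.id is already the first word of the header line, so the extra split() should not be needed.

```python
def select_records(records, selected_ids):
    """Lazily yield the records whose id is in selected_ids (hypothetical helper)."""
    wanted = set(selected_ids)  # a set gives fast membership tests
    return (record for record in records if record.id in wanted)


def filter_fasta(in_path, out_path, selected_ids):
    """Copy the selected records from one FASTA file to another (hypothetical helper)."""
    # Imported here so select_records can be used without Biopython installed.
    from Bio import SeqIO

    with open(in_path) as in_handle, open(out_path, "w") as out_handle:
        # The format name must match the input file: "fasta", not "genbank".
        records = SeqIO.parse(in_handle, "fasta")
        # SeqIO.write consumes the generator and returns the number of records written.
        return SeqIO.write(select_records(records, selected_ids), out_handle, "fasta")
```

You would then call something like filter_fasta("input.fasta", "selected.fasta", SelectedSequencesId), where both file names are placeholders. The records are still read one at a time, so nothing is collected into a list.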
Hope this will help you :-)
Dorota
On Jul 11, 2011, at 6:07 PM, Fabio Gori wrote:
> Hi all,
>
> I tried to parse a FASTA file to select the sequences whose headers satisfy a
> condition. The condition is that the first word of the header belongs to a list
> named SelectedSequencesId.
> On the page http://biopython.org/wiki/SeqIO, I found this example, where the
> condition is that the sequence length is < 300:
>
> 1 from Bio import SeqIO
> 2
> 3 input_seq_iterator = SeqIO.parse(open("cor6_6.gb", "rU"), "genbank")
> 4 short_seq_iterator = (record for record in input_seq_iterator \
> 5 if len(record.seq) < 300)
> 6
> 7 output_handle = open("short_seqs.fasta", "w")
> 8 SeqIO.write(short_seq_iterator, output_handle, "fasta")
> 9 output_handle.close()
>
> so I tried to substitute line 5 with
> 5 record.id.split()[0] in SelectedSequencesId)
>
> But it did not work.
> I was able to get what I wanted by building a list of all the records and
> then filtering it, but I'd like to find a solution that uses a generator
> expression.
>
> Thanks in advance,
>
> Fabio
>
> --
>
> F. Gori, PhD student
> Intelligent Systems
> ICIS (Institute for Computing and Information Sciences)
> Radboud University Nijmegen
>
> Home Page: http://www.cs.ru.nl/~gori/
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython