[Biopython] Concatenate all the sequences with same gene name

Joshua Klein mobiusklein at gmail.com
Fri Jul 17 12:30:49 UTC 2015


You can use a dictionary to map each name to a storage place for its sequence.
To avoid tediously checking whether a name is already present in the
dictionary, we can use the defaultdict type.

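from collections import defaultdict

from Bio import SeqIO

# Each new name starts as an empty string, so no membership check is needed
sequence_map = defaultdict(str)

for cur_record in SeqIO.parse('nucleotide_seq.fasta', "fasta"):
    # Append this record's sequence to any earlier ones with the same name
    sequence_map[cur_record.name] += str(cur_record.seq)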
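To see what defaultdict saves you, here is a minimal sketch that runs the same grouping on an in-memory list of (name, sequence) pairs, so it needs neither Biopython nor the attached file. The sample names and sequences are made up for illustration. It puts the explicit membership check that a plain dict requires next to the defaultdict(str) version:

```python
from collections import defaultdict

# Hypothetical sample data standing in for (cur_record.name, str(cur_record.seq))
pairs = [("B103", "ATG"), ("A12", "CCC"), ("B103", "GGT"), ("A12", "TAA")]

# Plain dict: every name has to be checked before appending
plain = {}
for name, seq in pairs:
    if name not in plain:
        plain[name] = ""
    plain[name] += seq

# defaultdict(str): a missing name is created as '' on first access
grouped = defaultdict(str)
for name, seq in pairs:
    grouped[name] += seq

print(grouped["B103"])         # ATGGGT
print(grouped["A12"])          # CCCTAA
print(dict(grouped) == plain)  # True
```

If a name has very many fragments, collecting them with defaultdict(list) and joining once at the end with "".join(...) may avoid repeatedly copying a growing string.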

You can then look up each concatenated sequence in sequence_map by name, or
use the common dictionary methods described here
<https://docs.python.org/2/library/stdtypes.html#typesmapping>.
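As an illustration of those dictionary methods, the sketch below looks sequences up by name, tests membership, and uses .items() to write every concatenated sequence back out as FASTA text. The helper to_fasta and its width parameter are hypothetical, and the sample data is made up. If you would rather stay within Biopython, wrapping each joined string in a SeqRecord and passing the records to SeqIO.write(records, handle, "fasta") should work as well.

```python
from collections import defaultdict


def to_fasta(sequence_map, width=60):
    """Render a name -> sequence mapping as FASTA text (hypothetical helper)."""
    lines = []
    for name, seq in sequence_map.items():
        lines.append(">" + name)
        # Wrap the sequence at `width` characters per line
        for start in range(0, len(seq), width):
            lines.append(seq[start:start + width])
    return "\n".join(lines) + "\n"


# Build a small map the same way as above, from made-up (name, sequence) pairs
sequence_map = defaultdict(str)
for name, seq in [("B103", "ATG"), ("A12", "CCC"), ("B103", "GGT")]:
    sequence_map[name] += seq

print(sequence_map["B103"])    # lookup by name: ATGGGT
print("B103" in sequence_map)  # membership test: True
print(sorted(sequence_map))    # all names: ['A12', 'B103']
print(to_fasta(sequence_map, width=4), end="")
```

Note that reading a missing key from a defaultdict silently inserts it, so use `in` or .get() when you only want to check whether a name exists.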



On Fri, Jul 17, 2015 at 5:50 AM, sunwm9 <sunwm9 at tom.com> wrote:

>
> Dear all,
>
> I am a new, self-taught Biopython user. I would like to concatenate the FASTA sequences (file attached) that share the same gene name, using the SeqIO.parse() function.
>
> Here is a snippet from my code:
>
>
> seq_all = ' '
>
> for cur_record in SeqIO.parse('nucleotide_seq.fasta', "fasta") :
>     if cur_record.name == 'B103':
>         seq_all = seq_all + str(cur_record.seq)
>     print seq_all
>
>
> This requires changing the gene name every time. How can I concatenate all the sequences with the same gene name automatically?
>
> <ribozyme at ioz.ac.cn>
>
> Best regards,
>
> Weiming Sun
>
>
>
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/biopython
>

