[Biopython] Read sequence from file

Peter Cock p.j.a.cock at googlemail.com
Wed Feb 25 17:39:08 UTC 2015


On Wed, Feb 25, 2015 at 4:03 PM, Horea Chrristian <h.chr at mail.ru> wrote:
> Hi guys, how can I read a sequence from a .txt file which contains only a
> string of letters (nucleotides)? I tried `SeqIO.read("my/file","...")` but
> if my second value is fasta or genbank, it complains about missing handles,
> and nothing like "plain", "string", or "str" worked... What can I do? It
> would be nice if I can do this via a one-liner rather than just read it
> explicitly with python and then explicitly parse it.
>
> Cheers,

Right now you'd just do something like this:

with open("my_example.txt") as handle:
    my_seq_as_string = handle.read().strip()

Or, if you want a Seq object with, e.g., a DNA alphabet:

from Bio.Seq import Seq
from Bio.Alphabet import generic_dna
with open("my_example.txt") as handle:
    my_seq = Seq(handle.read().strip(), generic_dna)

This assumes the file contains no line breaks or other internal whitespace;
strip() only removes whitespace at the start and end of the string.
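
If the file might be wrapped over several lines, here is a minimal sketch
that drops all whitespace rather than just the leading and trailing part.
The function name read_plain_sequence is made up for illustration; it is
not part of Biopython.

```python
def read_plain_sequence(path):
    """Return the file's contents as one string with all whitespace removed.

    Hypothetical helper for illustration, not a Biopython function.
    """
    with open(path) as handle:
        # str.split() with no arguments splits on any run of whitespace
        # (spaces, tabs, newlines), so joining the pieces removes all of it.
        return "".join(handle.read().split())
```

The resulting string can be passed to Seq() exactly as in the example above.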

What you are asking for sounds a bit like adding what EMBOSS calls
the "raw" file format to Biopython's SeqIO:
http://emboss.sourceforge.net/docs/themes/SequenceFormats.html

If this were added, what would you expect as the record's identifier?

Also, would you expect one sequence regardless of any line breaks in the
file, or one sequence per line?
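
To make the two options concrete, here is a rough sketch in plain Python.
Everything in it is an assumption for the sake of discussion: the name
parse_raw, the one_per_line switch, and the placeholder identifiers
"seq1", "seq2", ... are invented here, and none of this exists in SeqIO.

```python
def parse_raw(path, one_per_line=False):
    """Yield (identifier, sequence) tuples from a plain-text sequence file.

    Hypothetical sketch, not part of Biopython. The identifiers are
    placeholders, since a raw file carries no names of its own.
    """
    with open(path) as handle:
        if one_per_line:
            # Interpretation 1: every non-blank line is a separate sequence.
            count = 0
            for line in handle:
                seq = "".join(line.split())
                if seq:
                    count += 1
                    yield "seq%i" % count, seq
        else:
            # Interpretation 2: the whole file is one sequence, with line
            # breaks and any other whitespace ignored.
            seq = "".join(handle.read().split())
            if seq:
                yield "seq1", seq
```

With a two-line file, the default gives a single record covering both
lines, while one_per_line=True gives two records. The identifier question
is still open either way; the file name would be another possible choice.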

Peter

