[BioPython] Bio.SeqIO and files with one record
Peter
biopython at maubp.freeserve.co.uk
Tue Jul 10 20:03:10 UTC 2007
Dear Biopython people,
I'd like a little feedback on the Bio.SeqIO module - in particular, one
situation I think could be improved is when dealing with sequence files
which contain a single record - for example a very simple Fasta file, or
a chromosome in a GenBank file.
http://www.biopython.org/wiki/SeqIO
The shortest way to get this one record as a SeqRecord object is probably:
from Bio import SeqIO
record = next(SeqIO.parse(open("example.gbk"), "genbank"))
This works, assuming there is at least one record (with no records you
just get a bare StopIteration), but will not trigger any error if there
is more than one record - something you may want to check.
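To see the behaviour described above without needing a GenBank file, here is a small sketch in which a plain Python generator (the hypothetical `fake_parse`) stands in for `SeqIO.parse`. It assumes only that the parser returns an ordinary iterator of records.

```python
def fake_parse(records):
    """Hypothetical stand-in for SeqIO.parse: yields each record in turn."""
    for record in records:
        yield record


def first_record(records):
    """Mimic the one-liner: take the first record, ignore any others."""
    return next(fake_parse(records))


# One record: works as intended.
print(first_record(["chr1"]))

# Two records: the second is silently ignored, no error is raised.
print(first_record(["chr1", "chr2"]))

# No records: the iterator is exhausted immediately, so next() raises
# a bare StopIteration rather than a more descriptive error.
try:
    first_record([])
except StopIteration:
    print("no records: StopIteration")
```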
Do any of you think this situation is common enough to warrant adding
another function to Bio.SeqIO to do this for you (raising errors for no
records or more than one record)? My suggestions for possible names
include parse_single, parse_one, parse_sole, parse_individual and mono_parse.
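A minimal sketch of such a function, using the hypothetical name `parse_single` from the list above. So that it runs without Biopython, this version takes any iterable of records (for example the iterator returned by `SeqIO.parse(handle, format)`) rather than a handle and a format; raising `ValueError` and the exact messages are assumptions.

```python
def parse_single(records):
    """Return the only record from an iterable, or raise ValueError.

    Hypothetical helper: `records` would typically be the iterator
    returned by SeqIO.parse(handle, format).
    """
    iterator = iter(records)
    try:
        record = next(iterator)
    except StopIteration:
        raise ValueError("No records found") from None
    # A private sentinel object means a second record is detected even
    # if that record happens to be a falsy value such as None or "".
    sentinel = object()
    if next(iterator, sentinel) is not sentinel:
        raise ValueError("More than one record found")
    return record
```

With Biopython installed this would be called as `parse_single(SeqIO.parse(open("example.gbk"), "genbank"))`. Only the first two records are ever read, so a large multi-record file is rejected without being parsed in full. (Later Biopython releases added `Bio.SeqIO.read` for this purpose.)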
One way to do this inline would be:
from Bio import SeqIO
# Load every record into memory, then check there is exactly one
temp_list = list(SeqIO.parse(open("example.gbk"), "genbank"))
assert len(temp_list) == 1
record = temp_list[0]
del temp_list
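Another inline check, sketched here with plain lists and iterators standing in for what `SeqIO.parse` would return, is single-element unpacking. Python itself raises ValueError for zero items or for more than one, and unlike `assert` the check is not skipped when Python runs with `-O`. The helper name `unpack_single` is hypothetical.

```python
def unpack_single(records):
    # Single-element unpacking: Python raises ValueError if `records`
    # yields zero items or more than one item.
    (record,) = records
    return record


print(unpack_single(["chr1"]))

for bad in ([], ["chr1", "chr2"]):
    try:
        unpack_single(bad)
    except ValueError as err:
        print("rejected:", err)
```

Applied directly, this would read `(record,) = SeqIO.parse(open("example.gbk"), "genbank")`. When given an iterator, CPython only fetches one extra item to detect "too many", so unlike `list()` it does not load the whole file first.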
Or perhaps:
from Bio import SeqIO
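# Keep the iterator itself (no list() call), so at most two records are read
temp_iter = SeqIO.parse(open("example.gbk"), "genbank")
record = next(temp_iter)
try:
    # A second record here means the file did not hold exactly one
    assert next(temp_iter) is None
except StopIteration:
    pass
del temp_iter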
The above code copes with the fact that, in general, some iterators may
signal the end by raising a StopIteration exception, and others by
returning None.
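For iterators that return None at the end, Python's built-in two-argument form `iter(callable, sentinel)` can wrap them into a standard iterator that raises StopIteration instead. The `LegacyReader` class and the `as_iterator` helper below are hypothetical, standing in for an older-style parser whose `next()` method returns None when the data runs out.

```python
class LegacyReader:
    """Hypothetical old-style parser: next() returns None when exhausted."""

    def __init__(self, records):
        self._records = list(records)
        self._index = 0

    def next(self):
        if self._index >= len(self._records):
            return None  # old-style end-of-data signal
        record = self._records[self._index]
        self._index += 1
        return record


def as_iterator(reader):
    # iter(callable, sentinel) calls reader.next() repeatedly and stops
    # (raising StopIteration) as soon as it returns None.
    return iter(reader.next, None)


records = list(as_iterator(LegacyReader(["chr1", "chr2"])))
print(records)  # ['chr1', 'chr2']
```

Once wrapped like this, code only needs to handle StopIteration. The trade-off is that a genuine record equal to None would be indistinguishable from the end of the data.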
Peter
P.S. Any comments on the Bio.AlignIO ideas I raised back in May 2007?
http://lists.open-bio.org/pipermail/biopython/2007-May/003472.html