[BioPython] reading large sequence files
Karin Lagesen
karin.lagesen at labmed.uio.no
Wed Sep 24 03:28:28 EDT 2003
On Tue, Sep 23, 2003 at 03:57:07PM +0100, Leighton Pritchard wrote:
> Hi Karin,
>
> Assuming that you have one .fna sequence file containing the whole sequence
> (or one file per chromosome/plasmid), you can use quick_FASTA_reader from
> SeqUtils in a manner similar to:
>
> from Bio.SeqUtils import quick_FASTA_reader
>
> name, seq = quick_FASTA_reader(genome_file)[0]
>
>
> quick_FASTA_reader returns a list of (name, sequence) tuples without doing
> anything too clever or time-consuming, such as parsing the sequences into
> SeqRecords. It's *much* faster than using the Fasta.Iterator class.
>
> Hope this helps,
So do I...:)
However, I have come across something strange.
My sequence file looks like this:
>gi|16127994|ref|NC_000913.1| Escherichia coli K12, complete genome
AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC
TTCTGAACTGGTTACCTGCCGTGAGTAAATTAAAATTTTATTGACTTAGGTCACTAAATACTTTAACCAA
TATAGGCATAGCGCACAGACAGATAAAAATTACAGAGTACACAACATCCATGAAACGCATTAGCACCACC
ATTACCACCACCATCACCATTACCACAGGTAACGGTGCGGGCTGACGCGTACAGGAAACACAGAAAAAAG
CCCGCACCTGACAGTGCGGGCTTTTTTTTTCGACCAAAGGTAACGAGGTAACAACCATGCGAGTGTTGAA
GTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCC
AGGCAGGGGCAGGTGGCCACCGTCCTCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTG
AAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTATTTTTGCCGAACTTTT
GACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTT
GCCCAAATAAAACATGTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGC
and so on.
When I try to load this genome, it crashes:
  File "gene.py", line 11, in __readFastaFile
    print quick_FASTA_reader(file)[0]
  File "/site/python_packages//lib/python/Bio/SeqUtils/__init__.py", line 281, in quick_FASTA_reader
    name,seq= entry.split('\n',1)
ValueError: unpack list of wrong size
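The failing line splits an entry at its first newline and unpacks the result into two names, so it raises ValueError whenever an entry contains no '\n' at all. The sketch below is a hypothetical, simplified splitter that reproduces this: split_entries is an illustrative name, not Biopython code, and splitting the text on '>' is an assumption about how the entries are produced. It is written for Python 3, where the message reads "not enough values to unpack" instead of "unpack list of wrong size". Two inputs that would trigger it are a file with old Mac-style '\r' line endings and a stray trailing '>'; these are possibilities, not a diagnosis of this particular file.

```python
def split_entries(text):
    """Hypothetical simplified FASTA splitter; an assumption, not the Biopython source."""
    entries = []
    for entry in text.split(">")[1:]:  # assumed: records are separated on '>'
        # Same unpacking as the failing line: raises ValueError when the
        # entry contains no newline character at all.
        name, seq = entry.split("\n", 1)
        entries.append((name, seq.replace("\n", "")))
    return entries


# Unix ('\n') line endings: the header is followed by a newline, so this works.
print(split_entries(">seq1\nACGT\nACGT\n"))  # prints [('seq1', 'ACGTACGT')]

# Inputs where some entry has no '\n': old Mac-style '\r' line endings,
# and a trailing '>' with nothing after it.
for bad in (">seq1\rACGT\rACGT\r", ">seq1\nACGT\n>"):
    try:
        split_entries(bad)
    except ValueError as err:
        print("ValueError:", err)
```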
The way I call it is as follows:
def __readFastaFile(self, file):
    title, seq = quick_FASTA_reader(file)[0]
    return title, seq
where file is a string containing the absolute path to the file.
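As a point of comparison, here is a minimal sketch of a reader that yields one record at a time rather than reading the whole file into a single string and splitting it. read_fasta is a hypothetical helper, not part of Biopython. It assumes Python 3, whose text-mode open() uses universal newlines by default, so '\n', '\r\n' and '\r' line endings are all treated as line breaks.

```python
import os
import tempfile
from typing import Iterator, List, Optional, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (title, sequence) pairs one record at a time (hypothetical helper)."""
    title: Optional[str] = None
    chunks: List[str] = []
    # Text mode with the default newline=None uses universal newlines,
    # so '\n', '\r\n' and '\r' all end a line.
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if title is not None:
                    yield title, "".join(chunks)
                title, chunks = line[1:], []
            elif title is None:
                raise ValueError("sequence data before the first '>' header")
            else:
                chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


# Demo: a small file with old Mac-style '\r' line endings
# (the bases are the first 20 from the genome shown above).
with tempfile.NamedTemporaryFile("w", suffix=".fna", delete=False, newline="") as tmp:
    tmp.write(">gi|demo| example record\rAGCTTTTCAT\rTCTGACTGCA\r")

records = read_fasta(tmp.name)
title, seq = next(records)  # first record, like quick_FASTA_reader(file)[0]
records.close()  # finish the generator so the file is closed
os.remove(tmp.name)
print(title, seq)
```

Calling next() on the generator returns only the first record, which matches how quick_FASTA_reader(file)[0] is used in the method above.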
I am reasonably new to Python, so please excuse me if I am doing
something obviously wrong or idiotic...:)
Karin
--
Karin Lagesen, PhD student
karin.lagesen at labmed.uio.no