[Biopython-dev] Fasta parser

Colosimo, Marc E. mcolosimo at mitre.org
Sun Jul 2 18:36:22 UTC 2006




On 7/1/06 5:47 PM, "Michiel de Hoon" <mdehoon at c2b2.columbia.edu> wrote:

> Hi everybody,
> 
> The Biopython documentation shows the following approach to parsing a Fasta file:
> 
> >>> from Bio import Fasta
> >>> parser = Fasta.RecordParser()
> >>> file = open("ls_orchid.fasta")
> >>> iterator = Fasta.Iterator(file, parser)
> >>> cur_record = iterator.next()
> 
> But for large Fasta files, it's very slow, compared to file.read(),
> which may be due to going through Martel (I believe the same was true
> for large GenBank files).
> 
> So I'm thinking about writing a simple-minded Fasta parser for better
> performance with large files. What I'm wondering about:
> 1) Is there some advantage that I overlooked of using Martel for parsing
> Fasta files?
> 2) Why is it necessary to create a parser first and pass it to
> Fasta.Iterator? Are there any cases where Fasta.Iterator uses something
> other than a Fasta.RecordParser?

Yes! I use Fasta.SequenceParser, which gives me a SeqRecord object
(Bio.SeqRecord) rather than a Fasta.Record object that I would then have
to remap into a SeqRecord.
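
To illustrate the point, here is a rough sketch of a "simple-minded"
line-based reader that still lets the caller choose the record type. This
is not Biopython code: iter_fasta, SimpleRecord and the make_record
argument are hypothetical names made up for this example, and make_record
only plays the role that the parser argument plays in Fasta.Iterator.

```python
import io
from collections import namedtuple

# Hypothetical lightweight record type; a stand-in for whatever object
# the caller wants back (it is NOT Bio.SeqRecord or Fasta.Record).
SimpleRecord = namedtuple("SimpleRecord", ["title", "sequence"])


def iter_fasta(handle, make_record=SimpleRecord):
    """Yield one record per FASTA entry read from an open text handle.

    make_record is any callable taking (title, sequence); swapping it
    changes the type of object produced without touching the reader.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous entry, if there was one.
            if title is not None:
                yield make_record(title, "".join(chunks))
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            # Sequence line; text before the first header is ignored.
            chunks.append(line.strip())
    # Emit the final entry once the input is exhausted.
    if title is not None:
        yield make_record(title, "".join(chunks))


# Small in-memory example instead of a real file such as ls_orchid.fasta.
example = io.StringIO(">seq1 first\nACGT\nTTAA\n>seq2\nGGCC\n")
records = list(iter_fasta(example))
```

Passing a different make_record would give back a different object for
each entry, which is the flexibility I would not want to lose in a faster
parser.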

Also, could someone re-run epydoc? My changes to the code have not made it
to the online API docs.

Marc



