[Biopython-dev] Fasta parser

Iddo Friedberg idoerg at burnham.org
Sun Jul 2 04:48:50 UTC 2006


By (lack of?) design, my own Biopython-using code seems to use both the Martel and non-Martel parsers. I imagine others may be in the same situation. The point being: any design change should make sure that we stay backward compatible.

Thanks very much for your work on the Biopython release.

Cheers,

./I

--
Iddo Friedberg, PhD
Burnham Institute for Medical Research
10901 N. Torrey Pines Rd.
La Jolla, CA 92037 USA
T: +1 858 646 3100 x3516
http://iddo-friedberg.org
http://BioFunctionPrediction.org



-----Original Message-----
From: Michiel de Hoon [mailto:mdehoon at c2b2.columbia.edu]
Sent: Sat 7/1/2006 9:43 PM
To: Iddo Friedberg
Cc: biopython-dev at biopython.org
Subject: Re: [Biopython-dev] Fasta parser
 
Thanks Iddo!
I tried the parser in Bio.SeqIO.FASTA and it is indeed a lot faster than 
the Martel-based one in Bio.Fasta.

It would be nice to merge these two modules. However, it raises a bunch 
of design questions (such as Fasta.Record versus SeqRecord, and Seq 
versus string), so it's probably better to wait with that until after 
the next Biopython release. Which, by the way, will be coming up soon.

Thanks,

--Michiel.

Iddo Friedberg wrote:
> Michiel,
> 
> There is actually a simple-minded Fasta reader/writer that does not use
> Martel: Bio.SeqIO.FASTA
> 
> ./I
> 
> --
> Iddo Friedberg, PhD
> Burnham Institute for Medical Research
> 10901 N. Torrey Pines Rd.
> La Jolla, CA 92037 USA
> T: +1 858 646 3100 x3516
> http://iddo-friedberg.org
> http://BioFunctionPrediction.org
> 
> 
> 
> -----Original Message-----
> From: biopython-dev-bounces at lists.open-bio.org on behalf of Michiel de Hoon
> Sent: Sat 7/1/2006 2:47 PM
> To: biopython-dev at biopython.org
> Subject: [Biopython-dev] Fasta parser
> 
> Hi everybody,
> 
> The Biopython documentation shows the following approach to parsing a Fasta file:
> 
>  >>> from Bio import Fasta
>  >>> parser = Fasta.RecordParser()
>  >>> file = open("ls_orchid.fasta")
>  >>> iterator = Fasta.Iterator(file, parser)
>  >>> cur_record = iterator.next()
> 
> But for large Fasta files, this is very slow compared to file.read(),
> which may be due to going through Martel (I believe the same was true
> for large GenBank files).
> 
> So I'm thinking about writing a simple-minded Fasta parser for better
> performance with large files. What I'm wondering about:
> 1) Is there some advantage that I overlooked of using Martel for parsing
> Fasta files?
> 2) Why is it necessary to create a parser first and pass it to
> Fasta.Iterator? Are there any cases where Fasta.Iterator uses something
> other than a Fasta.RecordParser?
> 
> --Michiel.
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
> 
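
For reference, the kind of simple-minded Fasta parser discussed in this thread could be sketched roughly as below. This is only an illustration under stated assumptions: the function name iter_fasta is hypothetical, it is not the code in Bio.SeqIO.FASTA or Bio.Fasta, and it returns plain (title, sequence) string tuples rather than Fasta.Record or SeqRecord objects, which sidesteps the design questions raised above. It is written for current Python 3, not the Python 2 of the quoted session.

```python
import io


def iter_fasta(handle):
    """Yield (title, sequence) string tuples from a FASTA text handle.

    Hypothetical helper for illustration; not part of Biopython.
    """
    title = None
    chunks = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title = line[1:].strip()
            chunks = []
        elif title is not None:
            # Sequence data may span several lines; collect the pieces.
            chunks.append(line.strip())
        # Lines before the first header are ignored in this sketch.
    if title is not None:
        yield title, "".join(chunks)


# Small in-memory example instead of a real file such as ls_orchid.fasta.
example = io.StringIO(">seq1 first record\nACGT\nTTGA\n>seq2\nGGCC\n")
records = list(iter_fasta(example))
```

Because it is a generator, a caller can loop over a large file one record at a time without building a parser object first, which is the usage pattern question 2 above is asking about.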





