[BioPython] Creating a parser for Quantarray data?

Andrew Dalke dalke at dalkescientific.com
Tue Aug 26 14:25:16 EDT 2003


Peter Wilkinson:
> I have been looking through CVS for some, and I don't see any. Is it 
> worth creating a parser within the parser framework within biopython 
> (Martel), or shall I build something separately?
>
> These can be large files, and I would want to implement something that 
> is efficient.
>
> How would Martel handle a 20M Record like a Quantarray file? When I 
> was parsing genomic Genbank files (Bacteria), the Genbank parser's 
> performance started to suffer ...

Yeah, Martel is poor that way.  I've got the RecordReaders as a
workaround for when a single record is small enough.  Otherwise,
there's about a 5x memory overhead.

I've got some highly experimental code which fixes part of the problem
(it ended up being a pure-python regexp engine) but it isn't usable and
has problems of its own.

So for this task, it's likely best you write your own parser.
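
As a rough illustration of what a hand-written parser could look like, here
is a minimal sketch that streams the file one line at a time, so memory use
does not grow with file size.  It assumes a tab-delimited layout in which the
data table sits between "Begin Data" and "End Data" marker lines and starts
with a row of column names.  Those marker strings, the column layout, and the
function name iter_data_rows are assumptions made for the example, not a
description of the actual Quantarray format; adjust them to the real files.

```python
import io


def iter_data_rows(handle, begin="Begin Data", end="End Data"):
    """Yield one dict per data row, reading the file line by line.

    The begin/end marker strings are hypothetical placeholders; check
    them against a real file before relying on this.
    """
    in_data = False
    header = None
    for line in handle:
        line = line.rstrip("\r\n")
        stripped = line.strip()
        if not in_data:
            # Skip everything until the data section starts.
            if stripped == begin:
                in_data = True
            continue
        if stripped == end:
            break
        if not stripped:
            continue
        fields = line.split("\t")
        if header is None:
            # First non-blank line of the section holds the column names.
            header = fields
            continue
        yield dict(zip(header, fields))


if __name__ == "__main__":
    # Tiny made-up sample, only to show the calling convention.
    sample = (
        "Begin Header\n"
        "Some\tMetadata\n"
        "End Header\n"
        "Begin Data\n"
        "Number\tName\tIntensity\n"
        "1\tgeneA\t10.5\n"
        "2\tgeneB\t3.0\n"
        "End Data\n"
    )
    for row in iter_data_rows(io.StringIO(sample)):
        print(row["Name"], float(row["Intensity"]))
```

Because it is a generator, only the current line and one row dict are held
in memory at a time; the caller decides what to keep.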

*sigh*

Anyone want to fund me for the month or two it will take to finish
up Martel?  :)

					Andrew
					dalke at dalkescientific.com


