[Bioperl-l] extracting CDS portion of RefSeqs

Thu Dec 22 12:26:11 EST 2005

On Dec 21, 2005, at 9:59 AM, Cook, Malcolm wrote:

> FYI - using the $builder as above to read 46 GenBank mRNA RefSeqs
> containing lots of REFERENCE data gave me a ~33% speedup.
> HOWEVER, I got a 52% speedup if I instead pre-filtered the GenBank
> flatfile using:
> 	perl -n -e "print if (m'^(LOCUS|ACCESSION)' ||
> (m'^FEATURES'...m'^//'))"
>

Getting 63% of the 'best possible' speedup (33/52) isn't too bad, I'd
say :-) Seriously, though.

Note that SeqBuilder isn't really the kind of event-driven architecture
that would give an event handler full control over what gets parsed
and what doesn't. Rather, it is up to each parser how much advantage it
takes of the interface (by aborting or skipping certain sections of an
entry).

Note also that none of these approaches will be as fast as the
pure-Perl shortcut: you still have to create objects and go through
quite a few more control structures.
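For reference, the pure-Perl shortcut (the one-liner in the quoted
message) can be spelled out as a small subroutine. This is only a
sketch, not part of BioPerl: the name prefilter_genbank is
hypothetical, and it assumes standard GenBank flatfile layout in which
every entry ends with a '//' line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl API): copy from $in to $out only
# the LOCUS and ACCESSION lines plus everything from FEATURES through
# the record terminator '//'. REFERENCE, COMMENT, etc. are dropped, so
# the GenBank parser downstream has less text to process.
sub prefilter_genbank {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        # First test: keep the LOCUS and ACCESSION header lines.
        # Second test: the flip-flop operator '...' becomes true on a
        # line matching /^FEATURES/ and stays true up to and including
        # the next line matching m{^//}. The three-dot form does not
        # test the right-hand pattern on the line that opened the range.
        # Note: flip-flop state is kept per call site, so a truncated
        # record (no '//') would leave the range open on the next call.
        if ($line =~ /^(?:LOCUS|ACCESSION)/
            || ($line =~ /^FEATURES/ ... $line =~ m{^//})) {
            print {$out} $line;
        }
    }
    return;
}
```

Called as prefilter_genbank(\*STDIN, \*STDOUT) from a script, it
behaves like the perl -n one-liner; the filtered output can then be
fed to Bio::SeqIO as usual.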

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------