[Bioperl-l] extracting CDS portion of RefSeqs
Hilmar Lapp
hlapp at gmx.net
Thu Dec 22 12:26:11 EST 2005
On Dec 21, 2005, at 9:59 AM, Cook, Malcolm wrote:
> FYI - using the $builder as above to read 46 GenBank mRNA RefSeq
> containing lots of REFERENCE data gave me ~ 33% speed up
> HOWEVER, I get a 52% speed up if instead I pre-filter the genbank
> flatfile using:
> perl -n -e "print if (m'^(LOCUS|ACCESSION)' ||
> (m'^FEATURES'...m'^//'))"
>
Getting 63% of the 'best possible' speedup (33% out of 52%) isn't too
bad, I'd say :-) Seriously, though:
Note that SeqBuilder isn't really an event-driven architecture that
would give an event handler full control over what gets parsed and
what doesn't. Rather, it is up to the parser how much advantage it
wants to take of the interface (by aborting or skipping certain
sections of an entry).
Note also that none of these things will be as fast as the pure Perl
shortcut - you still have to create objects and go through quite a
few more control structures.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------