[Bioperl-l] extracting CDS portion of RefSeqs

Wed Dec 21 12:59:34 EST 2005

Hilmar,

re: "only genbank.pm supports the SeqBuilder interface" 

Ah, that explains why I saw no speedup when I compared reading EMBL with
and without running 

my $builder = $seqin->sequence_builder();  # $seqin: the Bio::SeqIO input stream
$builder->want_none();
$builder->add_wanted_slot('display_id','features','seq');

(and also why my feature processing sub worked even when I
didn't call add_wanted_slot with 'features'!!)

FYI - using the $builder as above to read 46 GenBank mRNA RefSeqs
containing lots of REFERENCE data gave me a ~33% speedup.
HOWEVER, I get a 52% speedup if I instead pre-filter the GenBank
flatfile using:
	perl -n -e "print if (m'^(LOCUS|ACCESSION)' || (m'^FEATURES'...m'^//'))"
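
The same pre-filter can be sketched as a small standalone script, which may be easier to reuse than the one-liner. This is only a sketch under assumptions: the sub name prefilter_genbank and the toy record are made up for illustration and are not part of BioPerl. It tracks the FEATURES-to-"//" span with an explicit flag instead of the flip-flop operator, so no state carries over between calls.

```perl
use strict;
use warnings;

# Copy only LOCUS/ACCESSION lines and the FEATURES-through-"//" span
# (which includes the ORIGIN sequence block) from $in to $out.
# prefilter_genbank is a hypothetical helper name, not a BioPerl function.
sub prefilter_genbank {
    my ($in, $out) = @_;
    my $in_features = 0;    # true from a FEATURES line up to the closing "//"
    while (my $line = <$in>) {
        $in_features = 1 if $line =~ /^FEATURES/;
        print {$out} $line
            if $in_features || $line =~ /^(?:LOCUS|ACCESSION)/;
        $in_features = 0 if $line =~ m{^//};
    }
    return;
}

# Toy record, made up for illustration only (not a real RefSeq entry).
my $toy = <<'END';
LOCUS       TOY_0001     9 bp    mRNA
DEFINITION  made-up example record.
ACCESSION   TOY_0001
REFERENCE   1  (bases 1 to 9)
  AUTHORS   Nobody,N.
FEATURES             Location/Qualifiers
     CDS             1..9
ORIGIN
        1 atggcctaa
//
END

open my $in, '<', \$toy or die "cannot open in-memory record: $!";
prefilter_genbank($in, \*STDOUT);
close $in;
```

To use it as a filter on a real flatfile, replace the toy-record demo with a call such as prefilter_genbank(\*STDIN, \*STDOUT) and redirect the file on the command line. The DEFINITION and REFERENCE lines are dropped, which is where the saving on reference-heavy records would be expected to come from.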

-- Malcolm

>-----Original Message-----
>From: Hilmar Lapp [mailto:hlapp at gmx.net] 
>Sent: Friday, December 16, 2005 11:55 AM
>To: Cook, Malcolm
>Cc: bioperl-l at portal.open-bio.org; Amit Indap
>Subject: Re: [Bioperl-l] extracting CDS portion of RefSeqs
>
>
>On Dec 15, 2005, at 11:06 AM, Cook, Malcolm wrote:
>
>> Regarding performance, I've never tried it, but you might look at
>> http://doc.bioperl.org/bioperl-live/Bio/Seq/SeqBuilder.html, which
>> shows you how to tell SeqIO that you only need to read sequence
>> and features.
>>
>
>BTW right now only genbank.pm supports the SeqBuilder interface. If 
>any of the people who posted recently that they'd like to volunteer 
>are reading this, this would be a nice opportunity to take a fully 
>working implementation as an example and transfer it to other 
>applicable parsers, e.g. embl and swiss.
>
>	-hilmar
>-- 
>-------------------------------------------------------------
>Hilmar Lapp                            email: lapp at gnf.org
>GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>-------------------------------------------------------------
>
>
>