[Bioperl-l] extracting CDS portion of RefSeqs
Cook, Malcolm
MEC at stowers-institute.org
Wed Dec 21 12:59:34 EST 2005
Hilmar,
re: "only genbank.pm supports the SeqBuilder interface"
Ah, that explains why I saw no speedup when I compared reading EMBL with
and without running
$builder->want_none();
$builder->add_wanted_slot('display_id','features','seq');
(and also why I noticed that my feature processing sub worked even if I
didn't add_wanted_slot on 'features' !!)
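For reference, a minimal sketch of how those builder calls fit into a SeqIO read loop, assuming BioPerl is installed and the input is GenBank format (the only parser that honors the builder right now). The sub name slim_genbank_reader and the command-line handling are made up for illustration; they are not part of BioPerl.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Hypothetical helper: open a GenBank file and ask the parser to build
# only the display_id, features and seq slots of each record.
sub slim_genbank_reader {
    my ($file) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'genbank');
    my $builder = $in->sequence_builder();
    $builder->want_none();
    $builder->add_wanted_slot('display_id', 'features', 'seq');
    return $in;
}

# Usage: perl cds.pl refseqs.gbk
# Prints each CDS as a FASTA-like record (spliced if the location is a join).
if (@ARGV) {
    my $in = slim_genbank_reader($ARGV[0]);
    while (my $seq = $in->next_seq()) {
        for my $feat (grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures()) {
            print '>', $seq->display_id(), "\n",
                  $feat->spliced_seq()->seq(), "\n";
        }
    }
}
```

Anything not listed in add_wanted_slot (references, comments, annotation) is skipped by the GenBank parser, which is presumably where the time saving comes from.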
FYI - using the $builder as above to read 46 GenBank mRNA RefSeq records
containing lots of REFERENCE data gave me a ~33% speedup.
HOWEVER, I get a 52% speedup if I instead pre-filter the GenBank
flatfile using:
perl -n -e "print if (m'^(LOCUS|ACCESSION)' || (m'^FEATURES'...m'^//'))"
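The same selection can be written as a small standalone script, which may be easier to read and reuse than the one-liner. This is only a sketch using core Perl; the sub name filter_genbank and the file-argument handling are illustrative assumptions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Copy from $in to $out only the LOCUS and ACCESSION lines plus everything
# from a FEATURES line through the record terminator (//). The three-dot
# range operator acts as a flip-flop that switches on at FEATURES and off
# again at //, so it resets for every record in a multi-record file.
sub filter_genbank {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        print {$out} $line
            if $line =~ m{^(?:LOCUS|ACCESSION)}
            || ($line =~ m{^FEATURES} ... $line =~ m{^//});
    }
    return;
}

# Usage: perl filter_gb.pl refseqs.gbk > slim.gbk
if (@ARGV) {
    open my $in, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    filter_genbank($in, \*STDOUT);
    close $in;
}
```

Note that the ORIGIN block falls inside the FEATURES-to-// range, so the sequence itself is kept; only header material such as DEFINITION and REFERENCE is dropped.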
-- Malcolm
>-----Original Message-----
>From: Hilmar Lapp [mailto:hlapp at gmx.net]
>Sent: Friday, December 16, 2005 11:55 AM
>To: Cook, Malcolm
>Cc: bioperl-l at portal.open-bio.org; Amit Indap
>Subject: Re: [Bioperl-l] extracting CDS portion of RefSeqs
>
>
>On Dec 15, 2005, at 11:06 AM, Cook, Malcolm wrote:
>
>> Regarding performance, I've never tried it, but you might look at
>> http://doc.bioperl.org/bioperl-live/Bio/Seq/SeqBuilder.html, which
>> shows you how to tell SeqIO that you only need to read sequence
>> and features.
>>
>
>BTW right now only genbank.pm supports the SeqBuilder interface. If
>any of the people who recently posted that they'd like to volunteer
>read this, it would be a nice opportunity to take a fully working
>implementation as an example and transfer it to other applicable
>parsers, e.g. embl and swiss.
>
> -hilmar
>--
>-------------------------------------------------------------
>Hilmar Lapp email: lapp at gnf.org
>GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
>-------------------------------------------------------------
>
>
>