[Bioperl-l] what's the optimal way to search a fasta file for matching ID's?

Jason Stajich jason at bioperl.org
Fri Oct 26 04:57:52 UTC 2007


Or see Bio::DB::Fasta if you want a BioPerl solution; it indexes the fasta file so sequences can be fetched by ID without rescanning it.
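
To make the indexing idea concrete, here is a minimal sketch in Python of the general technique: scan the file once, remember the byte offset of each header line, then seek straight to a record when a hit ID comes up. This is only an illustration, not how Bio::DB::Fasta is implemented. The function names build_index and fetch_sequence are made up for this example, and it assumes the ID is the first whitespace-delimited token after the '>'.

```python
def build_index(fasta_path):
    """Scan the file once and map each sequence ID to the byte offset
    of its '>' header line. (Hypothetical helper for illustration.)"""
    index = {}
    with open(fasta_path, "rb") as handle:
        while True:
            offset = handle.tell()      # position before reading the line
            line = handle.readline()
            if not line:                # end of file
                break
            if line.startswith(b">"):
                # Assumption: the ID is the first token after '>'
                fields = line[1:].split()
                if fields:
                    index[fields[0].decode("ascii")] = offset
    return index


def fetch_sequence(fasta_path, index, seq_id):
    """Seek directly to one record and return its full sequence.
    (Hypothetical helper for illustration.)"""
    chunks = []
    with open(fasta_path, "rb") as handle:
        handle.seek(index[seq_id])
        handle.readline()               # skip the header line
        for line in handle:
            if line.startswith(b">"):   # next record starts here
                break
            chunks.append(line.strip())
    return b"".join(chunks).decode("ascii")
```

Only the IDs and one integer per sequence are held in memory, so 90,000 records should be a small index, and each lookup reads just the one record instead of the whole file.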

On Oct 25, 2007, at 6:17 PM, Cook, Malcolm wrote:

> If you have the fasta database already indexed for blast searching, then
> you should use fastacmd, which comes with the blast package, for
> extracting (sub)sequences based on ID (and indices).
>
> Malcolm Cook
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, Missouri
>
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Joseph Fass
>> Sent: Thursday, October 25, 2007 4:50 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] what's the optimal way to search a fasta
>> file for matching ID's?
>>
>> I would appreciate any advice, big or small, on this ...
>>
>> I've got a decent-sized database ... 90,000 sequences or so
>> in a single fasta-format file.  Then, I've got sequence ID's
>> from that database that show up in blast reports.  I want to
>> collect those ID's and their sequences (for the purposes of
>> exploring possible contigs).  Since the blast report only
>> includes sub-sequences (from alignments) of my sequences, I
>> want to parse the report, then match each hit ID against an
>> ID in the database, so I can pull out its full sequence.  Is
>> there a faster way to do this than opening the database file
>> each time I have a new hit ID, so I can search it from
>> beginning to end?  If I push each sequence onto a list or
>> hash, it's liable to chew up a lot of RAM, I'm guessing.  Any
>> suggestions?
>>
>> Thanks in advance,
>> ~joe
>>
>> --
>> Joseph Fass
>> joseph.fass at gmail.com  ||  joefass at hotmail.com
>> 970.227.5928 (c)  ||  530.754.7978 (w)
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>

--
Jason Stajich
jason at bioperl.org
