[Bioperl-l] extracting subsequences

Scott Cain cain at cshl.edu
Tue Oct 25 23:06:48 EDT 2005


Amit,

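Look at Bio::DB::Fasta.  It builds an index of the fasta files (stored
in a DBM file, e.g. Berkeley DB where available) that records where each
sequence starts, so a subsequence can be fetched by seeking straight to
it instead of reading the whole chromosome.  That makes extracting
subsequences significantly faster.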
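
A minimal sketch of what that might look like, assuming BioPerl is
installed and the directory holding the fasta file is writable (the
index is created next to it on first use).  The sub name
extract_subseq, the file name and the sequence ID in the usage line are
only placeholders; the ID is the first word after '>' in the header,
and coordinates are 1-based and inclusive.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Return bases $start..$end (1-based, inclusive) of one sequence
# in a fasta file.  extract_subseq is a hypothetical helper name.
sub extract_subseq {
    my ($fasta_path, $seq_id, $start, $end) = @_;

    # The first call builds the index; later calls reuse it,
    # so only the first run pays the cost of scanning the file.
    my $db = Bio::DB::Fasta->new($fasta_path);

    # Fetch only the requested range rather than the whole sequence.
    my $subseq = $db->seq($seq_id, $start, $end);
    die "Could not fetch '$seq_id' from $fasta_path\n"
        unless defined $subseq;

    return $subseq;
}

# Command-line use: script.pl <fasta> <id> <start> <end>
if (@ARGV == 4) {
    print extract_subseq(@ARGV), "\n";
}
```

Called as, say, "perl subseq.pl chr1.fa chr1 167506667 167523040"
(assuming that is how the file and its header are named), this should
print just the requested region.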

Scott


On Tue, 2005-10-25 at 16:48 -0400, Amit Indap wrote:
> Hi,
> 
> I have to extract subsequences from fasta files containing entire
> human chromosomes. For example I would like to extract bp
> 167506667..167523040. I know how to do this using the Bio::Seq and
> Bio::SeqIO APIs. The problem is it takes a long time to read in an
> entire fasta file containing a chromosome. Is there a way I can speed
> this up?
> 
> The bp indices are taken from BLAT-ing my sequences to the genome. I
> could use megablast to find which contigs my sequences lie on, and
> then read in those files rather than the whole chromosome.
> 
> Any suggestions would be helpful. Thanks.
> 
> Amit
> --
> Amit Indap
> http://www.bscb.cornell.edu/Homepages/Amit_Indap/
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


