[Bioperl-l] Limitations for Bio::DB::Fasta
Fields, Christopher J
cjfields at illinois.edu
Sun Jul 13 15:22:46 UTC 2014
Which file? It’s something we could probably check. My feeling is it’s one or more of:
1) Your version of perl doesn’t support large files efficiently (unlikely unless you are running a very old perl). But I’d expect that to fail outright rather than hang.
2) DB_File itself isn’t very efficient when there are huge numbers of sequences (millions). Is that the case here?
3) I/O is the bottleneck; in other words, you are running this on a non-optimal system where disk speed is a limiting factor.
Hard to say without testing it directly.
Just to note, there are alternatives (samtools faidx comes to mind).
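The core idea behind both samtools faidx and Bio::DB::Fasta is the same: scan the file once, store only the byte offset of each record, and answer lookups with a seek rather than a read-through. A minimal sketch of that scheme (in Python rather than Perl, purely for illustration; the function names are made up, and real tools also store per-record line lengths so they can seek within a sequence):

```python
import tempfile, os

def build_fasta_index(path):
    """Scan a FASTA file once, recording the byte offset where each
    record's sequence data begins. Returns {seq_id: offset}. Only
    offsets are kept, so the index stays tiny relative to the file."""
    index = {}
    with open(path, "rb") as fh:
        offset = 0
        for line in fh:
            if line.startswith(b">"):
                seq_id = line[1:].split()[0].decode()
                index[seq_id] = offset + len(line)  # sequence starts after header
            offset += len(line)
    return index

def fetch(path, index, seq_id):
    """Retrieve one sequence by seeking straight to its stored offset,
    reading lines until the next header (or EOF)."""
    with open(path, "rb") as fh:
        fh.seek(index[seq_id])
        chunks = []
        for line in fh:
            if line.startswith(b">"):
                break
            chunks.append(line.strip())
        return b"".join(chunks).decode()

# Tiny demo on a throwaway file.
fasta = b">chr1 test\nACGT\nACGT\n>chr2\nTTTT\n"
with tempfile.NamedTemporaryFile(delete=False, suffix=".fa") as tmp:
    tmp.write(fasta)
idx = build_fasta_index(tmp.name)
print(fetch(tmp.name, idx, "chr1"))  # ACGTACGT
os.unlink(tmp.name)
```

Because the index holds only IDs and offsets, indexing a 700GB file with millions of records is bounded by one sequential pass over the file plus the cost of the key/value store, which is where a DB_File-backed index can become the bottleneck.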
chris
On Jul 7, 2014, at 1:12 PM, Ki Baik <hkbaik at gmail.com> wrote:
> I'm trying to index a large fasta file that I downloaded from NCBI's FTP site. The fasta file is 700GB. I'm using Bio::DB::Fasta to index it, but when the index file reaches around 10GB, it seems to hang. I'm wondering if there is a limit on the size of fasta file it can index.
>
> Also, how does Bio::DB::Fasta compare to Bio::Index::Fasta? Is one better for large fasta files? Are there any other indexing schemes I can use instead of these modules? Any information would be appreciated.
>
> Thanks,
>
> KB
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at mailman.open-bio.org
> http://mailman.open-bio.org/mailman/listinfo/bioperl-l