[Bioperl-l] Trouble with Bio::DB::Fasta and large files

Tyler Alioto tsw@uclink4.berkeley.edu
Wed, 6 Nov 2002 09:45:02 -0800


Thanks for the help. I think Mummi will turn out to be right. I have a
lot on my plate right now and won't get to this for a day or so, but I
will post the solution once I figure it out. FYI, I'm running Red Hat 7.2
(kernel 2.4.7) on a Dell Precision 340 with 160 GB of disk space and
1.5 GB of RAM. Other non-Perl operations on the files (gunzip, formatdb,
blastall) work fine; only seeking into them with BioPerl fails.
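For anyone checking step 3 of Lincoln's list (quoted further down) on their own machine: a perl binary records at build time whether large file support was compiled in, and `perl -V:uselargefiles` prints that setting as `uselargefiles='define';` when it was. The minimal sketch here assumes a POSIX shell with perl on the PATH; the function name `lfs_status` is hypothetical, made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether the perl on PATH was built with large file
# support (step 3 of Lincoln's list). Assumes a POSIX shell and perl.
# lfs_status is a hypothetical helper name, not part of any tool.

lfs_status() {
    # perl -V:NAME prints NAME='value'; taken from the build's Config.
    # Errors (e.g. no perl installed) are discarded and reported as "no".
    line=$(perl -V:uselargefiles 2>/dev/null)
    case "$line" in
        "uselargefiles='define';") echo yes ;;
        *) echo no ;;
    esac
}

echo "perl large file support: $(lfs_status)"
```

A "no" points at step 3; a "yes" only clears Perl itself and says nothing about the kernel or C library (steps 1 and 2), or about the shell and utilities Allen and Mummi mention.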

-Tyler

On Wednesday, November 6, 2002, at 01:48 AM, Gudmundur A. Thorisson 
wrote:

> I would like to add to this discussion that any reasonably recent
> Linux kernel (if Linux is the platform with the 2 GB limit in this
> case), starting with 2.4 I think, can handle files larger than 2 GB,
> up to 4 TB or whatever the next limit is. I have not had any problems
> like this with the kernel itself since Red Hat 7 or so.
>
> The problem in our case, as Lincoln describes, was solved just by
> recompiling Perl. In a previous setting where I needed to do something
> similar (a 2.2-series kernel, recompiled for large file support), I was
> just running gzip on the fly and piping the output into a reformatting
> script, so Perl never saw a file larger than 2 GB. In that case I had to
> do what Allen just mentioned, i.e. recompile the shell (Bash) and some
> file utilities and related GNU tools. That took care of the problem.
>
>
> Mummi, CSHL
>
> Allen Day wrote:
>
>> Tyler,
>>
>> Are you perchance using tcsh?  It could simply be a problem with your 
>> shell.  This came up on the bioclusters mailing list a while ago:
>>
>> http://bioinformatics.org/pipermail/bioclusters/2002-May/000220.html
>>
>> I ran into the problem last week, when gzip appeared not to work for
>> me while I was loading a big (human) file into Bio::DB::GFF.
>> Recompiling the shell fixed it.
>>
>> -Allen
>>
>>
>>
>> On Tue, 5 Nov 2002, Lincoln Stein wrote:
>>
>>
>>> I believe you are hitting the 2 GB file limit on some Unix systems.  
>>> In general, you will have to do three things:
>>>
>>> 	1) Make sure that your kernel supports files larger than 2 GB.
>>> 	Recompile the kernel if not.
>>>
>>> 	2) Make sure that you have a recent version of the C library,
>>> 	libc, that supports large files. Install a new one if not (good
>>> 	luck!).
>>>
>>> 	3) Make sure that you have a version of Perl that was compiled
>>> 	with large file support. Recompile with large file support turned
>>> 	on if not.
>>>
>>> It's a big pain.  We just had to do this for one of our servers when 
>>> we experienced a similar problem.
>>>
>>> Lincoln
>>>
>>> On Tuesday 05 November 2002 07:32 pm, Tyler wrote:
>>>
>>>> I have been using Bio::DB::Fasta to extract sequences from FASTA
>>>> BLAST databases for zebrafish and fugu with no problems. I've used
>>>> both the tied-hash and object-oriented interfaces, and they work great
>>>> with these databases. Thanks, Lincoln.
>>>>
>>>> However, when I try to use Bio::DB::Fasta on local mouse or human
>>>> genome databases (Ensembl raw data), the scripts throw the "Invalid
>>>> file or dirname" exception. The mouse FASTA file is 2.7 GB and the
>>>> human one is 3.2 GB, as opposed to 1.2 GB for zebrafish and 340 MB for
>>>> fugu. All scripts are the same except for the name of the database
>>>> file. All databases work fine with standalone BLAST (both the web
>>>> interface and the BioPerl interface).
>>>>
>>>> Is there a work around for dealing with these extremely large files?
>>>>
>>>> -Tyler
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@bioperl.org
>>>> http://bioperl.org/mailman/listinfo/bioperl-l
>>>>
>>>
>>>
>>
>>
>>
>
>
>
>
>
*******************************************
TYLER ALIOTO
PhD Candidate
Department of Molecular & Cell Biology
265 LSA, UC Berkeley
Berkeley, CA 94720

lab (510) 642-9887 fax (240) 525-4809
tyler@stanfordalumni.org
http://www.ocf.berkeley.edu/~tsw