[Bioperl-l] Trouble with Bio::DB::Fasta and large files
Tyler Alioto
tsw@uclink4.berkeley.edu
Wed, 6 Nov 2002 09:45:02 -0800
Thanks for the help. I think Mummi will end up being correct. I have a
lot on my plate right now and won't get to this for a day or so, but
will post the solution once I figure it out. FYI, I'm running RedHat 7.2
(kernel 2.4.7) on a Dell Precision 340 with 160 GB of disk space and
1.5 GB of RAM. Other non-Perl operations on the files (gunzip, formatdb,
blastall) work fine; it's just seeking into them with BioPerl that fails.
-Tyler
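
P.S. For anyone following the thread, here is a rough sketch of the
arithmetic behind the 2 GB limit discussed below. It assumes the failure
comes from file offsets being stored as signed 32-bit integers, which is
the usual cause but has not been confirmed for this machine. It is
written in Python purely for illustration; the function names are
hypothetical, and the file sizes are the approximate decimal figures
quoted in the thread. The only Perl-specific piece is the query
`perl -V:uselargefiles`, which reports whether a Perl build was
configured with large file support.

```python
import re
import subprocess

# Largest value a signed 32-bit file offset can hold: just under 2 GiB.
MAX_32BIT_OFFSET = 2**31 - 1

# Approximate sizes quoted in the thread, in decimal gigabytes.
THREAD_FILES_GB = {"fugu": 0.34, "zebrafish": 1.2, "mouse": 2.7, "human": 3.2}


def exceeds_32bit_offset(size_bytes):
    """Return True if a file of this size is too large for 32-bit offsets."""
    return size_bytes > MAX_32BIT_OFFSET


def parse_uselargefiles(perl_v_output):
    """Interpret the output of `perl -V:uselargefiles`.

    Returns True if the value is 'define', False for any other value,
    and None if the setting does not appear in the text at all.
    """
    match = re.search(r"uselargefiles='([^']*)'", perl_v_output)
    if match is None:
        return None
    return match.group(1) == "define"


if __name__ == "__main__":
    for name, gigabytes in THREAD_FILES_GB.items():
        size = int(gigabytes * 10**9)
        status = "over" if exceeds_32bit_offset(size) else "under"
        print(f"{name}: about {size} bytes, {status} the 32-bit offset limit")

    # Ask the local Perl how it was built; tolerate Perl being absent.
    try:
        result = subprocess.run(
            ["perl", "-V:uselargefiles"],
            capture_output=True, text=True, check=False,
        )
        output = result.stdout
    except FileNotFoundError:
        output = ""
    print("Perl large file support:", parse_uselargefiles(output))
```

Under these assumptions the mouse and human files fall over the limit
while zebrafish and fugu stay under it, which matches the pattern of
failures reported. A result of False or None from the Perl check would
point toward recompiling Perl, but it does not rule out the shell or
libc as the culprit.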
On Wednesday, November 6, 2002, at 01:48 AM, Gudmundur A. Thorisson
wrote:
> I would like to add to this discussion by saying that any reasonably
> recent Linux kernel (if Linux is the 2 GB-deficient platform in this
> case), starting with 2.4 I think, can handle files larger than 2 GB,
> up to some 4 TB or whatever the next limit is. I have not had any
> problems like this with the kernel itself since RedHat 7 or so.
>
> The problem in our case, as Lincoln describes, was solved just by
> recompiling Perl. But in a previous setting (2.2 series kernel,
> recompiled for large file support) I needed to do a similar thing
> with just a gzip-on-the-fly piped into a reformatting script, so Perl
> never saw a >2 GB file. In that case, I needed to do the same thing as
> Allen just mentioned, i.e. recompile the (Bash) shell and some file
> utilities and related GNU tools. That took care of the problem.
>
>
> Mummi, CSHL
>
> Allen Day wrote:
>
>> Tyler,
>>
>> Are you perchance using tcsh? It could simply be a problem with your
>> shell. This came up on the bioclusters mailing list a while ago:
>>
>> http://bioinformatics.org/pipermail/bioclusters/2002-May/000220.html
>>
>> I ran into the problem last week when it appeared gzip wouldn't work
>> for me when trying to load a big (human) file into Bio::DB::GFF.
>> Recompiled the shell and it was fine.
>>
>> -Allen
>>
>>
>>
>> On Tue, 5 Nov 2002, Lincoln Stein wrote:
>>
>>
>>> I believe you are hitting the 2 GB file limit on some Unix systems.
>>> In general, you will have to do three things:
>>>
>>> 1) make sure that your kernel supports large files (> 2 GB).
>>> Recompile the kernel if not.
>>>
>>> 2) make sure that you have a recent version of the C library,
>>> libc, that supports large files. Install a new one if not (good
>>> luck!)
>>>
>>> 3) make sure that you have a version of Perl that was compiled
>>> with large file support. Recompile with large file support turned
>>> on if not.
>>>
>>> It's a big pain. We just had to do this for one of our servers when
>>> we experienced a similar problem.
>>>
>>> Lincoln
>>>
>>> On Tuesday 05 November 2002 07:32 pm, Tyler wrote:
>>>
>>>> I have been using Bio::DB::Fasta to extract sequences from fasta BLAST
>>>> databases for zebrafish and fugu with no problems. I've used both the
>>>> tied hash and object oriented implementations and they work great with
>>>> these databases. Thanks Lincoln.
>>>>
>>>> However, when trying to use Bio::DB::Fasta on local mouse or human
>>>> genome databases (ensembl raw data) they throw the "Invalid file or
>>>> dirname" exception. The mouse fasta file is 2.7GB and the human one is
>>>> 3.2GB, as opposed to 1.2GB for zebrafish and 340MB for fugu. All
>>>> scripts are the same except for the name of the database file. All
>>>> databases work fine with standalone blast (both the web interface and
>>>> the bioperl interface).
>>>>
>>>> Is there a workaround for dealing with these extremely large files?
>>>>
>>>> -Tyler
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l@bioperl.org
>>>> http://bioperl.org/mailman/listinfo/bioperl-l
>>>>
*******************************************
TYLER ALIOTO
PhD. Candidate
Department of Molecular & Cell Biology
265 LSA, UC Berkeley
Berkeley, CA 94720
lab (510) 642-9887 fax (240) 525-4809
tyler@stanfordalumni.org
http://www.ocf.berkeley.edu/~tsw