[Bioperl-l] need help with large genbank file
Chris Dagdigian
dag@sonsorol.org
Tue, 23 Jul 2002 19:00:22 -0400
Dinakar,
The file is too big for Perl to open a filehandle on (at least, that is
what your error message says).
I know from painful experience :) that the file you are trying to read
is larger than 2GB when it is uncompressed into its native form. If
your computer, filesystem, kernel or operating system cannot handle
files larger than 2GB, you will get these sorts of errors.
There are various tricks to make things work. Systems with 64-bit
architectures (like AlphaServers) do not have these problems at all.
Linux solved this in the kernel a long time ago, and the common Linux
filesystems can all handle large files. You may, however, run into
binary programs like 'cat', 'more', 'uncompress', etc. that will
coredump or segfault on large files because they were not compiled to
support 64-bit offsets.
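Whether perl itself was built with 64-bit file offsets can be checked from its build configuration. Below is a minimal sketch using the core Config module. It treats an 8-byte lseeksize as the signal, which is a heuristic assumption; `perl -V:lseeksize` shows the same value from the shell. The script name is hypothetical, and a "yes" here does not rule out limits in the filesystem, kernel or other tools.

```perl
#!/usr/bin/env perl
# Hypothetical helper (check_lfs.pl): was this perl built with 64-bit file offsets?
use strict;
use warnings;
use Config;    # core module exposing perl's build-time configuration

# Heuristic: lseeksize is the size in bytes of the type perl uses for file
# offsets; 8 bytes (64 bits) is needed to address files beyond 2GB.
sub has_large_file_support {
    my $lseek = $Config{lseeksize};
    return ( defined $lseek && $lseek >= 8 ) ? 1 : 0;
}

# uselargefiles is printed for information only; it is 'define' when perl
# was configured with large file support.
printf "uselargefiles=%s lseeksize=%s large files: %s\n",
    ( defined $Config{uselargefiles} ? $Config{uselargefiles} : 'undef' ),
    ( defined $Config{lseeksize}     ? $Config{lseeksize}     : 'undef' ),
    ( has_large_file_support() ? 'yes' : 'no' );
```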
Without knowing your operating system or local configuration, I'd
recommend that you experiment with breaking NT into several smaller
pieces. You should be able to determine experimentally the file size
limit that you appear to have.
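Below is a minimal sketch of the splitting step, written as a hypothetical script named split_fasta.pl. It reads FASTA from standard input, so Perl itself never has to open() the oversized file. It starts a new numbered piece at the first '>' header after the current piece reaches a chosen byte limit, so no record is cut in half. The script name, the nt_part prefix and the piece naming scheme are assumptions for illustration.

```perl
#!/usr/bin/env perl
# Hypothetical helper (split_fasta.pl): split a FASTA stream into pieces,
# cutting only at record boundaries so every piece is valid FASTA.
use strict;
use warnings;

# Copy records from $in into files "$prefix.001.fa", "$prefix.002.fa", ...
# A new piece is started at the first '>' header after the current piece
# has reached $max_bytes, so pieces may run slightly over the limit.
# Returns the list of file names written.
sub split_fasta {
    my ( $in, $max_bytes, $prefix ) = @_;
    my @files;          # chunk files written so far
    my $out;            # filehandle for the current chunk
    my $written = 0;    # bytes written to the current chunk

    while ( my $line = <$in> ) {
        if ( !$out or ( $line =~ /^>/ and $written >= $max_bytes ) ) {
            if ($out) { close $out or die "Cannot close $files[-1]: $!" }
            my $name = sprintf '%s.%03d.fa', $prefix, @files + 1;
            open $out, '>', $name or die "Cannot write $name: $!";
            push @files, $name;
            $written = 0;
        }
        print {$out} $line or die "Cannot write $files[-1]: $!";
        $written += length $line;
    }
    if ($out) { close $out or die "Cannot close $files[-1]: $!" }
    return @files;
}

# Usage: <decompressor> | perl split_fasta.pl MAX_BYTES [PREFIX]
if ( !caller() && @ARGV ) {
    my ( $max_bytes, $prefix ) = @ARGV;
    $prefix = 'nt_part' unless defined $prefix;
    print "$_\n" for split_fasta( \*STDIN, $max_bytes, $prefix );
}
```

For example, `gunzip -c nt.gz | perl split_fasta.pl 1000000000` would write pieces of roughly 1GB each, assuming a gzip-compressed copy named nt.gz (a hypothetical name) and a decompressor that can stream more than 2GB through a pipe. Each piece can then be read with Bio::SeqIO->new(-file => ..., -format => 'Fasta') as in the script quoted below.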
-Chris
Dinakar Desai wrote:
> Hello:
>
> I am new to Perl and Bioperl. I have downloaded a file from NCBI
> (ftp://ftp.ncbi.nih.gov/blast/db/nt) and this file is quite large. I am
> trying to parse this file for a certain pattern with Bioperl, but I get
> an error. I have looked into largefasta.pm, and its documentation
> suggests not using it. I would appreciate it if you could help me with
> this problem.
>
> My code to test only 5 records out of this big file is as follows:
> <code>
> #!/usr/bin/env perl
>
> use strict;
> use warnings;
> use lib '/home/desas2/perl_mod/lib/site_perl/5.6.0/';
> use Bio::SeqIO;
>
> my $seqio = Bio::SeqIO->new( -file => "/home/desas2/data/nt", -format => 'Fasta' );
>
> # Print only the first 5 records; stop early if the file has fewer.
> my $count = 5;
> while ( $count-- > 0 and my $seqobj = $seqio->next_seq() ) {
>     print $seqobj->seq(), "\n";
> }
> </code>
> and the error message is:
> <error>
> ------------ EXCEPTION -------------
> MSG: Could not open /home/desas2/data/nt for reading: File too large
> STACK Bio::Root::IO::_initialize_io /home/desas2/perl_mod/lib/site_perl/5.6.0//Bio/Root/IO.pm:244
> STACK Bio::SeqIO::_initialize /home/desas2/perl_mod/lib/site_perl/5.6.0//Bio/SeqIO.pm:381
> STACK Bio::SeqIO::new /home/desas2/perl_mod/lib/site_perl/5.6.0//Bio/SeqIO.pm:314
> STACK Bio::SeqIO::new /home/desas2/perl_mod/lib/site_perl/5.6.0//Bio/SeqIO.pm:327
> STACK toplevel ./test_fasta.pl:8
>
> --------------------------------------
> </error>
>
> Do you have any suggestions on how I could read this big file and get
> sequence objects? I know how to manipulate sequence objects.
>
> Thank you.
>
> Dinakar
>
--
Chris Dagdigian, <dag@sonsorol.org>
Independent life science IT & research computing consulting
Office: 617-666-6454, Mobile: 617-877-5498, Fax: 425-699-0193
Work: http://BioTeam.net PGP KeyID: 83D4310E Yahoo IM: craffi