[Bioperl-l] need help with large genbank file
simon andrews (BI)
simon.andrews@bbsrc.ac.uk
Wed, 24 Jul 2002 08:43:02 +0100
> > Dinakar Desai wrote:
> >> and the error message is:
> >> <error>
> >> ------------ EXCEPTION -------------
> >> MSG: Could not open /home/desas2/data/nt for reading:
> >> File too large
> Chris Dagdigian wrote:
> >
> > Dinakar,
> >
> > The file is too big for perl to open a filehandle on (at
> > least that is what your error message states)
> >
> > Without knowing your operating system or local
> > configuration I'd recommend that you experiment with
> > breaking NT into several smaller pieces.
> Dinakar Desai wrote:
>
> Thank you very much for your email. I am running this
> script on : Linux 2.4.7-10
>
> Can you suggest how I can break this file into smaller
> files and then parse them.
Dinakar,
You seemed to suggest before that your file contained lots of small sequences rather than a few large ones. If that is the case there may be a quick fix.
Since you seem to be running a pretty recent kernel you will hopefully find that your system commands (eg cat) can cope with >2Gb files. If not then try upgrading your textutils package (kernel 2.4.9 and textutils 2.0.11-7 definitely works with >2Gb).
If you can use cat on your large file then simply create a script which reads its input from STDIN, and pipe the output of cat into it. We have done this successfully in the past to process large files.
eg (untested):
-----------------------------------------------------
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
my $stream = Bio::SeqIO->new(-fh     => \*STDIN,
                             -format => 'fasta');

while (my $seqobj = $stream->next_seq()) {
    # Do something with $seqobj
}
-----------------------------------------------------
Then run with:
cat your_big_file | the_perl_script.pl
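If you do end up needing to break the file into smaller pieces as Chris suggested, something along these lines might work. This is an untested sketch, not a tested recipe: it assumes the file is multi-FASTA (records starting with ">"), and the chunk size and output names (chunk_0.fa etc.) are just illustrative.

```shell
# Split a multi-FASTA file into chunks of 1000 records each
# (chunk_0.fa, chunk_1.fa, ...). "your_big_file" is a placeholder.
awk '/^>/ && (n++ % 1000 == 0) { file = sprintf("chunk_%d.fa", chunk++) }
     { print > file }' your_big_file
```

Because awk only ever holds one line in memory, this should cope with files much larger than your RAM; whether it copes with >2Gb files again depends on your textutils build.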
Hope this helps
Simon.