[Bioperl-l] Processing large fasta sequences through SeqIO
Jason Stajich
jason@chg.mc.duke.edu
Fri, 31 Aug 2001 11:31:22 -0400 (EDT)
On Fri, 31 Aug 2001, Josep Francesc Abril Ferrando wrote:
> Hi Jason,
>
> > > Error in tempdir() using /tmp/XXXXXXXXXX: Could not create directory
> > > /tmp/Z0gD8R0rlB: Too many links at
> > > /usr/lib/perl5/site_perl/5.005//Bio/Root/IO.pm line 457
> >
> > Is your tmp dir really full of files/directories, or does it not have
> > enough space for all of the sequence data? This seems like a system
> > problem.
>
> Currently, "/tmp" is only ~150 MB and I have more than 1 GB of free hard
> disk space (on a PC box with 386 MB of RAM, Red Hat 6.2 with kernel
> version 2.2.14, and perl 5.6.1). Maybe it could be a permissions
> issue.
>
Seems strange, again.
I will cook up a test script for you in a minute. Can you at least run
the following?
% mkdir /tmp/me
% echo "I am great" > /tmp/me.txt
% rm -rf /tmp/me /tmp/me.txt
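If the three commands above fail at the mkdir step, one thing worth ruling
out is a per-directory link limit: "Too many links" is the message for
EMLINK, which mkdir returns when the parent directory cannot take another
subdirectory (ext2 caps a directory at 32000 links). Whether /tmp on that
box is ext2, and whether leftover tempdirs are what filled it, are
assumptions here. A minimal shell sketch for checking, with hypothetical
helper names count_subdirs and can_mkdir:

```shell
#!/bin/sh
# Sketch only: count_subdirs and can_mkdir are illustrative helpers,
# not part of BioPerl or File::Temp.

# Print the number of immediate subdirectories (symlinks excluded,
# dot-directories included) of the given directory, default /tmp.
count_subdirs() {
    cs_dir=${1:-/tmp}
    cs_n=0
    for cs_entry in "$cs_dir"/* "$cs_dir"/.[!.]* "$cs_dir"/..?*; do
        # Unmatched patterns stay literal and simply fail the -d test.
        [ -d "$cs_entry" ] && [ ! -L "$cs_entry" ] && cs_n=$((cs_n + 1))
    done
    echo "$cs_n"
}

# Return 0 if a new subdirectory can be created (and removed again).
can_mkdir() {
    cm_probe="${1:-/tmp}/linkprobe.$$"
    if mkdir "$cm_probe" 2>/dev/null; then
        rmdir "$cm_probe"
        return 0
    fi
    return 1
}

target=${1:-/tmp}
# Assumption: on ext2 the limit is 32000 links per directory.
echo "subdirectories in $target: $(count_subdirs "$target")"
if can_mkdir "$target"; then
    echo "mkdir in $target works"
else
    echo "mkdir in $target fails"
fi
```

A count close to 32000 together with a failing mkdir would point at
leftover temporary directories rather than at disk space or permissions.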
> > Do you have File::Temp installed? There is a known bug in the 0.7
> > release: if you do not have File::Temp installed, the application will
> > not clean up its tempdirs/tempfiles. Installing File::Temp will take
> > care of that.
>
> It is installed, and it is version 0.12. Do I have to include the
> corresponding "use File::Temp;" in the script? Maybe I have to ask
> our sysadmin to update both File::Temp and BioPerl.
>
Nope, you don't need to include it; it is done for you in Bio::Root::IO.
We have tried to make the modules as simple as possible to use, and
I've never had the problems you describe. 0.12 is fine for sure.
I have access to a RH box, so I'll see if I can duplicate any of the
problems.
> > > If I look at the saved file, the sequence is OK (it has no more and no
> > > fewer nucleotides than expected, and they are in the correct order),
> > > but the file contains a lot of empty lines (or lines with just '>')
> > > after the end of the sequence. Any idea what could be wrong in the
> > > following script:
> >
> > Nothing obvious is jumping out right now by looking at your code -
> > How large are your files?
>
> At the moment I am working with sequences around 50Mbp long, but I would
> like to be able to scale up to 250Mbp.
>
> > > Is that the right way to use "Bio::SeqIO" for processing large fasta
> > > files? Do I have to include "Bio::Seq::LargeSeq" and, if so, how can
> > > I do that?
> >
> > you could add the line
> > use Bio::Seq::LargeSeq;
> > just below --> use Bio::SeqIO <--
> > if you wanted, but it is included by the largefasta modules so it is
> > optional.
>
> Well, I've run some tests, first including "use Bio::Seq::LargeSeq"
> and then also "use File::Temp", and I got the same results (the same
> error/warning, with only the name of the temporary directory that
> cannot be created changing, and the same trailing extra lines).
>
> Thanks again... Josep F.
>
> ________________________________________
>
> Josep Francesc ABRIL FERRANDO
>
> RESEARCH GROUP on BIOMEDICAL INFORMATICS
> GENOME INFORMATICS LAB
> IMIM - UPF
> C/ Dr. Aiguader 80
> 08003 - Barcelona (SPAIN)
>
> Ph: +34 93 2211009 ext 2016
> Fax: +34 93 2213237
>
> http://www1.imim.es/~jabril/
>