[Bioperl-l] Processing large fasta sequences through SeqIO
Jason Stajich
jason@chg.mc.duke.edu
Thu, 30 Aug 2001 16:00:04 -0400 (EDT)
On Thu, 30 Aug 2001, Josep Francesc Abril Ferrando wrote:
> I need to work with chromosome size fasta sequences and I was trying
> to run some perl code using BioPerl version 0.7 ("$Id:largefasta.pm,v
> 1.5.2.1$", which is the one currently installed in our system). As I
> read in the "Bio::SeqIO::largefasta" documentation that this module
> has to be accessed through "Bio::SeqIO", I did not include that module
> directly in the program. I wrote a script that basically reads the
> whole sequence, may process it a little (e.g. reformatting
> non-uniform-length sequence lines, if I am building the input by
> joining many sequences under the same id), and then saves the
> processed large sequence. It seems to work OK, but I get some strange
> results in the saved file, along with the following error/warning:
>
> Error in tempdir() using /tmp/XXXXXXXXXX: Could not create directory
> /tmp/Z0gD8R0rlB: Too many links at
> /usr/lib/perl5/site_perl/5.005//Bio/Root/IO.pm line 457
>
Is your tmp dir really full of files/directories, or does it lack enough
space to hold all of the sequence data? This seems like a system
problem.
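
One way to check this from the shell: "Too many links" is what mkdir
reports when the parent directory has hit the filesystem's limit on
subdirectories (32000 links on ext2, for example), which is a different
condition from running out of disk space. The sketch below counts the
immediate subdirectories of a directory. The helper name count_subdirs
is made up for illustration, and it assumes a find that supports
-mindepth/-maxdepth (GNU and BSD find do).

```shell
# Hypothetical helper: count the immediate subdirectories of a directory.
# Defaults to /tmp when no argument is given.
count_subdirs() {
    dir="${1:-/tmp}"
    # One output line per subdirectory; names containing newlines would
    # be over-counted, which is acceptable for a rough diagnostic.
    find "$dir" -mindepth 1 -maxdepth 1 -type d 2>/dev/null | wc -l | tr -d ' '
}

# A count in the tens of thousands suggests leftover temp directories.
count_subdirs /tmp
```

If the count is small, free space is the other thing to look at
(df -k /tmp).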
Do you have File::Temp installed? There is a known bug in the 0.7
release: if File::Temp is not installed, the application does not clean
up its tempdirs/tempfiles. Installing File::Temp will take care of that.
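
To see whether the module is available to the perl on your PATH,
something like the following should do. It is only a sketch:
has_perl_module is a made-up helper name, and it tests the first perl
found on PATH, which may not be the one your script actually runs under.

```shell
# Hypothetical helper: succeed if the named Perl module can be loaded.
has_perl_module() {
    # -M loads the module; "-e 1" is a do-nothing program.
    perl -M"$1" -e 1 2>/dev/null
}

if has_perl_module File::Temp; then
    echo "File::Temp is installed"
else
    echo "File::Temp is missing; install it from CPAN"
fi
```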
> If I look at the saved file, the sequence is OK (it has neither more
> nor fewer nucleotides than expected, and they are in the correct
> order), but the file contains a lot of empty lines (or lines with just
> '>') after the end of the sequence. Any idea what might be wrong in
> the following script?
>
Nothing obvious jumps out from looking at your code.
How large are your files?
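
To put a number on the symptom, a rough shell check like this counts
the lines in the output that are empty or hold only a bare '>'. The
function name count_stray_lines is made up for illustration; the path
in the usage comment is the one from your pipeline.

```shell
# Hypothetical helper: count lines that are blank or contain only ">"
# (optionally surrounded by whitespace) in the given file.
count_stray_lines() {
    # grep -c prints the count but exits non-zero when it is 0,
    # so "|| true" keeps the function from reporting a failure.
    grep -c -E '^[[:space:]]*>?[[:space:]]*$' "$1" || true
}

# Usage:
#   count_stray_lines "$OUTDIR/bigseq.fa"
#   wc -c "$OUTDIR/bigseq.fa"    # file size in bytes
```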
> ---->8---->8---->8---->8---->8----
>
> perl -ne 'BEGIN{ print ">bigseq\n"; }
> $_ !~ /^>|^\s*$/o && print ; ' $INDIR/*.fa |
> perl -e '
> use Bio::Seq;
> use Bio::SeqIO;
> my $seqin = Bio::SeqIO->new(-format => "largefasta", -fh => \*STDIN );
> my $seqout = Bio::SeqIO->new(-format => "largefasta", -fh => \*STDOUT);
> while (my $sequence = $seqin->next_seq()) {
> # do here some checkings/changes on substrings of the sequence
> $seqout->write_seq($sequence);
> }; # while
> exit(0);
> ' - > $OUTDIR/bigseq.fa
>
> ----8<----8<----8<----8<----8<----
>
> Is that the right way to use "Bio::SeqIO" for processing large fasta
> files? Do I have to include "Bio::Seq::LargeSeq" and, if so, how can
> I do that?
>
You could add the line
    use Bio::Seq::LargeSeq;
just below the "use Bio::SeqIO;" line if you want, but the largefasta
module already loads it, so it is optional.
> Thanks for your attention... Josep F.
> ________________________________________
>
> Josep Francesc ABRIL FERRANDO
>
> RESEARCH GROUP on BIOMEDICAL INFORMATICS
> GENOME INFORMATICS LAB
> IMIM - UPF
> C/ Dr. Aiguader 80
> 08003 - Barcelona (SPAIN)
>
> Ph: +34 93 2211009 ext 2016
> Fax: +34 93 2213237
>
> http://www1.imim.es/~jabril/