[Bioperl-l] Processing large fasta sequences through SeqIO
Josep Francesc Abril Ferrando
jabril@imim.es
Thu, 30 Aug 2001 20:05:06 +0200
I need to work with chromosome-sized FASTA sequences, and I was trying to run some Perl code using
BioPerl version 0.7 ("$Id:largefasta.pm,v 1.5.2.1$", which is the one currently installed on our
system). Since the "Bio::SeqIO::largefasta" documentation says that this module has to be accessed
through "Bio::SeqIO", I did not include that module directly in the program. I wrote a script that
basically reads the whole sequence, may process it a little (e.g. reformatting sequence lines of
non-uniform length, if I am building the input by joining many sequences under the same id), and
then saves the processed large sequence. It seems to work OK, but I get some strange results in the
saved file, along with the following error/warning:
Error in tempdir() using /tmp/XXXXXXXXXX: Could not create directory /tmp/Z0gD8R0rlB: Too many links
at /usr/lib/perl5/site_perl/5.005//Bio/Root/IO.pm line 457
If I look at the saved file, the sequence itself is OK (it has neither more nor fewer nucleotides
than expected, and they are in the correct order), but the file contains a lot of empty lines (or
lines holding just '>') after the end of the sequence. Any idea what might be wrong in the following script?
---->8---->8---->8---->8---->8----
perl -ne 'BEGIN{ print ">bigseq\n"; }
          $_ !~ /^>|^\s*$/o && print ; ' $INDIR/*.fa |
perl -e '
    use Bio::Seq;
    use Bio::SeqIO;
    my $seqin  = Bio::SeqIO->new(-format => "largefasta", -fh => \*STDIN );
    my $seqout = Bio::SeqIO->new(-format => "largefasta", -fh => \*STDOUT);
    while (my $sequence = $seqin->next_seq()) {
        # do here some checkings/changes on substrings of the sequence
        $seqout->write_seq($sequence);
    }; # while
    exit(0);
' - > $OUTDIR/bigseq.fa
----8<----8<----8<----8<----8<----
Is this the right way to use "Bio::SeqIO" to process large FASTA files? Do I have to include
"Bio::Seq::LargeSeq" and, if so, how do I do that?
Thanks for your attention... Josep F.
________________________________________
Josep Francesc ABRIL FERRANDO
RESEARCH GROUP on BIOMEDICAL INFORMATICS
GENOME INFORMATICS LAB
IMIM - UPF
C/ Dr. Aiguader 80
08003 - Barcelona (SPAIN)
Ph: +34 93 2211009 ext 2016
Fax: +34 93 2213237
http://www1.imim.es/~jabril/