[Bioperl-l] improve speed in extracting Fasta sequence
Siaw Ling Lo
siawlinglo at yahoo.com
Mon Dec 27 20:50:44 EST 2004
Hi,
I am new to BioPerl and I need to extract FASTA sequences
from UniProt using a list of accession numbers in a file.
The list contains thousands of accession numbers, and the
response time is very slow (about 60 sequences extracted in
an hour). Is there a way to improve the speed?
The following is the code:
=======================================
use strict;
use warnings;
use Bio::SeqIO;

my $file   = 'uniprot';
my $format = 'Fasta';

# read in the accession number input file
open(ACC, "acc.txt")
    or die "an error occurred while reading acc file: $!";
my @accs;
while (<ACC>) {
    chomp;    # remove newline
    push @accs, $_;
}
close(ACC);
my $count = @accs;

# open the output file - FASTA sequence file
open(FILEHANDLE, ">uniprot_fasta.txt")
    or die "cannot open out file for writing: $!";

my $inseq = Bio::SeqIO->new('-file'   => "<$file",
                            '-format' => $format);

# get each sequence in turn
while (my $seq = $inseq->next_seq) {
    # search for each acc in the description and extract on a match
    for (my $i = 0; $i < $count; $i++) {
        # strip off all trailing white space - tabs, spaces,
        # new lines and returns
        $accs[$i] =~ s/\s+$//;
        # if match, print out the record
        if ($seq->desc() =~ /$accs[$i]/) {
            print FILEHANDLE ">";
            print FILEHANDLE $seq->desc(), "\n";
            print FILEHANDLE $seq->seq, "\n";
            # break out of loop when found
            last;
        }
    }
}
close(FILEHANDLE);
exit;
Any advice is much appreciated.
Thank you,
Siaw Ling
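
A likely cause of the slowdown in the script above is the inner loop: for
every sequence in the file it runs one regex match per accession number, so
the work grows with (number of sequences) x (number of accessions). A common
alternative is to load the accessions into a hash and look up the tokens of
each header instead. The following is only a minimal sketch of that idea, not
a tested drop-in replacement. The names load_accessions and extract_wanted are
hypothetical, and it assumes each accession appears in the header as a whole
token delimited by '|' or whitespace, which may not hold for every file.

```perl
use strict;
use warnings;

# Read accession numbers (one per line) into a hash for constant-time lookups.
sub load_accessions {
    my ($fh) = @_;
    my %wanted;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        next if $line eq '';        # skip blank lines
        $wanted{$line} = 1;
    }
    return \%wanted;
}

# Hypothetical helper: $inseq is any object with a next_seq method whose
# records offer display_id, desc and seq (as a Bio::SeqIO stream does).
# Assumption: the accession is a whole header token delimited by '|' or
# whitespace.
sub extract_wanted {
    my ($inseq, $wanted, $out_fh) = @_;
    my $found = 0;
    while (my $seq = $inseq->next_seq) {
        # rebuild the header from the ID and the description, if present
        my $header = join ' ',
            grep { defined($_) && length($_) } ($seq->display_id, $seq->desc);
        for my $token (split /[|\s]+/, $header) {
            next unless exists $wanted->{$token};
            print {$out_fh} ">$header\n", $seq->seq, "\n";
            $found++;
            last;    # write each matching record only once
        }
    }
    return $found;
}
```

With BioPerl installed, $inseq would be the same Bio::SeqIO object created in
the script above, $wanted the result of load_accessions on the opened acc.txt
handle, and $out_fh a handle opened for writing on uniprot_fasta.txt. Unlike
the original, this writes the ID together with the description in the header
line.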