[Bioperl-l] genbank

Mon Nov 29 14:39:16 UTC 2010

On Nov 29, 2010, at 3:35 AM, Dimitar Kenanov wrote:

> Hi again,
> it seems that when I download the whole proteome from NCBI in FASTA format (with 'download_query_genbank.pl'), the data is first downloaded in full, some kind of SeqFastaSpeedFactory is created from it, and only then is it copied to the output file. But I want to download and write to the output file one sequence at a time so I can see the download progress (which works for GenBank data).
> 
> It's frustrating :)
> 
> Any ideas where to look for a solution?
> Cheers
> Dimitar

You can't do this with the default script, but you can with a modified version: where the script retrieves the sequence stream, in its last four lines,

my $stream = $dbh->get_Stream_by_query($query);
while( my $seq = $stream->next_seq ) {
	$out->write_seq($seq);
}

insert a counter in the loop that reports progress.  Keep in mind that the sequence data is parsed and rewritten by Bio::SeqIO, so the output won't be exactly the same as what is retrieved from GenBank, but it should be very close.
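
As a rough sketch of what that could look like (this is not code from download_query_genbank.pl; the helper name copy_with_progress and its $every and $log parameters are made up for illustration), the loop can be wrapped in a small sub that counts sequences as they are written. It only assumes that $stream has a next_seq method and $out has a write_seq method, as in the four lines above. Note that it reports sequences as they come out of the stream; it does not by itself change how the download is buffered.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of download_query_genbank.pl).
# Copies every sequence from $stream to $out and reports progress.
#   $stream - any object with a next_seq method
#   $out    - any object with a write_seq method (e.g. a Bio::SeqIO writer)
#   $every  - illustrative parameter: report after every N sequences
#   $log    - illustrative parameter: filehandle for progress messages
sub copy_with_progress {
    my ($stream, $out, $every, $log) = @_;
    $every ||= 100;         # default reporting interval
    $log   ||= \*STDERR;    # keep progress text out of the sequence output

    my $count = 0;
    while ( my $seq = $stream->next_seq ) {
        $out->write_seq($seq);
        $count++;
        print {$log} "$count sequences written\n" if $count % $every == 0;
    }
    print {$log} "Done: $count sequences written\n";
    return $count;
}
```

In the modified script, the while loop would then be replaced by a single call such as copy_with_progress($stream, $out, 100);. Printing to STDERR by default means the progress messages stay separate from the sequences even if the output is written to STDOUT.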

If you want the raw sequence data as returned by NCBI, you can use Bio::DB::EUtilities, but it's a bit more complicated.

chris