[Bioperl-l] Bioperl-l Digest, Vol 91, Issue 20

Dimitar Kenanov dimitark at bii.a-star.edu.sg
Tue Nov 30 01:39:07 UTC 2010


On 11/30/2010 01:00 AM, bioperl-l-request at lists.open-bio.org wrote:
> Today's Topics:
>
>    1. genbank (Dimitar Kenanov)
>    2. Re: genbank (Chris Fields)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Mon, 29 Nov 2010 17:35:26 +0800
> From: Dimitar Kenanov<dimitark at bii.a-star.edu.sg>
> Subject: [Bioperl-l] genbank
> To: "'bioperl-l at bioperl.org'"<bioperl-l at bioperl.org>
> Message-ID:<4CF373DE.4070902 at bii.a-star.edu.sg>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Hi again,
> It seems that when I download a whole proteome from NCBI in FASTA format
> (with 'download_query_genbank.pl'), the data is first downloaded in full,
> then some kind of SeqFastaSpeedFactory is created from it, and only after
> that is it copied to the output file. But I want to download and write
> sequences to the output file one by one, so that I can see the download
> progress (which already works for GenBank data).
>
> It's frustrating :)
>
> Any ideas where to look for a solution?
> Cheers
> Dimitar
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 29 Nov 2010 08:39:16 -0600
> From: Chris Fields<cjfields at illinois.edu>
> Subject: Re: [Bioperl-l] genbank
> To: Dimitar Kenanov<dimitark at bii.a-star.edu.sg>
> Cc: "'bioperl-l at bioperl.org'"<bioperl-l at bioperl.org>
> Message-ID:<F3C80557-52DE-4D60-9E72-0660031D8F46 at illinois.edu>
> Content-Type: text/plain; charset=us-ascii
>
> You can't do this with the default script, but you can with a modified version. Where the script retrieves the sequence stream, in its last four lines:
>
> my $stream = $dbh->get_Stream_by_query($query);
> while( my $seq = $stream->next_seq ) {
> 	$out->write_seq($seq);
> }
>
> insert an iterator (for example, a counter) in the loop to indicate progress.  Keep in mind that the sequence data is processed through Bio::SeqIO, so the output won't be exactly the same as what is retrieved from GenBank, but it should be very close.
>
> If you want the raw sequence data, you can use Bio::DB::EUtilities, but it's a bit more complicated.
>
> chris
>
>

Hi,
Thank you for the info.
I have already inserted a progress bar (Term::ProgressBar) in the last
four lines. The problem is that I only see the progress at the end: it
jumps directly to 100% done. See the attached script.
From what I read in the modules underlying the script, the way the
stream is constructed should allow it to be read while it is being
downloaded. But when I fetch FASTA sequences from NCBI with
rettype=fasta, this is not possible.
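
For reference, here is a minimal sketch of the Bio::DB::EUtilities route
suggested in the reply. It fetches raw FASTA text in batches, so progress
can be reported after each batch. It follows the esearch/efetch pattern
from the BioPerl EUtilities documentation as I understand it. The query
term, e-mail address, output file name and batch size are placeholders
assumed for illustration and are not taken from the attached script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# Hypothetical values: replace the query, e-mail address and file name.
my $term    = 'Mycoplasma genitalium[Organism]';
my $email   = 'you@example.org';
my $outfile = 'proteome.fasta';
my $retmax  = 500;    # records requested per batch (an arbitrary choice)

# Step 1: run the search and keep the result set on the NCBI history server.
my $factory = Bio::DB::EUtilities->new(
    -eutil      => 'esearch',
    -db         => 'protein',
    -term       => $term,
    -email      => $email,
    -usehistory => 'y',
);
my $count = $factory->get_count;
die "No records found for '$term'\n" unless $count;
my $hist = $factory->next_History or die "No history data returned\n";

# Step 2: switch the same object to efetch and ask for raw FASTA text.
$factory->set_parameters(
    -eutil   => 'efetch',
    -rettype => 'fasta',
    -history => $hist,
);

open my $out, '>', $outfile or die "Cannot open $outfile: $!\n";

# Step 3: fetch batch by batch, writing each chunk of text as it arrives
# and reporting progress on STDERR after every batch.
for (my $retstart = 0; $retstart < $count; $retstart += $retmax) {
    $factory->set_parameters(-retstart => $retstart, -retmax => $retmax);
    $factory->get_Response(
        -cb => sub { my ($data) = @_; print {$out} $data },
    );
    my $done = $retstart + $retmax;
    $done = $count if $done > $count;
    printf STDERR "\rFetched %d of %d records (%.0f%%)",
        $done, $count, 100 * $done / $count;
}
print STDERR "\n";
close $out or die "Cannot close $outfile: $!\n";
```

Because the text is written as it arrives instead of being parsed through
Bio::SeqIO, the output is whatever NCBI returns. Progress is reported once
per batch, so a smaller -retmax gives finer updates at the cost of more
requests.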

-- 
Dimitar Kenanov
Post doctoral fellow
Bioinformatics Institute
A*STAR Singapore
tel: +65 6478 8514
email: dimitark at bii.a-star.edu.sg

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: download_query_genbank.pl
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20101130/488f2bf6/attachment-0004.pl>

