[Bioperl-l] Batch retrieval of seq from swiss-prot

Wed Oct 12 14:27:11 EDT 2005

Well, I'm sure this isn't the major cause of your slowness, but you
are re-initializing the db handle on every pass through your loop.
Create it once, before the loop.

Code like this:
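#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::SwissProt;

# Create the database handle once, outside the loop.
my $database = Bio::DB::SwissProt->new;
open(my $seqids, '<', 'sample1.eco') || die "Cannot open sample1.eco: $!";
while (my $seqid = <$seqids>) {
   chomp($seqid);
   next unless length $seqid;   # skip blank lines
   my $seq = $database->get_Seq_by_acc($seqid);
   print $seq->seq(), "\n\n" if $seq;
}
close($seqids);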

You can also switch the SwissProt provider to a mirror near you, for
example
$database->hostlocation('australia');
if you are in Australia.

For now you have to read the code to see which mirrors are available.
Here is what I've defined in the module; there may be more mirrors
now, I'm not sure:
  'hosts'   =>
                {
                    'switzerland'  => 'ch.expasy.org',
                    'canada' => 'ca.expasy.org',
                    'china'  => 'cn.expasy.org',
                    'taiwan' => 'tw.expasy.org',
                    'australia' => 'au.expasy.org',
                    'korea'  => 'kr.expasy.org',
                    'us'     => 'us.expasy.org',
                },

You can see the module's code with:
perldoc -m Bio::DB::SwissProt

The real problem is that the SwissProt web interface doesn't support
a batch mode. This is a limitation of the data provider, and bioperl
has no control over it.

Your options are (if you want fully annotated proteins and not just a  
fasta file):
a) download swissprot and use Bio::Index::Swissprot to index it  
locally and get superfast access
b) use Bio::DB::GenPept and get back NCBI-ized Swissprot records
(batch mode does work for NCBI)
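
Option (a) could be sketched roughly as follows. This is a minimal
sketch, not a tested recipe: the file names 'uniprot_sprot.dat' and
'sprot.idx' and the '--build' switch are made-up placeholders, and it
assumes the index is keyed by the same accessions or IDs that appear
in sample1.eco.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Bio::Index::Swissprot;

# Hypothetical file names; adjust to wherever the flat file was saved.
my $flat_file  = 'uniprot_sprot.dat';
my $index_file = 'sprot.idx';

# Build the index only when asked to (it needs to be done once).
if (@ARGV && $ARGV[0] eq '--build') {
    my $builder = Bio::Index::Swissprot->new(
        -filename   => $index_file,
        -write_flag => 1,
    );
    # Use an absolute path so the index can find the flat file later.
    $builder->make_index(File::Spec->rel2abs($flat_file));
    exit 0;
}

# Normal run: open the existing index read-only and fetch locally.
my $inx = Bio::Index::Swissprot->new(-filename => $index_file);

open(my $ids, '<', 'sample1.eco') or die "Cannot open sample1.eco: $!";
while (my $id = <$ids>) {
    chomp $id;
    next unless length $id;        # skip blank lines
    my $seq = $inx->fetch($id);    # returns a Bio::Seq object if found
    if ($seq) {
        print $seq->seq(), "\n\n";
    }
    else {
        warn "No entry found for '$id'\n";
    }
}
close($ids);
```

Run it once with --build after downloading the flat file, then without
arguments for lookups. No network access is needed after that.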

If you just want fasta, either use Bio::DB::GenPept or download the
swissprot db from NCBI and index it with Bio::Index::Fasta or
Bio::DB::Fasta.
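
A batch request through Bio::DB::GenPept might look something like the
sketch below. It assumes NCBI recognizes the accessions listed in
sample1.eco. Note that the get_Stream_by_* methods return a Bio::SeqIO
stream rather than a sequence, so you pull sequences out of it with
next_seq(); calling seq() on the stream itself is what produces the
'Can't locate object method "seq"' error.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenPept;
use Bio::SeqIO;

# Read one accession per line, skipping blank lines.
open(my $fh, '<', 'sample1.eco') or die "Cannot open sample1.eco: $!";
my @ids;
while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/^\s+|\s+$//g;       # trim surrounding whitespace
    push @ids, $line if length $line;
}
close($fh);

# One handle, one request for the whole list.
my $db     = Bio::DB::GenPept->new;
my $stream = $db->get_Stream_by_acc(\@ids);

# Write each returned record to STDOUT in fasta format.
my $out = Bio::SeqIO->new(-format => 'fasta', -fh => \*STDOUT);
while (my $seq = $stream->next_seq) {
    $out->write_seq($seq);
}
```

For a very long list it may be kinder to the server to send the
accessions in smaller chunks rather than all at once.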

Good luck,
-jason

On Oct 12, 2005, at 2:02 PM, Harish S wrote:

> Hi gurus,
> I am a newbie to this grp using bioperl version 1.5.0.
> Like i am trying to retrieve a list of swiss prot seqs
> from swiss prot.The file sample1.eco has one
> swiss-prot id per line.
> The code:
> ----
> open(SEQID,'sample1.eco') || die 'Cannot open
> file',$!;
> @seqids=<SEQID>;
> for ($i=0;$i<@seqids;$i++)
> {
> use Bio::DB::SwissProt;
> $database= new Bio::DB::SwissProt;
> $seq = $database->get_Seq_by_id($seqids[$i]);
> print $seq->seq(), "\n\n";
> }
> ----
> works out, but this takes a long time...as it is
> retrieving one by one.
>
> so i tried to use get_Stream_by_batch but it gave me
> an error saying its deprecated and suggested me to use
> get_Stream_by_id.
>
> So tried this out..
> ----
> open(SEQID,'sample1.eco') || die 'Cannot open
> file',$!;
> @seqids=<SEQID>;
> $ref=\@seqids;
> for ($i=0;$i<@seqids;$i++)
> {
> use Bio::DB::SwissProt;
> $database= new Bio::DB::SwissProt;
> $seq = $database->get_Stream_by_id($ref);
> print $seq->seq(), "\n\n";
> }
> ----
> but this gave me an error saying
> Can't locate object method "seq" via package
> "Bio::SeqIO::swiss"
>
> How do i proceed...?
> Thanks in Advance.
>
>  HARISH.S

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12