[Bioperl-l] Batch retrieval of seq from swiss-prot
Jason Stajich
jason.stajich at duke.edu
Wed Oct 12 14:27:11 EDT 2005
Well, I'm sure this isn't the major cause of your slowness, but you
are re-initializing the db handle each time through your loop.
Code like this creates the handle only once:
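#!/usr/bin/perl -w
use strict;
use Bio::DB::SwissProt;
# create the db handle once, outside the loop
my $database = Bio::DB::SwissProt->new;
open(my $seqids, '<', 'sample1.eco') || die "Cannot open sample1.eco: $!";
while (my $seqid = <$seqids>) {
    chomp($seqid);
    next unless length $seqid;    # skip blank lines
    my $seq = $database->get_Seq_by_acc($seqid);
    print $seq->seq(), "\n\n";
}
close($seqids);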
You can also switch the swissprot provider to a local mirror.
$database->hostlocation('australia')
if you are in australia for example.
You have to read the code to see what are the available mirrors for now:
here is what I've defined in the module, there may be more mirrors
now, I'm not sure:
'hosts' =>
{
    'switzerland' => 'ch.expasy.org',
    'canada'      => 'ca.expasy.org',
    'china'       => 'cn.expasy.org',
    'taiwan'      => 'tw.expasy.org',
    'australia'   => 'au.expasy.org',
    'korea'       => 'kr.expasy.org',
    'us'          => 'us.expasy.org',
},
You can see the module's code here:
perldoc -m Bio::DB::SwissProt
The real problem is that the swissprot web interface doesn't support
a batch mode. This is a limitation of the data provider, and bioperl
has no control over it.
Your options are (if you want fully annotated proteins and not just a
fasta file):
a) download swissprot and use Bio::Index::Swissprot to index it
locally and get superfast access
b) use Bio::DB::GenPept and get back NCBI-ized Swissprot records
(batch mode does work for NCBI)
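For option (a), here is a rough sketch of building and querying a local index. The file names and the build/fetch command-line layout are made up for illustration, and I'm assuming fetch() hands back undef for an ID that isn't in the index, so check the Bio::Index::Swissprot docs for your version before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Build the index once; redo this only when the flat file changes.
sub build_index {
    my ($index_file, @flatfiles) = @_;
    require Bio::Index::Swissprot;    # loaded on demand
    my $inx = Bio::Index::Swissprot->new(
        -filename   => $index_file,
        -write_flag => 1,
    );
    # use absolute paths so the index still works from another directory
    $inx->make_index(map { File::Spec->rel2abs($_) } @flatfiles);
    return;
}

# Open an existing index read-only.
sub open_index {
    my ($index_file) = @_;
    require Bio::Index::Swissprot;
    return Bio::Index::Swissprot->new(-filename => $index_file);
}

# Look up each ID; returns the sequence objects that were found and
# warns about the rest (assumes fetch() returns undef for unknown IDs).
sub fetch_seqs {
    my ($inx, @ids) = @_;
    my @seqs;
    for my $id (@ids) {
        my $seq = $inx->fetch($id);
        if ($seq) { push @seqs, $seq }
        else      { warn "No entry for '$id' in index\n" }
    }
    return @seqs;
}

# Hypothetical usage:
#   script.pl build sprot.idx uniprot_sprot.dat
#   script.pl fetch sprot.idx ID1 ID2 ...
my ($mode, $index_file, @rest) = @ARGV;
if (defined $mode && $mode eq 'build') {
    build_index($index_file, @rest);
}
elsif (defined $mode && $mode eq 'fetch') {
    print $_->seq, "\n\n" for fetch_seqs(open_index($index_file), @rest);
}
```

Building the index reads the whole flat file, so it is slow the first time, but lookups afterwards never touch the network.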
If you just want fasta, either use Bio::DB::GenPept or download the
swissprot db from NCBI and index it with Bio::Index::Fasta or
Bio::DB::Fasta.
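Here is a sketch of the batch route through Bio::DB::GenPept, printing plain fasta. It assumes the IDs in your file are ones NCBI's protein database recognizes, which I have not checked for your list; the read_ids and fetch_stream helpers are just names I made up. Note that get_Stream_by_id gives you back a Bio::SeqIO stream, not a sequence, so you have to pull the records out with next_seq. Calling seq() directly on the stream is what produces the "Can't locate object method" error in the message quoted below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one ID per line, trimming whitespace and skipping blank lines.
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;
        push @ids, $line if length $line;
    }
    return @ids;
}

# Request all IDs in one go; returns a Bio::SeqIO stream.
sub fetch_stream {
    my (@ids) = @_;
    require Bio::DB::GenPept;    # loaded on demand
    my $db = Bio::DB::GenPept->new;    # one handle for the whole batch
    return $db->get_Stream_by_id(\@ids);
}

# Usage: script.pl sample1.eco
if (@ARGV) {
    open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
    my @ids = read_ids($fh);
    close($fh);

    my $stream = fetch_stream(@ids);
    while (my $seq = $stream->next_seq) {
        print '>', $seq->display_id, "\n", $seq->seq, "\n";
    }
}
```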
Good luck,
-jason
On Oct 12, 2005, at 2:02 PM, Harish S wrote:
> Hi gurus,
> I am a newbie to this grp using bioperl version 1.5.0.
> Like i am trying to retrieve a list of swiss prot seqs
> from swiss prot.The file sample1.eco has one
> swiss-prot id per line.
> The code:
> ----
> open(SEQID,'sample1.eco') || die 'Cannot open
> file',$!;
> @seqids=<SEQID>;
> for ($i=0;$i<@seqids;$i++)
> {
> use Bio::DB::SwissProt;
> $database= new Bio::DB::SwissProt;
> $seq = $database->get_Seq_by_id($seqids[$i]);
> print $seq->seq(), "\n\n";
> }
> ----
> works out, but this takes a long time...as it is
> retrieving one by one.
>
> so i tried to use get_Stream_by_batch but it gave me
> an error saying its deprecated and suggested me to use
> get_Stream_by_id.
>
> So tried this out..
> ----
> open(SEQID,'sample1.eco') || die 'Cannot open
> file',$!;
> @seqids=<SEQID>;
> $ref=\@seqids;
> for ($i=0;$i<@seqids;$i++)
> {
> use Bio::DB::SwissProt;
> $database= new Bio::DB::SwissProt;
> $seq = $database->get_Stream_by_id($ref);
> print $seq->seq(), "\n\n";
> }
> ----
> but this gave me an error saying
> Can't locate object method "seq" via package
> "Bio::SeqIO::swiss"
>
> How do i proceed...?
> Thanks in Advance.
>
> HARISH.S
--
Jason Stajich
Duke University
http://www.duke.edu/~jes12