[Bioperl-l] Fetching Fasta seqs from GenBank - Help request

Thu Mar 25 15:38:32 EST 2004

Hi Sean,

Thanks for your valuable help !

I solved the problem using "Bio::DB::Query::GenBank". My goal was to
retrieve 2 types of sequences (mitochondrial and ribosomal) from a
specific organism (e.g. Bothrops spp.). I am listing my script for those
interested in doing something similar. The only warning I get is:

[davila@tryps script]$ perl fetch2contaminant.pl
Useless use of a constant in void context at fetch2contaminant.pl line 10.
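
This warning typically appears when a parenthesised, comma-separated list
of strings is assigned to a scalar: Perl evaluates the first string,
discards it, and keeps only the last one. A minimal sketch of two ways to
avoid that when building Entrez queries (the variable names are only
illustrative):

```perl
use strict;
use warnings;

# Assigning a parenthesised list to a scalar does NOT combine the strings:
#   my $q = ('A', 'B');   # $q is 'B'; 'A' is discarded, hence the warning
# Building the query text from an array avoids the problem.
my $organism = 'Bothrops[Organism]';
my @terms    = ('ribosomal', 'mitochondrial');

# One Entrez query that matches either term.
my $combined = $organism . ' AND (' . join(' OR ', @terms) . ')';

# Or one query string per term, to be run separately.
my @separate = map { "$organism AND $_" } @terms;

print "$combined\n";
print "$_\n" for @separate;
```

The combined form needs only one round trip to the server; the separate
form keeps the two result sets apart.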

I was not sure which field (e.g. keyword or feature) I should search for
ribosomal and mitochondrial genes, but leaving the field blank gave some
good results.
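
For anyone who wants to try restricting the term to a field, here is a
small sketch of how the query strings could be built. The helper name
"tagged" is hypothetical, and the field names Title and Keyword are
assumptions to check against the Entrez help pages; only the Organism tag
comes from my script.

```perl
use strict;
use warnings;

# Hypothetical helper: restrict a search term to one Entrez field, or leave
# it unrestricted when no field is given.
sub tagged {
    my ($term, $field) = @_;
    return defined $field ? sprintf('%s[%s]', $term, $field) : $term;
}

my $organism = tagged('Bothrops', 'Organism');

# Same term with no field, then with two assumed field names.
for my $field (undef, 'Title', 'Keyword') {
    my $query_string = join ' AND ', $organism, tagged('mitochondrial', $field);
    print "$query_string\n";
}
```

Each printed string can be passed as the -query argument; comparing the hit
counts shows how much each restriction narrows the search.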

Indeed Bioperl is powerful... a bit confusing for beginners too.

Thanks and best regards,

Alberto

#!/usr/local/bin/perl -w

use lib "/usr/local/bioperl14";
use strict;
use Bio::DB::Query::GenBank;
use Bio::SeqIO;
use Bio::DB::GenBank;

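# A list assigned to a scalar keeps only its last element (and triggers the
# "Useless use of a constant" warning), so both terms go into one Entrez query.
my $query_string = 'Bothrops[Organism] AND (ribosomal OR mitochondrial)';
my $query = Bio::DB::Query::GenBank->new(-db      => 'nucleotide',
                                         -query   => $query_string,
                                         -mindate => '1985',
                                         -maxdate => '2004');

my $seqio = Bio::DB::GenBank->new->get_Stream_by_query($query);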

# open a SeqIO handle for writing the output file in fasta format
my $outfile = Bio::SeqIO->new(-format => 'fasta',
                              -file   => '>contaminant.bothrops');

while (my $s = $seqio->next_seq) {
    # write the sequence as fasta
    $outfile->write_seq($s);
}

exit;
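
A possible variation on the script above, offered as a sketch rather than
a tested recipe: run one query per search term, report the hit count
before downloading, and skip records that match both terms. The helper
names build_queries and fetch_to_fasta are hypothetical; the BioPerl calls
(count, get_Stream_by_query, accession_number, write_seq) are assumed to
behave as documented for BioPerl 1.4. It needs network access to NCBI.

```perl
use strict;
use warnings;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Hypothetical helper: one Entrez query string per search term.
sub build_queries {
    my ($organism, @terms) = @_;
    return map { sprintf('%s[Organism] AND %s', $organism, $_) } @terms;
}

# Hypothetical helper: run each query and write the hits to one fasta file.
sub fetch_to_fasta {
    my ($outfile, @query_strings) = @_;
    my $gb  = Bio::DB::GenBank->new;
    my $out = Bio::SeqIO->new(-format => 'fasta', -file => ">$outfile");
    my %seen;    # accession numbers already written

    for my $query_string (@query_strings) {
        my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                                 -query => $query_string);

        # count() asks Entrez how many records match, before any download.
        my $hits = $query->count;
        printf STDERR "%s: %d hits\n", $query_string, $hits;
        next unless $hits;

        my $stream = $gb->get_Stream_by_query($query);
        while (my $seq = $stream->next_seq) {
            # A record can match both searches; write it only once.
            next if $seen{ $seq->accession_number }++;
            $out->write_seq($seq);
        }
    }
    return scalar keys %seen;
}

# Run only when executed as a script, not when loaded from another file.
unless (caller) {
    my $written = fetch_to_fasta('contaminant.bothrops',
        build_queries('Bothrops', 'ribosomal', 'mitochondrial'));
    print "wrote $written sequences\n";
}
```

Checking the count first is a cheap way to notice a query that matches
nothing, or far more than expected, before the download starts.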

On Thu, 2004-03-25 at 16:37, Sean Davis wrote:
> Alberto,
> 
> I would second that.  If you are doing more with this than retrieving raw
> sequence (if you care at all), maybe you could let Barry and me know what you
> are trying to do more generally.  Bioperl is quite powerful, but it does
> take some direction to get started.
> 
> Sean
> 
> On 3/25/04 12:43 PM, "Barry Moore" <barry.moore at genetics.utah.edu> wrote:
> 
> > Alberto-
> > 
> > You said, "the 'get_Stream_by_id' is returning me more than the
> > 'sequence per se'".  I'm not sure if this is what you're asking, but I'll
> > take a shot.  Since you are retrieving your two sequences in EMBL
> > format, you get all the associated information that you would see if you
> > downloaded that same file from the web interface.  Your sequences are
> > stored by BioPerl as RichSeq objects, which build on PrimarySeq
> > objects.  So the EMBL file data is stored in the RichSeq object and its
> > associated PrimarySeq object.  Of course, when you save
> > that locally as a fasta file, that extra information is lost.  If you
> > decide you need to use that data, have a look at the documentation for
> > Bio::Seq::RichSeq and Bio::PrimarySeq and the SeqIO and Feature
> > Annotation HOWTOs to learn more.
> > 
> > Barry
> > 
> > Alberto Davila wrote:
> > 
> >> Thanks Jason,
> >> 
> >> I installed IO::String and it is working fine now. However, I have a
> >> question: "get_Stream_by_id" is returning more than the "sequence
> >> per se". What is it? My script and results are listed below. Finally, I
> >> would like to save the retrieved sequences to my local disk as fasta
> >> files. Is there an argument for that?
> >> 
> >> Thanks again, Alberto
> >> 
> >> 
> >> #!/usr/local/bin/perl -w
> >> 
> >> use lib "/usr/local/bioperl14";
> >> use Bio::DB::BioFetch;
> >> use strict;
> >> use Bio::DB::WebDBSeqI;
> >> use HTTP::Request::Common 'POST';
> >> 
> >> my $format_type='fasta';
> >> my $stream;
> >> 
> >> 
> >> my $bf = new Bio::DB::BioFetch(-format        =>$format_type,
> >>                               -retrievaltype =>'tempfile',
> >>                               -db            =>'EMBL');
> >>  
> >> $stream = $bf->get_Stream_by_id(['BUM','J00231']);
> >> while (my $s = $stream->next_seq) {
> >>    print $s->seq,"\n\n\n";
> >> }              
> >>  
> >>  
> >>  exit;
> >> 
> >> 
> >> 
> >> 
> >> [davila@tryps script]$ perl gb-fetch-1.pl
> >> agtagtgtactaccaagtatagataacgtttaaatattaaagttttggatcaaagccaaagatgattcgcat
> >> gctggtgctgattgtagttacagctgcaagcccagtgtatcagagatgtttccaagatggggctatagtgaa
> >> gcaaaacccatccaaagaggcagtcacagaagtgtccctaaaagatgatgttagca
> >> 
> >> 
> >> cctggacctcctgtgcaagaacatgaaacanctgtggttcttccttctcctggtggcagctcccagatgggt
> >> cctgtcccaggtgcacctgcaggagtcgggcccaggactggggaagcctccagagctcaaaaccccacttgg
> >> tgacacaactcacacatgcccacggtgcccagagcccaaatcttgtgacacacctcccccgtgcccacggtg
> >> cccagagcccaaatcttgtgacacacctcccccatgcccacggtgcccagagcccaaatcttgtgacacacc
> >> tcccccgtgcccnnngtgcccagcacctgaactcttgggaggaccgtcagtcttcctcttccccccaaaacc
> >> caaggatacccttatgatttcccggacccctgaggtcacgtgcgtggtggtggacgtgagccacgaagaccc
> >> nnnngtccagttcaagtggtacgtggacggcgtggaggtgcataatgccaagacaaagctgcgggaggagca
> >> gtacaacagcacgttccgtgtggtcagcgtcctcaccgtcctgcaccaggactggctgaacggcaaggagta
> >> caagtgcaaggtctccaacaaagccctcccagcccccatcgagaaaaccatctccaaagccaaaggacagcc
> >> cnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnngaggagatgaccaagaaccaagtcagcctgacctg
> >> cctggtcaaaggcttctaccccagcgacatcgccgtggagtgggagagcaatgggcagccggagaacaacta
> >> caacaccacgcctcccatgctggactccgacggctccttcttcctctacagcaagctcaccgtggacaagag
> >> caggtggcagcaggggaacatcttctcatgctccgtgatgcatgaggctctgcacaaccgctacacgcagaa
> >> gagcctctccctgtctccgggtaaatgagtgccatggccggcaagcccccgctccccgggctctcggggtcg
> >> cgcgaggatgcttggcacgtaccccgtgtacatacttcccaggcacccagcatggaaataaagcacccagcg
> >> ctgccctgg
> >> 
> >> 
> >> 
> >> 
> >> On Tue, 2004-03-23 at 22:44, Jason Stajich wrote:
> >>  
> >> 
> >>> You need an additional perl module.
> >>> 
> >>> 
> >>> install IO::String from CPAN
> >>> 
> >>> There is a section on how to install additional perl modules in the
> >>> INSTALL document.
> >>> 
> >>> -j
> >>> 
> >>> On Tue, 23 Mar 2004, Alberto Davila wrote:
> >>> 
> >>>    
> >>> 
> >>>> Hi,
> >>>> 
> >>>> May I ask for some help ?
> >>>> 
> >>>> I am trying to use the BioFetch module to download several seqs
> >>>> (from specific organisms) from GenBank in fasta format, but it looks
> >>>> like I am missing "IO/String.pm" and other things. Should I install
> >>>> additional bioperl modules (I have the Bioperl Core 1.4 installed), or
> >>>> use a different module for my purpose?
> >>>> 
> >>>> My script and error msg are listed below.
> >>>> 
> >>>> Thanks and best regards,
> >>>> 
> >>>> Alberto
> >>>> 
> >>>> ****
> >>>> 
> >>>> #!/usr/local/bin/perl -w
> >>>> 
> >>>> use lib "/usr/local/bioperl14";
> >>>> package Bio::DB::BioFetch;
> >>>> use strict;
> >>>> use Bio::DB::WebDBSeqI;
> >>>> use HTTP::Request::Common 'POST';
> >>>> 
> >>>> my $format_type='fasta';
> >>>> my $stream;
> >>>> 
> >>>> 
> >>>> my $bf = new Bio::DB::BioFetch(-format        =>$format_type,
> >>>>                               -retrievaltype =>'tempfile',
> >>>>                               -db            =>'EMBL');
> >>>> 
> >>>> $stream = $bf->get_Stream_by_id(['BUM','J00231']);
> >>>> while (my $s = $stream->next_seq) {
> >>>>    print $s->seq,"\n";
> >>>>        }
> >>>> 
> >>>> 
> >>>>          exit;
> >>>> 
> >>>> 
> >>>> [davila@tryps script]$ perl gb-fetch-1.pl
> >>>> Can't locate IO/String.pm in @INC (@INC contains:
> >>>> /usr/local/bioperl14/i386-linux-thread-multi /usr/local/bioperl14
> >>>> /usr/lib/perl5/5.8.3/i386-linux-thread-multi /usr/lib/perl5/5.8.3
> >>>> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
> >>>> /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
> >>>> /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
> >>>> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
> >>>> /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2
> >>>> /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0
> >>>> /usr/lib/perl5/site_perl
> >>>> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
> >>>> /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
> >>>> /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
> >>>> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
> >>>> /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2
> >>>> /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0
> >>>> /usr/lib/perl5/vendor_perl .) at
> >>>> /usr/local/bioperl14/Bio/DB/WebDBSeqI.pm line 90.
> >>>> BEGIN failed--compilation aborted at
> >>>> /usr/local/bioperl14/Bio/DB/WebDBSeqI.pm line 90.
> >>>> Compilation failed in require at gb-fetch-1.pl line 6.
> >>>> BEGIN failed--compilation aborted at gb-fetch-1.pl line 6.