[Bioperl-l] Fetching Fasta seqs from GenBank - Help request
Alberto Davila
davila at ioc.fiocruz.br
Sat Mar 27 10:08:18 EST 2004
Hi Remo,
Thanks for catching this, so Sean was right as well. I was confused
because the first "$query_string" seemed to be returning sequences
containing the word "ribosomal". I just realized they were retrieved
because they also contain the word "mitochondrial":
>AY223674 Bothrops jararacussu specimen-voucher DPL 104 16S ribosomal
RNA gene,
partial sequence; mitochondrial gene for mitochondrial product.
>AY223673 Bothrops alternatus specimen-voucher DPL 2879 16S ribosomal
RNA gene,
partial sequence; mitochondrial gene for mitochondrial product.
I am now querying for "ribosomal" and "mitochondrial" genes in the
"[title]" field, and things are working OK now. Thanks!
Alberto
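
One way to express the field-restricted search described above is to build the Entrez query string from its parts. This is only a sketch: the helper name build_query is hypothetical, and the exact query Alberto ended up using is not shown in this thread. [Organism] and [Title] are standard Entrez field tags.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): combine an organism name with
# a list of words that must appear in the given Entrez field.
sub build_query {
    my ($organism, $field, @words) = @_;
    # Tag every word with the field, then OR the alternatives together.
    my $alternatives = join ' OR ', map { $_ . "[$field]" } @words;
    return $organism . "[Organism] AND ($alternatives)";
}

my $query_string = build_query('Bothrops', 'Title', 'ribosomal', 'mitochondrial');
print "$query_string\n";
```

The resulting string could presumably be passed as the -query argument of Bio::DB::Query::GenBank, in the same way as in the script quoted further down.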
On Sat, 2004-03-27 at 10:12, sanges at biogem.it wrote:
>
> Alberto,
>
> you have an error in your code:
>
> my $query_string = ('Bothrops[Organism] AND ribosomal',
>                     'Bothrops[Organism] AND mitochondrial');
>
> With this line you are assigning a list to a scalar.
> Try adding this line
>
> print $query_string;
>
> and see: only the last value ends up in your $query_string!
>
> If I understood your need correctly, you should use a query like this:
>
> my $query_string = 'Bothrops[Organism] AND (ribosomal OR mitochondrial)';
>
> Remo
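
The behaviour Remo describes can be reproduced without BioPerl. A minimal sketch, assuming only core Perl; the names $last_only, @queries and $combined are illustrative and do not come from the original script:

```perl
use strict;
use warnings;

my $last_only;
{
    # A parenthesised list assigned to a scalar is evaluated by the comma
    # operator: every element except the last is discarded. This is what
    # triggers "Useless use of a constant in void context", silenced here.
    no warnings 'void';
    $last_only = ('Bothrops[Organism] AND ribosomal',
                  'Bothrops[Organism] AND mitochondrial');
}
print "$last_only\n";    # only the "mitochondrial" query survives

# Keeping both queries requires an array; they can then be OR-ed together.
my @queries = ('Bothrops[Organism] AND ribosomal',
               'Bothrops[Organism] AND mitochondrial');
my $combined = join ' OR ', map { "($_)" } @queries;
print "$combined\n";
```

Joining the sub-queries with OR is one option; folding the alternatives into a single parenthesised group, as in the query suggested above, gives an equivalent and shorter search string.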
>
> Quoting Alberto Davila <davila at ioc.fiocruz.br>:
>
> > Hi Sean,
> >
> > Thanks for your valuable help !
> >
> > I solved the problem using "Bio::DB::Query::GenBank". My goal was to
> > retrieve two types of sequences (mitochondrial and ribosomal) from a
> > specific organism (e.g. Bothrops spp.). I am listing my script for
> > anyone interested in doing something similar. The only warning I get is:
> >
> > [davila at tryps script]$ perl fetch2contaminant.pl
> > Useless use of a constant in void context at fetch2contaminant.pl line 10.
> >
> > I was not sure in which field (e.g. keyword or feature) I should look for
> > ribosomal and mitochondrial genes, but leaving it blank gave some good
> > results.
> >
> > Indeed, Bioperl is powerful, though a bit confusing for beginners.
> >
> > Thanks and best regards,
> >
> > Alberto
> >
> >
> > #!/usr/local/bin/perl -w
> >
> > use lib "/usr/local/bioperl14";
> > use strict;
> > use Bio::DB::Query::GenBank;
> > use Bio::SeqIO;
> > use Bio::DB::GenBank;
> >
> >
> > my $query_string = ('Bothrops[Organism] AND ribosomal',
> >                     'Bothrops[Organism] AND mitochondrial');
> > my $query = new Bio::DB::Query::GenBank(-db      => 'nucleotide',
> >                                         -query   => $query_string,
> >                                         -mindate => '1985',
> >                                         -maxdate => '2004');
> >
> > my $seqio=new Bio::DB::GenBank->get_Stream_by_query($query);
> >
> > #open a seqio handle for writing the outputfile in fasta
> > my $outfile = new Bio::SeqIO(-format => 'fasta',
> >                              -file   => '>contaminant.bothrops');
> >
> > while (my $s = $seqio->next_seq) {
> >     #write the fasta
> >     $outfile->write_seq($s);
> > }
> >
> >
> > exit;
> >
> >
> >
> >
> >
> >
> >
> > On Thu, 2004-03-25 at 16:37, Sean Davis wrote:
> > > Alberto,
> > >
> > > I would second that. If you are doing more with this than retrieving raw
> > > sequence (if you care at all), maybe you could let Barry and me know what
> > > you are trying to do more generally. Bioperl is quite powerful, but it
> > > does take some direction to get started.
> > >
> > > Sean
> > >
> > > On 3/25/04 12:43 PM, "Barry Moore" <barry.moore at genetics.utah.edu>
> > > wrote:
> > >
> > > > Alberto-
> > > >
> > > > You said, "the 'get_Stream_by_id' is returning me more than the
> > > > 'sequence per se'". I'm not sure if this is what you're asking, but
> > > > I'll take a shot. Since you are retrieving your two sequences in EMBL
> > > > format, you get all the associated information that you would see if
> > > > you downloaded that same file from the web interface. Your sequences
> > > > are stored by BioPerl as RichSeq objects, which inherit from PrimarySeq
> > > > objects. So that EMBL file data is stored in the RichSeq object and
> > > > the associated PrimarySeq object it inherited. Of course, when you save
> > > > that locally as a fasta file, that extra information is lost. If you
> > > > decide you need to use that data, have a look at the documentation for
> > > > Bio::Seq::RichSeq and Bio::PrimarySeq and the SeqIO and Feature
> > > > Annotation HOWTOs to learn more.
> > > >
> > > > Barry
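
Barry's point that annotation is lost on saving to fasta can be illustrated without BioPerl. The sketch below is plain Perl under stated assumptions: the %record layout, the to_fasta helper and all of the data in it are hypothetical placeholders, not BioPerl's object model or a real database entry.

```perl
use strict;
use warnings;

# Hypothetical in-memory record standing in for a rich sequence entry.
my %record = (
    id          => 'EXAMPLE1',
    description => 'placeholder description',
    sequence    => 'cctggacctcctgtgcaagaacatgaaaca',
    features    => [ 'source', 'CDS' ],    # annotation with no place in FASTA
);

# Format a record as FASTA: a ">" header line, then the wrapped sequence.
sub to_fasta {
    my ($rec, $width) = @_;
    $width ||= 60;
    my $fasta = ">$rec->{id} $rec->{description}\n";
    for (my $pos = 0; $pos < length($rec->{sequence}); $pos += $width) {
        $fasta .= substr($rec->{sequence}, $pos, $width) . "\n";
    }
    return $fasta;
}

print to_fasta(\%record, 10);
```

Only the identifier, the description and the residues reach the output; the features entry has nowhere to go, which is why a richer format is needed when that information matters.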
> > > >
> > > > Alberto Davila wrote:
> > > >
> > > >> Thanks Jason,
> > > >>
> > > > >> I installed IO::String and it is working fine now. However, I have
> > > > >> a question: the "get_Stream_by_id" is returning me more than the
> > > > >> "sequence per se". What is it? My script and results are listed
> > > > >> below. Finally, I would like to save the retrieved sequences to my
> > > > >> local disk as fasta files. Is there an argument for that?
> > > >>
> > > >> Thanks again, Alberto
> > > >>
> > > >>
> > > >> #!/usr/local/bin/perl -w
> > > >>
> > > >> use lib "/usr/local/bioperl14";
> > > >> use Bio::DB::BioFetch;
> > > >> use strict;
> > > >> use Bio::DB::WebDBSeqI;
> > > >> use HTTP::Request::Common 'POST';
> > > >>
> > > >> my $format_type='fasta';
> > > >> my $stream;
> > > >>
> > > >>
> > > > >> my $bf = new Bio::DB::BioFetch(-format        => $format_type,
> > > > >>                                -retrievaltype => 'tempfile',
> > > > >>                                -db            => 'EMBL');
> > > > >>
> > > > >> $stream = $bf->get_Stream_by_id(['BUM','J00231']);
> > > > >> while (my $s = $stream->next_seq) {
> > > > >>     print $s->seq,"\n\n\n";
> > > > >> }
> > > >>
> > > >>
> > > >> exit;
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> [davila at tryps script]$ perl gb-fetch-1.pl
> > > >>
> > > > >> agtagtgtactaccaagtatagataacgtttaaatattaaagttttggatcaaagccaaagatgattcgcat
> > > > >> gctggtgctgattgtagttacagctgcaagcccagtgtatcagagatgtttccaagatggggctatagtgaa
> > > > >> gcaaaacccatccaaagaggcagtcacagaagtgtccctaaaagatgatgttagca
> > > > >>
> > > > >>
> > > > >> cctggacctcctgtgcaagaacatgaaacanctgtggttcttccttctcctggtggcagctcccagatgggt
> > > > >> cctgtcccaggtgcacctgcaggagtcgggcccaggactggggaagcctccagagctcaaaaccccacttgg
> > > > >> tgacacaactcacacatgcccacggtgcccagagcccaaatcttgtgacacacctcccccgtgcccacggtg
> > > > >> cccagagcccaaatcttgtgacacacctcccccatgcccacggtgcccagagcccaaatcttgtgacacacc
> > > > >> tcccccgtgcccnnngtgcccagcacctgaactcttgggaggaccgtcagtcttcctcttccccccaaaacc
> > > > >> caaggatacccttatgatttcccggacccctgaggtcacgtgcgtggtggtggacgtgagccacgaagaccc
> > > > >> nnnngtccagttcaagtggtacgtggacggcgtggaggtgcataatgccaagacaaagctgcgggaggagca
> > > > >> gtacaacagcacgttccgtgtggtcagcgtcctcaccgtcctgcaccaggactggctgaacggcaaggagta
> > > > >> caagtgcaaggtctccaacaaagccctcccagcccccatcgagaaaaccatctccaaagccaaaggacagcc
> > > > >> cnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnngaggagatgaccaagaaccaagtcagcctgacctg
> > > > >> cctggtcaaaggcttctaccccagcgacatcgccgtggagtgggagagcaatgggcagccggagaacaacta
> > > > >> caacaccacgcctcccatgctggactccgacggctccttcttcctctacagcaagctcaccgtggacaagag
> > > > >> caggtggcagcaggggaacatcttctcatgctccgtgatgcatgaggctctgcacaaccgctacacgcagaa
> > > > >> gagcctctccctgtctccgggtaaatgagtgccatggccggcaagcccccgctccccgggctctcggggtcg
> > > > >> cgcgaggatgcttggcacgtaccccgtgtacatacttcccaggcacccagcatggaaataaagcacccagcg
> > > > >> ctgccctgg
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Tue, 2004-03-23 at 22:44, Jason Stajich wrote:
> > > >>
> > > >>
> > > >>> You need an additional perl module.
> > > >>>
> > > >>>
> > > >>> install IO::String from CPAN
> > > >>>
> > > >>> There is a section on how to install additional perl modules in the
> > > >>> INSTALL document.
> > > >>>
> > > >>> -j
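
Before running a script that depends on extra modules, it can help to check whether they load at all. A small sketch using only core Perl; module_available is a hypothetical helper name, and the CPAN one-liner in the comment is the conventional way to install a missing module, which may differ on a given system.

```perl
use strict;
use warnings;

# Hypothetical helper: report whether a module can be loaded from @INC.
sub module_available {
    my ($module) = @_;
    # Accept only plain package names before handing them to string eval.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    my $ok = eval "require $module; 1";
    return $ok ? 1 : 0;
}

# A missing module can usually be installed with, for example:
#   perl -MCPAN -e 'install IO::String'
for my $module ('IO::String', 'File::Spec') {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'found' : 'missing (install it from CPAN)';
}
```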
> > > >>>
> > > >>> On Tue, 23 Mar 2004, Alberto Davila wrote:
> > > >>>
> > > >>>
> > > >>>
> > > >>>> Hi,
> > > >>>>
> > > >>>> May I ask for some help ?
> > > >>>>
> > > > >>>> I am trying to use the BioFetch module in order to download several
> > > > >>>> seqs (from specific organisms) from GenBank in fasta format, but it
> > > > >>>> looks like I am missing "IO/String.pm" and other things. Should I
> > > > >>>> install additional bioperl modules (I have the Bioperl Core 1.4
> > > > >>>> installed), or use a different module for my purpose?
> > > > >>>>
> > > > >>>> My script and error msg are listed below.
> > > > >>>>
> > > > >>>> Thanks and best regards,
> > > >>>>
> > > >>>> Alberto
> > > >>>>
> > > >>>> ****
> > > >>>>
> > > >>>> #!/usr/local/bin/perl -w
> > > >>>>
> > > >>>> use lib "/usr/local/bioperl14";
> > > >>>> package Bio::DB::BioFetch;
> > > >>>> use strict;
> > > >>>> use Bio::DB::WebDBSeqI;
> > > >>>> use HTTP::Request::Common 'POST';
> > > >>>>
> > > >>>> my $format_type='fasta';
> > > >>>> my $stream;
> > > >>>>
> > > >>>>
> > > > >>>> my $bf = new Bio::DB::BioFetch(-format        => $format_type,
> > > > >>>>                                -retrievaltype => 'tempfile',
> > > > >>>>                                -db            => 'EMBL');
> > > > >>>>
> > > > >>>> $stream = $bf->get_Stream_by_id(['BUM','J00231']);
> > > > >>>> while (my $s = $stream->next_seq) {
> > > > >>>>     print $s->seq,"\n";
> > > > >>>> }
> > > >>>>
> > > >>>>
> > > >>>> exit;
> > > >>>>
> > > >>>>
> > > >>>> [davila at tryps script]$ perl gb-fetch-1.pl
> > > >>>> Can't locate IO/String.pm in @INC (@INC contains:
> > > >>>> /usr/local/bioperl14/i386-linux-thread-multi /usr/local/bioperl14
> > > >>>> /usr/lib/perl5/5.8.3/i386-linux-thread-multi /usr/lib/perl5/5.8.3
> > > >>>> /usr/lib/perl5/site_perl/5.8.3/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/site_perl/5.8.2/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/site_perl/5.8.1/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/site_perl/5.8.3 /usr/lib/perl5/site_perl/5.8.2
> > > >>>> /usr/lib/perl5/site_perl/5.8.1 /usr/lib/perl5/site_perl/5.8.0
> > > >>>> /usr/lib/perl5/site_perl
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.3/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.2/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.1/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.0/i386-linux-thread-multi
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.3 /usr/lib/perl5/vendor_perl/5.8.2
> > > >>>> /usr/lib/perl5/vendor_perl/5.8.1 /usr/lib/perl5/vendor_perl/5.8.0
> > > >>>> /usr/lib/perl5/vendor_perl .) at
> > > >>>> /usr/local/bioperl14/Bio/DB/WebDBSeqI.pm line 90.
> > > >>>> BEGIN failed--compilation aborted at
> > > >>>> /usr/local/bioperl14/Bio/DB/WebDBSeqI.pm line 90.
> > > >>>> Compilation failed in require at gb-fetch-1.pl line 6.
> > > >>>> BEGIN failed--compilation aborted at gb-fetch-1.pl line 6.
> >
> >