[Bioperl-l] Problems when retrieving sequences from GenBank

Fields, Christopher J cjfields at illinois.edu
Tue Oct 23 14:47:04 UTC 2012


The 'Bad Gateway' error may very well be a server-side issue (see the HTML output, which gives an NCBI email). The other issue, the BioProject DBSOURCE warning, should be addressed in the latest BioPerl code.
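Because a 502 from eutils is usually transient, one way to keep a long run from dying on it is to retry the fetch a few times. Here is a minimal sketch built around a hypothetical helper, `fetch_with_retries`, which is not part of BioPerl. It wraps any code ref in `eval`, retries on any exception, and re-throws the last error once the attempts run out. A stand-in fetcher replaces the real GenBank call so the sketch runs offline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run $fetch up to $max_tries
# times, sleeping $delay seconds between attempts. Any exception, such as
# the Bio::Root::Exception raised for an HTTP 502, triggers a retry; the
# last error is re-thrown if every attempt fails.
sub fetch_with_retries {
    my ($fetch, $max_tries, $delay) = @_;
    my $last_error;
    for my $attempt (1 .. $max_tries) {
        my $result;
        # The trailing "1" distinguishes success from a die, even if
        # $fetch happens to return a false value.
        if (eval { $result = $fetch->(); 1 }) {
            return $result;
        }
        $last_error = $@ || "unknown error\n";
        warn "Attempt $attempt of $max_tries failed: $last_error";
        sleep $delay if $attempt < $max_tries;
    }
    die $last_error;
}

# Demonstration with a stand-in fetcher that fails twice, then succeeds.
my $calls = 0;
my $flaky = sub {
    $calls++;
    die "HTTP/1.1 502 Bad Gateway\n" if $calls < 3;
    return 'ACGT';
};
my $seq = fetch_with_retries($flaky, 5, 0);
print "Got $seq after $calls calls\n";    # prints "Got ACGT after 3 calls"
```

In the original loop this would look like `my $seq_obj = fetch_with_retries(sub { $db_obj->get_Seq_by_id($ident) }, 3, 10);`. The helper retries every exception, including 'id does not exist', which will never succeed on a retry. A stricter version could inspect the error text before trying again.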

However, I'm really not sure how you managed to get the script working correctly to begin with; are you sure this script is complete? I gave up testing because there were too many fundamental problems to correct (my limit is usually 2-3 corrections; I went past that in this case):

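1) Undeclared variables, arrays and hashes ($seq, @names, %hash1)
2) Possibly mislabeled variable: $cont1 is initialized to 0, but $cont is the one incremented ($cont1 -> $cont?)
3) No localization (no variable is declared with 'my')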
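Below is a minimal sketch of the FASTA-reading half of the script with every variable declared via `my`, so nothing is an implicit global. It assumes, as the original does, that the accession is the last underscore-separated field of each header. The names `read_fasta` and `accession_from_header` are illustrative, not from BioPerl. One further difference: the original never chomps the header, so the trailing newline survives `s/>//g` and ends up in the ID passed to `get_Seq_by_id`. That may account for some of the 'id does not exist' errors; the sketch strips it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read FASTA records from an open filehandle. Returns a reference to the
# headers in file order and a reference to a header => sequence hash.
sub read_fasta {
    my ($fh) = @_;
    my (@names, %seq_for);
    my $name;    # header of the record currently being read
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;    # drop the newline (and any \r) so it never reaches the ID
        if ($line =~ /^>(.+)/) {
            $name = $1;
            push @names, $name;
            $seq_for{$name} = '';
        }
        elsif (defined $name) {
            $line =~ s/\s+//g;
            $seq_for{$name} .= uc $line;
        }
    }
    return (\@names, \%seq_for);
}

# Assumption carried over from the original script: the accession is the
# last underscore-separated field of the header.
sub accession_from_header {
    my ($header) = @_;
    return (split /_/, $header)[-1];
}

# Demonstration on an in-memory FASTA string.
my $fasta = ">sample_1_AB123456\nacgt\nACGT \n>sample_2_XY000001\nggcc\n";
open my $in, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my ($names, $seq_for) = read_fasta($in);
close $in;
for my $header (@$names) {
    print join("\t", $header, accession_from_header($header), $seq_for->{$header}), "\n";
}
```

For a real run, `my $file = shift @ARGV; open my $in, '<', $file or die "Cannot open $file: $!";` takes the place of the in-memory string.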

I hate to be the grammar police, but I want to point out that adding 'use strict; use warnings;' would have caught pretty much all of these issues.
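Here is a sketch of the comparison loop written the same way, with both counters declared once. Under `use strict`, a typo such as $cont vs $cont1 then becomes a compile-time error instead of a silently different variable. The helper name `count_mismatches` is hypothetical. Fetching is passed in as a code ref so the example runs offline against a stand-in hash. Uppercasing the remote sequence as well as the local one is an assumption to make the comparison case-insensitive.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare each local sequence with the one returned by $fetch, a code ref
# that takes an accession and returns a sequence string. Returns the
# number of records checked and the number that did not match.
sub count_mismatches {
    my ($names, $seq_for, $fetch) = @_;
    my ($total, $problems) = (0, 0);
    for my $name (@$names) {
        $total++;
        my $accession = (split /_/, $name)[-1];
        my $remote = $fetch->($accession);
        $remote =~ s/\s+//g;    # remote sequence without line breaks
        if (uc($remote) ne $seq_for->{$name}) {
            $problems++;
            print "Problems: $name\n";
        }
    }
    return ($total, $problems);
}

# Demonstration with an in-memory stand-in for GenBank.
my %local  = (s_A1 => 'ACGT', s_B2 => 'GGCC');
my %remote = (A1 => "acgt\n", B2 => 'GGCA');
my ($total, $problems) =
    count_mismatches([sort keys %local], \%local, sub { $remote{ $_[0] } });
print "Total: $total\tProblems: $problems\n";
```

With real data, the stand-in would be replaced by something like `sub { $db_obj->get_Seq_by_id($_[0])->seq }`, using the same Bio::DB::GenBank object as the original script.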

chris

On Oct 23, 2012, at 6:33 AM, Caio Freire <freire at ime.usp.br> wrote:

> Hi BioPerl users,
> 
> A couple of years ago, I wrote a script to retrieve sequences from GenBank
> using the Bio::DB::GenBank module, and it worked very well on my machine
> running Ubuntu 10.04. Now I'm having problems doing this on Ubuntu 12.04,
> since my script returns a warning like this: "MSG: Unrecognized DBSOURCE
> data: BioProject: PRJNA37833". I couldn't find an obvious solution on Google.
> Could anyone help me? My script is at the bottom of this mail.
> 
> 
> 
> Sometimes the message is:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: WebDBSeqI Request Error:
> HTTP/1.1 502 Bad Gateway
> Connection: close
> Date: Tue, 23 Oct 2012 11:08:47 GMT
> Accept-Ranges: bytes
> Server: Apache
> Vary: accept-language,accept-charset
> Content-Language: en
> Content-Type: text/html; charset=iso-8859-1
> Client-Date: Tue, 23 Oct 2012 11:09:25 GMT
> Client-Peer: 165.112.7.20:80
> Client-Response-Num: 1
> Client-Transfer-Encoding: chunked
> Link: <mailto:info at ncbi.nlm.nih.gov>; rev="made"
> Title: Bad Gateway!
> 
> <?xml version="1.0" encoding="ISO-8859-1"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
>  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
> <head>
> <title>Bad Gateway!</title>
> <link rev="made" href="mailto:info at ncbi.nlm.nih.gov" />
> <style type="text/css"><!--/*--><![CDATA[/*><!--*/
>    body { color: #000000; background-color: #FFFFFF; }
>    a:link { color: #0000CC; }
>    p, address {margin-left: 3em;}
>    span {font-size: smaller;}
> /*]]>*/--></style>
> </head>
> 
> <body>
> <h1>Bad Gateway!</h1>
> <p>
> 
> 
>    The proxy server received an invalid
>    response from an upstream server.
> 
> 
>    </p>
> <p>
> 
>    The proxy server could not handle the request <em><a
> href="/entrez/eutils/efetch.fcgi">GET&nbsp;/entrez/eutils/efetch.fcgi</a></em>.<p>
> Reason: <strong>Error reading from remote server</strong></p>
> 
> 
> </p>
> <p>
> If you think this is a server error, please contact
> the <a href="mailto:info at ncbi.nlm.nih.gov">webmaster</a>.
> 
> </p>
> 
> <h2>Error 502</h2>
> <address>
>  <a href="/">eutils.ncbi.nlm.nih.gov</a><br />
> 
>  <span>Tue Oct 23 07:08:47 2012<br />
>  Apache</span>
> </address>
> </body>
> </html>
> 
> 
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:472
> STACK: Bio::DB::WebDBSeqI::_stream_request /usr/share/perl5/Bio/DB/WebDBSeqI.pm:773
> STACK: Bio::DB::WebDBSeqI::get_seq_stream /usr/share/perl5/Bio/DB/WebDBSeqI.pm:467
> STACK: Bio::DB::WebDBSeqI::get_Stream_by_id /usr/share/perl5/Bio/DB/WebDBSeqI.pm:288
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_id /usr/share/perl5/Bio/DB/WebDBSeqI.pm:158
> STACK: check.pl:28
> -----------------------------------------------------------
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: id does not exist
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/share/perl5/Bio/Root/Root.pm:472
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_id /usr/share/perl5/Bio/DB/WebDBSeqI.pm:167
> STACK: check.pl:28
> -----------------------------------------------------------
> 
> 
> 
> 
> My script:
> 
> #!/usr/bin/perl -w
> 
> use Bio::DB::GenBank;
> 
> $file = shift @ARGV;
> open FH, "<$file";
> @content = <FH>;
> $cont1=0;
> foreach $line (@content){
> if ($line =~ /^>.+/g){
> $name = $line;
> $name =~ s/>//g;
> push (@names, $name);
> $seq = '';
> $cont++;
> }#close if
> elsif($line !~ /^>/){
> $seq .= $line;
> $seq =~ s/\s//g;
> $seq = uc($seq);
> }#close elsif
> $hash1{$name} = $seq;
> }
> 
> ######################################################
> $db_obj = Bio::DB::GenBank->new;
> foreach $name(@names){
> @split = split (/_/, $name);
> $ident = $split[-1];
> $Bio::Seq::seq_obj = $db_obj->get_Seq_by_id($ident);
> $GBsequencia = $Bio::Seq::seq_obj->seq();
> $GBsequencia =~ s/\n+//g;
> $seq = $hash1{$name};
> if ($GBsequencia ne $seq){
> $cont1++;
> print "Problems: $name\n"
> }
> }
> print "Total: $cont\tProblems: $cont1\n";
> 
> 
> 
> 
> 
> Best,
> ===============================================================
> Caio César de Melo Freire, BSc Biomedicine
> 
> PhD candidate - Bioinformatics
> 
> Laboratory of Molecular Evolution and Bioinformatics
> Institute of Biomedical Sciences - II
> 
> University of Sao Paulo
> +551130918453
> Av. Prof. Lineu Prestes, 1374 - Cidade Universitária "Armando Salles
> Oliveira", Butantã - São Paulo - SP - CEP 05508-900
> ================================================================
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




