[Bioperl-l] RE: [Pubmednew] Problem with esearch.fcgi

Andrew Walsh walsh at cenix-bioscience.com
Tue Oct 7 07:33:59 EDT 2003


I seem to be able to retrieve X[MP] sequences using Bioperl version 1.2.2.

Depending on the type of accession I'm dealing with, I use either 
Bio::DB::GenBank or Bio::DB::RefSeq.

The following code retrieves both XM_055766 and XP_055766 (the 
corresponding peptide) from the web:

As a side note, the KEYWORDS line of the GenBank file does not seem to be 
written out properly (maybe someone with more knowledge of the SeqIO 
parser knows why). I get this line in the sequence file:
'KEYWORDS    ARRAY(0x8ba8f30).'
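
For what it is worth, 'ARRAY(0x...)' is what Perl prints whenever an array
reference is interpolated into a string without being dereferenced. My guess
(an assumption, I have not traced it through the SeqIO writer) is that the
keywords are held as an array ref somewhere and written out as-is. A minimal
sketch in plain Perl, with made-up keyword values and a hypothetical
format_keywords helper:

```perl
use strict;
use warnings;

# Hypothetical keyword list; the values are made up for illustration.
my $keywords = ['kinase', 'predicted'];

# Interpolating the reference itself gives the address form seen in the file.
my $broken = "KEYWORDS    $keywords.";

# Dereferencing and joining gives a readable line instead.
# format_keywords is an illustrative helper, not a BioPerl function.
sub format_keywords {
    my ($kw) = @_;
    my @list = ref($kw) eq 'ARRAY' ? @$kw : ($kw);
    return 'KEYWORDS    ' . join('; ', @list) . '.';
}

print "$broken\n";
print format_keywords($keywords), "\n";
```

The helper accepts either a plain string or an array ref, so it would cope
with both ways the keywords might be stored.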

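# the code...

use Carp qw(cluck);
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::GenPept;
use Bio::DB::RefSeq;

# $fmt has been set to 'genbank' or 'fasta'
# get $accs, ref to array of accessions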

for my $acc (@$accs) {
     my %ext_map = (genbank => 'gbff',
                    fasta   => 'fasta');
     my $file = "$acc.$ext_map{$fmt}";
     my $io   = Bio::SeqIO->new(-file => ">$file", -format => $fmt);
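     # get_seq_obj_from_web returns undef on failure rather than dying,
     # so check the return value as well as $@
     my $seq = eval { get_seq_obj_from_web($acc) };
     if ($@ || !$seq) {
         warn "Could not get $acc: $@\n";
     }
     else {
         $io->write_seq($seq);
     }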
}

sub get_seq_obj_from_web {
     my ($acc_num) = @_;
     warn "Getting $acc_num from web\n";
     my $ncbi_db = get_web_database_obj($acc_num);
     my $seq;
     eval {
         $seq = $ncbi_db->get_Seq_by_id($acc_num);
     };
     if ($@ || !$seq) {
         cluck "Could not retrieve $acc_num from NCBI: $@";
         return;
     }
     else {
         return $seq;
     }
}

sub get_web_database_obj {
     my $acc = shift;
     my $ncbi_db;
     # XM_ accessions appear to be retrievable only from GenBank, not RefSeq.
     if ($acc =~ /^NM_/) {
         $ncbi_db = Bio::DB::RefSeq->new();
     }
     elsif ($acc =~ /^[NX]P_/) {
         $ncbi_db = Bio::DB::GenPept->new();
     }
     else {
         $ncbi_db = Bio::DB::GenBank->new();
     }
     my $proxy = 'http://my.proxy.com:3128/';
     $ncbi_db->proxy(['http','ftp'], $proxy);
     return $ncbi_db;
}
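
The prefix test in get_web_database_obj could also be written as a small
lookup table, which makes it easier to add a prefix later. This is only a
sketch: db_class_for is a made-up helper name, NM_000001 is a placeholder
accession, and the function returns class names as strings so it runs
without BioPerl or a network connection.

```perl
use strict;
use warnings;

# Prefix-to-class routing table, checked in order. The choices mirror the
# if/elsif chain above.
my @ROUTES = (
    [ qr/^NM_/,    'Bio::DB::RefSeq'  ],
    [ qr/^[NX]P_/, 'Bio::DB::GenPept' ],
);

# db_class_for is an illustrative helper name, not part of BioPerl.
sub db_class_for {
    my ($acc) = @_;
    for my $route (@ROUTES) {
        my ($pattern, $class) = @$route;
        return $class if $acc =~ $pattern;
    }
    # Everything else, including XM_ accessions, falls back to GenBank.
    return 'Bio::DB::GenBank';
}

# NM_000001 is a placeholder; the other two are the accessions from this thread.
for my $acc (qw(NM_000001 XM_055766 XP_055766)) {
    printf "%-10s -> %s\n", $acc, db_class_for($acc);
}
```

With the Bio::DB modules loaded, db_class_for($acc)->new should give the same
database object as the if/elsif version.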

Heikki Lehvaslaiho wrote:
> Richard,
> 
> If you look carefully at the entry XM_055766 in the Entrez server, you
> notice this:
> 
> COMMENT     MODEL REFSEQ:  This record is predicted by automated  
>             computational analysis. This record is derived from an 
>             annotated genomic sequence (NT_024812) using gene 
>             prediction method: BLAST, supported by mRNA and EST 
>             evidence.
> 
> It is not a GenBank entry, although it looks like one and NCBI Entrez
> claims it is one; in fact it is from the RefSeq database. 
> 
> The XP_* entries cannot be retrieved using Bio::DB::RefSeq either,
> because it fetches entries from the EBI copy of the database, which does
> not contain all the latest sequence subclasses (only NC_*, NT_*, NM_*,
> NP_*). I'll try to get that updated.
> 
> 	-Heikki
> 
> 
> 
> On Tue, 2003-10-07 at 14:58, Holland, Richard wrote:
> 
>>Hi,
>>
>>Thanks for your answer regarding non-accessible accessions via
>>esearch.fcgi, but it doesn't quite solve my problem.
>>
>>I am accessing GenBank programmatically via a set of modules called
>>BioPerl (http://www.bioperl.org/), specifically Bio::DB::GenBank. These
>>access the database on my behalf via the esearch.fcgi script. They
>>depend on the script returning the plain GenBank file without any HTML
>>markup.
>>
>>The URL you sent me in response to my original question marks up the
>>response in HTML, and doesn't return just the plain GenBank file on its
>>own. 
>>
>>My question was why are the two scripts unable to agree on the existence
>>of a particular accession (XM_055766)? Surely they are accessing the
>>same database under the hood? Are there any plans to make esearch.fcgi
>>recognise these more recent accessions?
>>
>>I am copying this email to the BioPerl mailing list in case somebody
>>there can help out too.
>>
>>BioPerl people - what are the alternatives? I am having the same
>>problems with the EBI servers, dbfetch, and Bio::DB::EMBL, which also
>>does not believe that the accession XM_055766 exists (although I can see
>>it quite clearly using Entrez at the NCBI).
>>
>>cheers,
>>Richard
>>
>>---
>>Richard Holland
>>Bioinformatics Database Developer
>>ITS, Agresearch Invermay x3279
>>
>>
>>
>>-----Original Message-----
>>From: Monica Romiti [mailto:romiti at ncbi.nlm.nih.gov] 
>>Sent: Tuesday, 7 October 2003 1:42 p.m.
>>To: Holland, Richard
>>Cc: romiti at ncbi.nlm.nih.gov
>>Subject: FW: [Pubmednew] Problem with esearch.fcgi
>>
>>
>>Dear Colleague,
>>
>>Use:
>>
>>
>>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=nucleotide&term=XM_055766[accn]&doptcmdl=GenBank
>>
>>
>>Best regards,
>>
>>Monica at NCBI
>>-----Original Message-----
>>
>>From:  custhelp at mail.nlm.nih.gov
>>Sent:  10/4/2003 04:24:45 PM
>>To:  Richard.Holland at agresearch.co.nz
>>Subject:  [Pubmednew] Problem with esearch.fcgi
>>
>>Richard Holland:
>>
>>Your email has been forwarded to the National Center for Biotechnology 
>>Information (info at ncbi.nlm.nih.gov)
>>
>>Ellen M. Layman
>>National Library of Medicine
>>
>>
>>-----Original Message-----
>>
>>From:  Richard.Holland at agresearch.co.nz
>>Sent:  10/1/2003 05:03:16 PM
>>To:  <pubmednew at ncbi.nlm.nih.gov>
>>Subject:  [Pubmednew] Problem with esearch.fcgi
>>
>>Hi,
>>
>>I can successfully search Entrez using the web-based forms for the
>>following term:
>>
>>XM_055766
>>
>>However, the same search via the eutils tool esearch.fcgi
>>(?db=nucleotide&term=XM_055766) returns no results. What's going on?
>>
>>cheers,
>>Richard
>>
>>
>>
>>=======================================================================
>>Attention: The information contained in this message and/or attachments
>>from AgResearch Limited is intended only for the persons or entities to
>>which it is addressed and may contain confidential and/or privileged
>>material. Any review, retransmission, dissemination or other use of, or
>>taking of any action in reliance upon, this information by persons or
>>entities other than the intended recipients is prohibited by AgResearch
>>Limited. If you have received this message in error, please notify the
>>sender immediately.
>>=======================================================================
>>
>>------------- End Forwarded Message -------------
>>
>>
>>
>>Best regards,
>>
>>Monica L. Romiti
>>NCBI User Services
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l


-- 
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Pfotenhauerstr. 108
01307 Dresden, Germany
Tel. +49(351)210-2699
Fax  +49(351)210-1309

public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------



