[Bioperl-l] RE: [Pubmednew] Problem with esearch.fcgi
Andrew Walsh
walsh at cenix-bioscience.com
Tue Oct 7 07:33:59 EDT 2003
I seem to be able to retrieve X[MP] sequences using Bioperl version 1.2.2.
Depending on the type of accession I'm dealing with, I use
Bio::DB::GenBank, Bio::DB::GenPept or Bio::DB::RefSeq.
The following code retrieves both XM_055766 and XP_055766 (the
corresponding peptide) from the web:
As a side note (maybe someone with more knowledge of the SeqIO parser
knows why): the KEYWORDS line of the GenBank file does not seem to be
retrieved properly. I get this line in the sequence file:
'KEYWORDS ARRAY(0x8ba8f30).'
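For what it's worth, 'ARRAY(0x...)' is what Perl prints when an array
reference is interpolated into a string, so my guess (not confirmed) is
that the keywords are held as an array ref and written out without being
joined. A minimal sketch of the effect that does not use BioPerl at all;
format_keywords is a hypothetical helper, not a Bio::SeqIO function, and
the keyword values and the '; ' separator are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: render a keywords value
# that may be a plain string or a reference to an array of strings.
sub format_keywords {
    my ($kw) = @_;
    return ref($kw) eq 'ARRAY' ? join('; ', @$kw) : $kw;
}

my $keywords = ['kinase', 'signalling'];   # made-up example values

# Interpolating the reference itself gives the ARRAY(0x...) text
my $broken = "KEYWORDS    $keywords.";
# Dereferencing and joining first gives a readable line
my $fixed  = "KEYWORDS    " . format_keywords($keywords) . ".";

print "$broken\n$fixed\n";
```

If that guess is right, the fix belongs wherever the KEYWORDS line is
assembled for output, not in the retrieval code below.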
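# the code...
use Carp qw(cluck);
use Bio::SeqIO;
use Bio::DB::GenBank;
use Bio::DB::GenPept;
use Bio::DB::RefSeq;

# $fmt has been set to 'genbank' or 'fasta'
# get $accs, ref to array of accessions
my %ext_map = (genbank => 'gbff',
               fasta   => 'fasta');
for my $acc (@$accs) {
    my $seq = eval { get_seq_obj_from_web($acc) };
    # the fetch either dies or returns nothing on failure
    if ($@ || !$seq) {
        warn "Could not get $acc: $@\n";
        next;
    }
    # open the output file only once there is a sequence to write
    my $file = "$acc.$ext_map{$fmt}";
    my $io = Bio::SeqIO->new(-file => ">$file", -format => $fmt);
    $io->write_seq($seq);
}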
sub get_seq_obj_from_web {
    my ($acc_num) = @_;
    warn "Getting $acc_num from web\n";
    my $ncbi_db = get_web_database_obj($acc_num);
    my $seq = eval { $ncbi_db->get_Seq_by_id($acc_num) };
    if ($@ || !$seq) {
        cluck "Could not retrieve $acc_num from NCBI: $@";
        return;
    }
    return $seq;
}
sub get_web_database_obj {
    my $acc = shift;
    my $ncbi_db;
    # It appears that XM_ numbers are only retrievable from GenBank, not RefSeq.
    if ($acc =~ /^NM_/) {
        $ncbi_db = Bio::DB::RefSeq->new();
    }
    elsif ($acc =~ /^[NX]P_/) {
        $ncbi_db = Bio::DB::GenPept->new();
    }
    else {
        $ncbi_db = Bio::DB::GenBank->new();
    }
    # send HTTP and FTP requests through the local proxy
    my $proxy = 'http://my.proxy.com:3128/';
    $ncbi_db->proxy(['http','ftp'], $proxy);
    return $ncbi_db;
}
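
The prefix test in get_web_database_obj can also be pulled out into a
small function that needs no network access, which makes the routing easy
to check by hand. A sketch under the same assumptions as the code above
(NM_ from RefSeq, NP_ and XP_ from GenPept, everything else, including
XM_, from GenBank). db_class_for is a hypothetical name, and NM_000546 is
only there as an example NM_ accession:

```perl
use strict;
use warnings;

# Hypothetical helper: map an accession to the name of the BioPerl
# database class that the code above would use for it.
sub db_class_for {
    my ($acc) = @_;
    return 'Bio::DB::RefSeq'  if $acc =~ /^NM_/;
    return 'Bio::DB::GenPept' if $acc =~ /^[NX]P_/;
    return 'Bio::DB::GenBank';   # XM_ and ordinary GenBank accessions
}

# Show where each example accession would be sent
for my $acc (qw(NM_000546 XM_055766 XP_055766)) {
    printf "%-10s => %s\n", $acc, db_class_for($acc);
}
```

The returned class name can then be instantiated with ->new() once the
corresponding module has been loaded.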
Heikki Lehvaslaiho wrote:
> Richard,
>
> If you look carefully at the entry XM_055766 in the Entrez server, you
> notice this:
>
> COMMENT MODEL REFSEQ: This record is predicted by automated
> computational analysis. This record is derived from an
> annotated genomic sequence (NT_024812) using gene
> prediction method: BLAST, supported by mRNA and EST
> evidence.
>
> It is not a GenBank entry, although it looks like one and NCBI Entrez
> claims it is one; in fact it is from the RefSeq database.
>
> The XP_* entries cannot be retrieved using Bio::DB::RefSeq, either,
> because it fetches entries from the EBI copy of the database, which does
> not contain all the latest sequence subclasses (only NC_*, NT_*, NM_*,
> NP_*). I'll try to get that updated.
>
> -Heikki
>
>
>
> On Tue, 2003-10-07 at 14:58, Holland, Richard wrote:
>
>>Hi,
>>
>>Thanks for your answer regarding non-accessible accessions via
>>esearch.fcgi, but it doesn't quite solve my problem.
>>
>>I am accessing GenBank programmatically via a set of modules called
>>BioPerl (http://www.bioperl.org/), specifically Bio::DB::GenBank. These
>>access the database on my behalf via the esearch.fcgi script. They
>>depend on the script returning the plain GenBank file without any HTML
>>markup.
>>
>>The URL you sent me in response to my original question marks up the
>>response in HTML, and doesn't return just the plain GenBank file on its
>>own.
>>
>>My question was why are the two scripts unable to agree on the existence
>>of a particular accession (XM_055766)? Surely they are accessing the
>>same database under the hood? Are there any plans to make esearch.fcgi
>>recognise these more recent accessions?
>>
>>I am copying this email to the BioPerl mailing list in case somebody
>>there can help out too.
>>
>>BioPerl people - what are the alternatives? I am having the same
>>problems with the EBI servers, dbfetch, and Bio::DB::EMBL, which also
>>does not believe that the accession XM_055766 exists (although I can see
>>it quite clearly using Entrez at the NCBI).
>>
>>cheers,
>>Richard
>>
>>---
>>Richard Holland
>>Bioinformatics Database Developer
>>ITS, Agresearch Invermay x3279
>>
>>
>>
>>-----Original Message-----
>>From: Monica Romiti [mailto:romiti at ncbi.nlm.nih.gov]
>>Sent: Tuesday, 7 October 2003 1:42 p.m.
>>To: Holland, Richard
>>Cc: romiti at ncbi.nlm.nih.gov
>>Subject: FW: [Pubmednew] Problem with esearch.fcgi
>>
>>
>>Dear Colleague,
>>
>>Use:
>>
>>
>>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search&db=nucleotide&term=XM_055766[accn]&doptcmdl=GenBank
>>
>>
>>Best regards,
>>
>>Monica at NCBI
>>-----Original Message-----
>>
>>From: custhelp at mail.nlm.nih.gov
>>Sent: 10/4/2003 04:24:45 PM
>>To: Richard.Holland at agresearch.co.nz
>>Subject: [Pubmednew] Problem with esearch.fcgi
>>
>>Richard Holland:
>>
>>Your email has been forwarded to the National Center for Biotechnology
>>Information (info at ncbi.nlm.nih.gov)
>>
>>Ellen M. Layman
>>National Library of Medicine
>>
>>
>>-----Original Message-----
>>
>>From: Richard.Holland at agresearch.co.nz
>>Sent: 10/1/2003 05:03:16 PM
>>To: <pubmednew at ncbi.nlm.nih.gov>
>>Subject: [Pubmednew] Problem with esearch.fcgi
>>
>>Hi,
>>
>>I can successfully search Entrez using the web-based forms for the
>>following term:
>>
>>XM_055766
>>
>>However, the same search via the eutils tool esearch.fcgi
>>(?db=nucleotide&term=XM_055766) returns no results. What's going on?
>>
>>cheers,
>>Richard
>>
>>
>>
>>=======================================================================
>>Attention: The information contained in this message and/or attachments
>>from AgResearch Limited is intended only for the persons or entities to
>>which it is addressed and may contain confidential and/or privileged
>>material. Any review, retransmission, dissemination or other use of, or
>>taking of any action in reliance upon, this information by persons or
>>entities other than the intended recipients is prohibited by AgResearch
>>Limited. If you have received this message in error, please notify the
>>sender immediately.
>>=======================================================================
>>
>>------------- End Forwarded Message -------------
>>
>>
>>------------- End Forwarded Message -------------
>>
>>
>>Best regards,
>>
>>Monica L. Romiti
>>NCBI User Services
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
--
------------------------------------------------------------------
Andrew Walsh, M.Sc.
Bioinformatics Software Engineer
IT Unit
Cenix BioScience GmbH
Pfotenhauerstr. 108
01307 Dresden, Germany
Tel. +49(351)210-2699
Fax +49(351)210-1309
public key: http://www.cenix-bioscience.com/public_keys/walsh.gpg
------------------------------------------------------------------