[Bioperl-l] One protein accession number that consistently fails to return history
Fields, Christopher J
cjfields at illinois.edu
Tue Mar 10 19:13:42 UTC 2020
I've seen errors like this before. It's an issue on NCBI's end, but to be honest I don't recall precisely why it pops up (the error message isn't helpful at all). I vaguely recall it possibly involving accessions that had been hidden or removed. The 'no history' error when you request the single ID does suggest that this one accession is the problem, but if it's showing up in the live view I would contact NCBI to check. Feel free to cc me; it would be good to know why this happens, in case there is a simple way to check for it and mitigate it.
chris
On 3/9/20, 3:57 PM, "Bioperl-l on behalf of Warren Gallin" <bioperl-l-bounces+cjfields=illinois.edu at mailman.open-bio.org on behalf of wgallin at ualberta.ca> wrote:
Hi,
I am running an analysis that includes downloading a number of protein sequences from the NCBI site using accession numbers as unique IDs.
One group of 100 accession numbers consistently fails with an error stack:
Request is:
POST https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
Content-Type: application/x-www-form-urlencoded
db=protein&retmode=xml&id=XP_006234735.1%2CXP_005895662.1%2CXP_010145026.1%2CXP_026783080.1%2CXP_029975310.1%2CXP_013873281.1%2CXP_010830095.1%2CTNM89077.1%2CXP_008849421.1%2CXP_021011391.1%2CKAB0346502.1%2CETE72242.1%2CXP_029574116.1%2CXP_027792425.1%2CXP_024286510.1%2CXP_006119923.1%2CXP_014760001.1%2CXP_015197542.1%2CXP_021518473.1%2CXP_008563541.1%2CXP_016948072.1%2CXP_017121508.1%2CXP_006091666.1%2CXP_009001698.1%2CKAB5584094.1%2CXP_028622921.1%2CXP_027418577.1%2CXP_008147166.1%2CKAF0876842.1%2CXP_021449955.1%2CXP_017851533.1%2CXP_004851644.1%2CXP_028652333.1%2CXP_030632496.1%2CXP_028584614.1%2CXP_006116720.1%2CXP_020333636.2%2CXP_018104651.1%2CXP_020741640.1%2CXP_023689818.1%2CXP_015025644.1%2CXP_022363001.1%2CXP_027835381.1%2CXP_016948074.1%2CXP_008941239.1%2CXP_027278106.1%2CPIO40425.1%2CXP_004755955.1%2CXP_004668743.1%2CXP_015233892.1%2CXP_005987632.1%2CXP_021540385.1%2CXP_023175958.1%2CXP_015046452.1%2CXP_017152450.1%2CXP_007063592.2%2CXP_004912857.1%2CXP_019506243.1%2CXP_005401816.1%2CXP_026560233.1%2CXP_016986869.1%2CXP_006908055.1%2CXP_018100229.1%2CXP_016948069.1%2CMXQ92247.1%2CXP_018615666.1%2CXP_004644167.1%2CXP_006754006.1%2CXP_005174156.1%2CXP_014340134.1%2CXP_026848258.1%2CXP_015194130.1%2CXP_017851532.1%2CXP_017152426.1%2CXP_029812907.1%2CXP_026838056.1%2CXP_015025638.1%2CXP_010282051.1%2CXP_011177387.1%2CXP_016159580.1%2CNP_001280068.1%2CGCF51449.1%2CXP_023037102.1%2CXP_007432802.1%2CGCC28228.1%2CXP_004660901.1%2CXP_023175954.1%2CXP_010638736.1%2CXP_010143204.1%2CXP_017871568.1%2CXP_017152441.1%2CXP_030585051.1%2CXP_029990259.1%2CXP_016986865.1%2CXP_016010343.1%2CXP_013842729.2%2CXP_013864110.1%2CXP_017011119.1%2CXP_021049528.1%2CXP_005286920.1&tool=BioPerl&email=wgallin%40ualberta.ca
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI epost error: Some IDs have invalid value and were omitted. Maximum ID value 18446744073709551615
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.26.1/Bio/Root/Root.pm:449
STACK: Bio::Root::RootI::warn /usr/local/share/perl/5.26.1/Bio/Root/RootI.pm:155
STACK: Bio::Tools::EUtilities::parse_data /usr/local/share/perl/5.26.1/Bio/Tools/EUtilities.pm:149
STACK: Bio::Tools::EUtilities::next_History /usr/local/share/perl/5.26.1/Bio/Tools/EUtilities.pm:319
STACK: Bio::DB::EUtilities::next_History /usr/local/share/perl/5.26.1/Bio/DB/EUtilities.pm:164
STACK: NCBI_Retrieval::eutilities_getData /virtual_machines/200224_VKCDB_Updating/NCBI_Retrieval.pm:246
STACK: 200308_Main_Create.pl:143
-----------------------------------------------------------
When I break this set of 100 accession numbers into single requests, one request consistently fails to return a history, without an error stack:
Request is:
POST https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
Content-Type: application/x-www-form-urlencoded
db=protein&retmode=xml&id=MXQ92247.1&tool=BioPerl&email=wgallin%40ualberta.ca
No history data returned at /virtual_machines/200224_VKCDB_Updating/NCBI_Retrieval.pm line 246.
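The isolation step described above (splitting the batch until the offending accession is found) can be sketched without touching the network. This is only a rough illustration in Python, not BioPerl code: `build_epost_body` and `find_failing_ids` are hypothetical helper names, and `returns_history` is a placeholder for whatever call actually posts the IDs and reports whether a history came back. It bisects the batch instead of sending one request per ID, which assumes that a batch fails exactly when it contains at least one problematic accession.

```python
from typing import Callable, List, Sequence
from urllib.parse import urlencode


def build_epost_body(ids: Sequence[str], email: str, tool: str = "BioPerl") -> str:
    """Build a form-encoded epost body with the same fields as the logged request."""
    return urlencode({
        "db": "protein",
        "retmode": "xml",
        "id": ",".join(ids),   # the comma is encoded as %2C, as in the log
        "tool": tool,
        "email": email,
    })


def find_failing_ids(
    ids: Sequence[str],
    returns_history: Callable[[List[str]], bool],
) -> List[str]:
    """Return the IDs that prevent a batch from returning a history.

    `returns_history` is a hypothetical callback: it takes a list of IDs,
    posts them, and returns True if a history was returned.
    Assumption: a batch fails if and only if it contains a bad ID.
    """
    batch = list(ids)
    if not batch or returns_history(batch):
        return []              # nothing to blame in this batch
    if len(batch) == 1:
        return batch           # a failing batch of one is the culprit
    mid = len(batch) // 2      # otherwise split and check each half
    return (find_failing_ids(batch[:mid], returns_history)
            + find_failing_ids(batch[mid:], returns_history))
```

In real use, `returns_history` would wrap the actual epost call; a stand-in that simply rejects any batch containing a given accession is enough to try the logic offline.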
As far as I can tell, MXQ92247.1 is a real accession number; it pulls up an entry on the web interface.
So a couple of questions:
1) Any idea why this particular accession number appears to fail using the Entrez API?
2) Why does a multiple-accession request return an error stack, while the single request just reports that no history was returned?
To me this looks like some weirdness on the NCBI side, but I thought it best to check with the BioPerl experts to see if this is a known/fixable issue before I take it to the NCBI folks.
Any ideas/suggestions appreciated.
Warren Gallin