[Bioperl-l] One protein accession number that consistently fails to return history

Warren Gallin wgallin at ualberta.ca
Mon Mar 9 20:50:48 UTC 2020


Hi,

I am running an analysis that includes downloading a number of protein sequences from the NCBI site using accession numbers as unique IDs.

One group of 100 accession numbers consistently fails with the following error stack:
Request is: 
POST https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
Content-Type: application/x-www-form-urlencoded

db=protein&retmode=xml&id=XP_006234735.1%2CXP_005895662.1%2CXP_010145026.1%2CXP_026783080.1%2CXP_029975310.1%2CXP_013873281.1%2CXP_010830095.1%2CTNM89077.1%2CXP_008849421.1%2CXP_021011391.1%2CKAB0346502.1%2CETE72242.1%2CXP_029574116.1%2CXP_027792425.1%2CXP_024286510.1%2CXP_006119923.1%2CXP_014760001.1%2CXP_015197542.1%2CXP_021518473.1%2CXP_008563541.1%2CXP_016948072.1%2CXP_017121508.1%2CXP_006091666.1%2CXP_009001698.1%2CKAB5584094.1%2CXP_028622921.1%2CXP_027418577.1%2CXP_008147166.1%2CKAF0876842.1%2CXP_021449955.1%2CXP_017851533.1%2CXP_004851644.1%2CXP_028652333.1%2CXP_030632496.1%2CXP_028584614.1%2CXP_006116720.1%2CXP_020333636.2%2CXP_018104651.1%2CXP_020741640.1%2CXP_023689818.1%2CXP_015025644.1%2CXP_022363001.1%2CXP_027835381.1%2CXP_016948074.1%2CXP_008941239.1%2CXP_027278106.1%2CPIO40425.1%2CXP_004755955.1%2CXP_004668743.1%2CXP_015233892.1%2CXP_005987632.1%2CXP_021540385.1%2CXP_023175958.1%2CXP_015046452.1%2CXP_017152450.1%2CXP_007063592.2%2CXP_004912857.1%2CXP_019506243.1%2CXP_005401816.1%2CXP_026560233.1%2CXP_016986869.1%2CXP_006908055.1%2CXP_018100229.1%2CXP_016948069.1%2CMXQ92247.1%2CXP_018615666.1%2CXP_004644167.1%2CXP_006754006.1%2CXP_005174156.1%2CXP_014340134.1%2CXP_026848258.1%2CXP_015194130.1%2CXP_017851532.1%2CXP_017152426.1%2CXP_029812907.1%2CXP_026838056.1%2CXP_015025638.1%2CXP_010282051.1%2CXP_011177387.1%2CXP_016159580.1%2CNP_001280068.1%2CGCF51449.1%2CXP_023037102.1%2CXP_007432802.1%2CGCC28228.1%2CXP_004660901.1%2CXP_023175954.1%2CXP_010638736.1%2CXP_010143204.1%2CXP_017871568.1%2CXP_017152441.1%2CXP_030585051.1%2CXP_029990259.1%2CXP_016986865.1%2CXP_016010343.1%2CXP_013842729.2%2CXP_013864110.1%2CXP_017011119.1%2CXP_021049528.1%2CXP_005286920.1&tool=BioPerl&email=wgallin%40ualberta.ca

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI epost error: Some IDs have invalid value and were omitted. Maximum ID value 18446744073709551615
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/local/share/perl/5.26.1/Bio/Root/Root.pm:449
STACK: Bio::Root::RootI::warn /usr/local/share/perl/5.26.1/Bio/Root/RootI.pm:155
STACK: Bio::Tools::EUtilities::parse_data /usr/local/share/perl/5.26.1/Bio/Tools/EUtilities.pm:149
STACK: Bio::Tools::EUtilities::next_History /usr/local/share/perl/5.26.1/Bio/Tools/EUtilities.pm:319
STACK: Bio::DB::EUtilities::next_History /usr/local/share/perl/5.26.1/Bio/DB/EUtilities.pm:164
STACK: NCBI_Retrieval::eutilities_getData /virtual_machines/200224_VKCDB_Updating/NCBI_Retrieval.pm:246
STACK: 200308_Main_Create.pl:143
-----------------------------------------------------------

When I break this set of 100 accession numbers into single requests, one request consistently fails to return a history, without an error stack:

Request is: 
POST https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
Content-Type: application/x-www-form-urlencoded

db=protein&retmode=xml&id=MXQ92247.1&tool=BioPerl&email=wgallin%40ualberta.ca
No history data returned at /virtual_machines/200224_VKCDB_Updating/NCBI_Retrieval.pm line 246.
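
The narrowing-down step described above (splitting the batch until the offending ID is found) could be automated by halving rather than posting all 100 IDs one at a time. The following is a minimal sketch, not the code in NCBI_Retrieval.pm. It is written in Python purely to stay self-contained, and `find_failing_ids` and `post_ok` are hypothetical names. It assumes a batch fails exactly when it contains at least one bad ID.

```python
from typing import Callable, List, Sequence


def find_failing_ids(
    ids: Sequence[str],
    post_ok: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the IDs whose presence makes a batch fail.

    post_ok is a caller-supplied (hypothetical) function that submits a
    batch and returns True when a history is returned for it.
    Assumption: a batch fails if and only if it contains a bad ID.
    """
    if not ids or post_ok(ids):
        # Empty batch, or the whole batch went through: nothing bad here.
        return []
    if len(ids) == 1:
        # A failing batch of one is the culprit.
        return [ids[0]]
    mid = len(ids) // 2
    # Recurse into both halves, since more than one ID could be bad.
    return find_failing_ids(ids[:mid], post_ok) + find_failing_ids(ids[mid:], post_ok)
```

With a single bad ID among 100, this needs on the order of 15 requests instead of 100. `post_ok` would wrap whatever actually performs the epost call.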


As far as I can tell, MXQ92247.1 is a real accession number: it pulls up an entry on the web interface.
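
One way to check whether the failure depends on BioPerl at all is to rebuild the same form-encoded body independently and post it with any HTTP client. The sketch below only constructs the body; it makes no network call. The function name `epost_body` is hypothetical, and the parameter names and values are copied from the request shown above.

```python
from urllib.parse import urlencode


def epost_body(ids, email, db="protein", tool="BioPerl"):
    """Build the form-encoded epost body seen in the logged requests.

    Parameter names (db, retmode, id, tool, email) are taken from the
    request in the post; the helper itself is illustrative only.
    """
    params = {
        "db": db,
        "retmode": "xml",
        "id": ",".join(ids),  # epost takes a comma-separated ID list
        "tool": tool,
        "email": email,
    }
    # urlencode percent-encodes ',' as %2C and '@' as %40.
    return urlencode(params)


body = epost_body(["MXQ92247.1"], "wgallin@ualberta.ca")
# body == "db=protein&retmode=xml&id=MXQ92247.1&tool=BioPerl&email=wgallin%40ualberta.ca"
```

If the same body posted to https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi from outside BioPerl also comes back without a history, that would point toward the NCBI side rather than the client library.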

So a couple of questions:

1) Any idea why this particular accession number appears to fail using the Entrez API?
2) Why does a multiple-accession request return an error stack, while the single request just says no history was returned?

To me this looks like some weirdness on the NCBI side, but I thought it best to check with the BioPerl experts to see if this is a known/fixable issue before I take it to the NCBI folks.

Any ideas/suggestions appreciated.

Warren Gallin

