[Bioperl-l] One protein accession number that consistently fails to return history

Fields, Christopher J cjfields at illinois.edu
Tue Mar 10 19:13:42 UTC 2020


I've seen errors like this before. It's an issue on NCBI's end, but to be honest I don't recall precisely why it pops up (the error message isn't helpful at all). I vaguely recall that it may involve accessions that have been hidden or removed. The 'no history' error you get when requesting the single ID does suggest that this one accession is the problem, but if the entry still shows up in the live web view I would contact NCBI to check (feel free to cc me; it would be good to know why this happens in case there is a simple way to check for and mitigate it).
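
One way to check a failing batch without posting all 100 IDs one at a time is to bisect it: post each half, and recurse only into halves that fail. The Python sketch below illustrates that idea under stated assumptions. `post_ok` is a hypothetical callable standing in for whatever posts a list of IDs and reports whether a history came back (for example, a wrapper around the epost / `next_History` calls in the stack trace); it is not a BioPerl or NCBI API. Bisection is a swapped-in technique, not what was done in this thread, and it assumes each bad ID fails on its own rather than only in combination with others.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical signature: takes a chunk of accessions, returns True if the
# post succeeded (i.e. a history was returned). Not a real library API.
PostOk = Callable[[Sequence[str]], bool]


def find_bad_ids(ids: Sequence[str], post_ok: PostOk) -> List[str]:
    """Return the IDs that make post_ok() fail, found by recursive halving.

    Assumes a bad ID fails by itself, independent of the other IDs.
    Results come back in the same order as the input.
    """
    if not ids:
        return []
    if post_ok(ids):
        # The whole chunk went through, so nothing in it is bad.
        return []
    if len(ids) == 1:
        # A single ID that fails on its own is a culprit.
        return [ids[0]]
    mid = len(ids) // 2
    return find_bad_ids(ids[:mid], post_ok) + find_bad_ids(ids[mid:], post_ok)


def split_good_bad(ids: Sequence[str], post_ok: PostOk) -> Tuple[List[str], List[str]]:
    """Mitigation sketch: separate IDs that are safe to re-post from IDs to report."""
    bad = find_bad_ids(ids, post_ok)
    bad_set = set(bad)
    good = [i for i in ids if i not in bad_set]
    return good, bad


if __name__ == "__main__":
    # Offline demo: pretend MXQ92247.1 is the only ID the server rejects.
    batch = ["XP_006234735.1", "MXQ92247.1", "XP_005895662.1", "TNM89077.1"]

    def fake_post_ok(chunk: Sequence[str]) -> bool:
        return "MXQ92247.1" not in chunk

    print(find_bad_ids(batch, fake_post_ok))  # ['MXQ92247.1']
```

With a single bad ID in a batch of 100, this needs roughly 15 posts instead of 100; the advantage shrinks as the number of bad IDs grows.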

chris

On 3/9/20, 3:57 PM, "Bioperl-l on behalf of Warren Gallin" <bioperl-l-bounces+cjfields=illinois.edu at mailman.open-bio.org on behalf of wgallin at ualberta.ca> wrote:

    Hi,
    
    I am running an analysis that includes downloading a number of protein sequences from the NCBI site using accession numbers as unique IDs.
    
    One group of 100 accession numbers consistently fails with this error stack:
    Request is: 
    POST https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
    Content-Type: application/x-www-form-urlencoded
    
    db=protein&retmode=xml&id=XP_006234735.1%2CXP_005895662.1%2CXP_010145026.1%2CXP_026783080.1%2CXP_029975310.1%2CXP_013873281.1%2CXP_010830095.1%2CTNM89077.1%2CXP_008849421.1%2CXP_021011391.1%2CKAB0346502.1%2CETE72242.1%2CXP_029574116.1%2CXP_027792425.1%2CXP_024286510.1%2CXP_006119923.1%2CXP_014760001.1%2CXP_015197542.1%2CXP_021518473.1%2CXP_008563541.1%2CXP_016948072.1%2CXP_017121508.1%2CXP_006091666.1%2CXP_009001698.1%2CKAB5584094.1%2CXP_028622921.1%2CXP_027418577.1%2CXP_008147166.1%2CKAF0876842.1%2CXP_021449955.1%2CXP_017851533.1%2CXP_004851644.1%2CXP_028652333.1%2CXP_030632496.1%2CXP_028584614.1%2CXP_006116720.1%2CXP_020333636.2%2CXP_018104651.1%2CXP_020741640.1%2CXP_023689818.1%2CXP_015025644.1%2CXP_022363001.1%2CXP_027835381.1%2CXP_016948074.1%2CXP_008941239.1%2CXP_027278106.1%2CPIO40425.1%2CXP_004755955.1%2CXP_004668743.1%2CXP_015233892.1%2CXP_005987632.1%2CXP_021540385.1%2CXP_023175958.1%2CXP_015046452.1%2CXP_017152450.1%2CXP_007063592.2%2CXP_004912857.1%2CXP_019506243.1%2CXP_005401816.1%2CXP_026560233.1%2CXP_016986869.1%2CXP_006908055.1%2CXP_018100229.1%2CXP_016948069.1%2CMXQ92247.1%2CXP_018615666.1%2CXP_004644167.1%2CXP_006754006.1%2CXP_005174156.1%2CXP_014340134.1%2CXP_026848258.1%2CXP_015194130.1%2CXP_017851532.1%2CXP_017152426.1%2CXP_029812907.1%2CXP_026838056.1%2CXP_015025638.1%2CXP_010282051.1%2CXP_011177387.1%2CXP_016159580.1%2CNP_001280068.1%2CGCF51449.1%2CXP_023037102.1%2CXP_007432802.1%2CGCC28228.1%2CXP_004660901.1%2CXP_023175954.1%2CXP_010638736.1%2CXP_010143204.1%2CXP_017871568.1%2CXP_017152441.1%2CXP_030585051.1%2CXP_029990259.1%2CXP_016986865.1%2CXP_016010343.1%2CXP_013842729.2%2CXP_013864110.1%2CXP_017011119.1%2CXP_021049528.1%2CXP_005286920.1&tool=BioPerl&email=wgallin%40ualberta.ca
    
    ------------- EXCEPTION: Bio::Root::Exception -------------
    MSG: NCBI epost error: Some IDs have invalid value and were omitted. Maximum ID value 18446744073709551615
    STACK: Error::throw
    STACK: Bio::Root::Root::throw /usr/local/share/perl/5.26.1/Bio/Root/Root.pm:449
    STACK: Bio::Root::RootI::warn /usr/local/share/perl/5.26.1/Bio/Root/RootI.pm:155
    STACK: Bio::Tools::EUtilities::parse_data /usr/local/share/perl/5.26.1/Bio/Tools/EUtilities.pm:149
    STACK: Bio::Tools::EUtilities::next_History /usr/local/share/perl/5.26.1/Bio/Tools/EUtilities.pm:319
    STACK: Bio::DB::EUtilities::next_History /usr/local/share/perl/5.26.1/Bio/DB/EUtilities.pm:164
    STACK: NCBI_Retrieval::eutilities_getData /virtual_machines/200224_VKCDB_Updating/NCBI_Retrieval.pm:246
    STACK: 200308_Main_Create.pl:143
    -----------------------------------------------------------
    
    When I break this set of 100 accession numbers into single requests, one request consistently fails to return a history, without an error stack:
    
    Request is: 
    POST https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi
    Content-Type: application/x-www-form-urlencoded
    
    db=protein&retmode=xml&id=MXQ92247.1&tool=BioPerl&email=wgallin%40ualberta.ca
    No history data returned at /virtual_machines/200224_VKCDB_Updating/NCBI_Retrieval.pm line 246.
    
    
    As far as I can tell, MXQ92247.1 is a real accession number; it pulls up an entry in the web interface.
    
    So, a couple of questions:
    
    1) Any idea why this particular accession number appears to fail using the Entrez API?
    2) Why does the multiple-accession request return an error stack while the single request just reports that no history was returned?
    
    To me this looks like some weirdness on the NCBI side, but I thought it best to check with the BioPerl experts to see if this is a known/fixable issue before I take it to the NCBI folks.
    
    Any ideas/suggestions appreciated.
    
    Warren Gallin
    _______________________________________________
    Bioperl-l mailing list
    Bioperl-l at mailman.open-bio.org
    https://mailman.open-bio.org/mailman/listinfo/bioperl-l
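
For reference, the epost requests quoted above are plain form-encoded POSTs: the IDs are joined with commas, which URL-encoding turns into `%2C` (and the `@` in the email address into `%40`). The minimal Python sketch below rebuilds such a body with the standard library, assuming only the parameter names visible in the request dumps (`db`, `retmode`, `id`, `tool`, `email`). It builds the body but does not send it, and it is not a claim about how BioPerl assembles the request internally; `epost_body` and `batches` are hypothetical helper names.

```python
from typing import Iterator, List, Sequence
from urllib.parse import urlencode


def epost_body(ids: Sequence[str], email: str, tool: str = "BioPerl", db: str = "protein") -> str:
    """Build an application/x-www-form-urlencoded body like the ones in the dumps."""
    params = {
        "db": db,
        "retmode": "xml",
        "id": ",".join(ids),  # commas are encoded as %2C
        "tool": tool,
        "email": email,       # '@' is encoded as %40
    }
    # Dicts keep insertion order, so parameters appear in the order above.
    return urlencode(params)


def batches(ids: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` IDs (the thread posts groups of 100)."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


if __name__ == "__main__":
    print(epost_body(["MXQ92247.1"], "wgallin@ualberta.ca"))
    # db=protein&retmode=xml&id=MXQ92247.1&tool=BioPerl&email=wgallin%40ualberta.ca
```

The printed body matches the single-ID request quoted above, which is a quick way to confirm that the failure is not caused by how the accession is encoded.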



