[Bioperl-l] Protein Records without Sequence

Warren Gallin wgallin at ualberta.ca
Wed Jun 5 18:16:57 UTC 2013


Hi,

I am encountering a problem with a number of protein records.

An HMMER search of the nr database returns a gi number and an associated sequence.

When I use that gi number to retrieve the full GenBank record, however, the record comes back without a sequence.

When I look up that gi number through the NCBI web interface, the GenPept record is displayed with no sequence, but when I select FASTA format the sequence is returned.

Examples of gi numbers for which this occurs are:

23099847
21224301
68536697
46580017
77359109

Is this a flaw in the individual GenPept records? If so, should I report it to NCBI?

Or are these some kind of "special" record that needs different parameters passed to the E-utilities search?
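To compare the two formats outside the web interface, the same gi numbers can be requested from the NCBI E-utilities efetch endpoint once as GenPept (rettype=gp) and once as FASTA (rettype=fasta). The sketch below only builds the URLs and does not fetch anything; the helper name efetch_url is hypothetical, and it assumes db=protein with retmode=text is the right combination for these records.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# gi numbers reported above as returning GenPept records without sequence.
GI_NUMBERS = ["23099847", "21224301", "68536697", "46580017", "77359109"]


def efetch_url(ids, rettype):
    """Build an efetch URL for protein records (hypothetical helper).

    rettype="gp" requests GenPept flat files; rettype="fasta" requests FASTA.
    """
    params = {
        "db": "protein",
        "id": ",".join(ids),  # comma-separated list of gi numbers
        "rettype": rettype,
        "retmode": "text",
    }
    return EFETCH_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    # Print one URL per format so the two responses can be compared by hand.
    for fmt in ("gp", "fasta"):
        print(efetch_url(GI_NUMBERS, fmt))
```

Opening both URLs side by side should show whether the missing sequence is specific to the GenPept rendering of these records.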

There is a workaround, I guess: if the sequence comes back empty, run a new retrieval of FASTA-formatted records and repopulate the empty field in the GenPept record. But this seems inelegant.
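A minimal sketch of that fallback, written as plain-text parsing rather than with BioPerl's own objects: it reads the residues from the ORIGIN section of a GenPept flat file and, only if none are found, calls a caller-supplied function to get FASTA text instead. The function names are hypothetical, and it assumes that a record "without sequence" has an empty or absent ORIGIN section; fetch_fasta stands in for whatever second retrieval is used.

```python
import re


def genpept_sequence(flatfile):
    """Return the residues in a GenPept ORIGIN section, or "" if there are none."""
    residues = []
    in_origin = False
    for line in flatfile.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):  # end of record
            break
        if in_origin:
            # Sequence lines hold a position number and blocks of residues;
            # keep only the letters.
            residues.append(re.sub(r"[^A-Za-z]", "", line))
    return "".join(residues).upper()


def fasta_sequence(fasta):
    """Return the residues of the first record in FASTA text."""
    residues = []
    seen_header = False
    for line in fasta.splitlines():
        if line.startswith(">"):
            if seen_header:  # stop at the second record
                break
            seen_header = True
            continue
        residues.append(line.strip())
    return "".join(residues).upper()


def sequence_with_fallback(genpept_text, fetch_fasta):
    """Use the GenPept sequence; call fetch_fasta() only if it is empty."""
    seq = genpept_sequence(genpept_text)
    if not seq:
        seq = fasta_sequence(fetch_fasta())
    return seq
```

Passing the FASTA retrieval as a function means the second request is made only for the records that actually come back empty.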

All advice and/or commentary appreciated.

Warren Gallin


