<div dir="ltr">Hi John,<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11825">I

 don&#39;t know if this will help, but I recently had a list of proteins for 

which I wanted the mRNA or CDS for each one so that I could use the RNA. 

(Here `mRNA` means a separate GenBank entry that someone deposited and 

described as the mRNA, and CDS means the sequence extracted using the `coded_by` 

information.) I ran into some of the same issues you seem to be describing 

and, I think, worked out how to get around them. The program falls back to more 

aggressive, less efficient methods as it reaches the sequences that are harder 

to extract. I tried to make it so it doesn&#39;t give up. It probably 

isn&#39;t perfect yet, but at the time it would easily retrieve several hundred 

sequences starting from the NCBI-sourced FASTA entries for the proteins. (The 

sequence itself isn&#39;t important, but the description line is: the program 

extracts an ID from it.) It even validates each retrieved sequence to make sure it 

encodes the original protein, using the correct one of the 24 genetic 

codes.<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11827"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11829">You

 can check the code out at 

<a href="https://github.com/fomightez/sequencework/blob/master/RetrieveSeq/GetmRNAorCDSforProtein.py">https://github.com/fomightez/sequencework/blob/master/RetrieveSeq/GetmRNAorCDSforProtein.py</a>.

 The description is at 

<a href="https://github.com/fomightez/sequencework/tree/master/RetrieveSeq">https://github.com/fomightez/sequencework/tree/master/RetrieveSeq</a>.<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11831"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11833">Feel

 free to adapt it, or let me know if you&#39;d like help testing it with

 your data or adapting it to what you have 

as starting material.<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11835"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11837">Wayne<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11839"><br><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11841">Date: Fri, 18 Sep 2015 14:30:05 +0000<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11843">From: &quot;Athey, John *&quot; &lt;<a href="mailto:John.Athey@fda.hhs.gov">John.Athey@fda.hhs.gov</a>&gt;<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11845">To: &quot;<a href="mailto:biopython@mailman.open-bio.org">biopython@mailman.open-bio.org</a>&quot; &lt;<a href="mailto:biopython@mailman.open-bio.org">biopython@mailman.open-bio.org</a>&gt;<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11847">Subject: [Biopython] Handling records referencing other records<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11849">Message-ID:<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11851">    &lt;<a href="mailto:5D5BA0385615F148A9D2FD86BB656F700FEAF9F8@FDSWV09433.fda.gov">5D5BA0385615F148A9D2FD86BB656F700FEAF9F8@FDSWV09433.fda.gov</a>&gt;<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11853">Content-Type: text/plain; charset=&quot;us-ascii&quot;<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11855"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11857">Hello all,<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11859"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11861">I&#39;m

 looking for advice on how to handle Genbank records that reference 

other records as part of their location. My program iterates through 

large Genbank-formatted files with SeqIO.parse and extracts the CDS for 

subsequent analysis, using feat.extract(). However, upon hitting a 

record where the feature location references another record, it 

SOMETIMES fails. For example, 

<a href="http://www.ncbi.nlm.nih.gov/nuccore/DQ100169">http://www.ncbi.nlm.nih.gov/nuccore/DQ100169</a> seems to be handled 

correctly, while <a href="http://www.ncbi.nlm.nih.gov/nuccore/DQ100170">http://www.ncbi.nlm.nih.gov/nuccore/DQ100170</a> gives a 

&quot;ValueError: Feature references another sequence.&quot; Curiously, in both 

cases the CDS feature itself doesn&#39;t specify another record, only the 

parent gene does.<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11863"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11865">My questions about this are:<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11867"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11869">1)      Why does the extraction fail on some records but not on all of them?<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11871"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11873">2)      Is there a way to extract the data I&#39;m looking for without causing this error?<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11875"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11877">3)     

 If the answer to (2) is no, is there some other way to check whether 

the sequence will cause this error, skip extracting that sequence, and 

exclude that record from the analysis?<br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11879"><br class="" id="yiv9797058960yui_3_16_0_1_1442583793358_11881">Thanks for any help you can provide!<br></div>
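<div dir="ltr">For question (3) in the quoted message, one possible approach is to wrap each extraction in try/except and catch the ValueError, so that features which cannot be extracted are set aside instead of stopping the run. The sketch below is only an illustration under assumptions: the function name `extract_features_or_skip` and its two returned lists are hypothetical, and the record is assumed to behave like a Biopython SeqRecord (it only uses `record.features`, `record.seq`, `feature.type` and `feature.extract()`).</div>

```python
def extract_features_or_skip(record, feature_type="CDS"):
    """Split a record's features into extracted sequences and skipped ones.

    `record` is assumed to behave like a Biopython SeqRecord: it needs
    `.features` and `.seq`, and each feature needs `.type` and `.extract()`.
    """
    extracted = []  # (feature, sequence) pairs that extracted cleanly
    skipped = []    # (feature, error message) pairs to leave out of the analysis
    for feature in record.features:
        if feature.type != feature_type:
            continue
        try:
            extracted.append((feature, feature.extract(record.seq)))
        except ValueError as err:
            # Raised, for example, with the message
            # "Feature references another sequence."
            skipped.append((feature, str(err)))
    return extracted, skipped
```

<div dir="ltr">With Biopython this could be called once per record inside the existing SeqIO.parse loop, and any record with a non-empty `skipped` list could then be excluded. Note that catching every ValueError may also hide unrelated problems, so checking the stored message before discarding a record is probably wise.</div>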