<div dir="ltr"><div><div>The sequence of the protein construct used for the structure (which may or may not match the UniProt sequence) is stored in the SEQRES records of the PDB file. You should be able to parse them using <a href="http://biopython.org/DIST/docs/api/Bio.SeqIO.PdbIO-module.html">PdbSeqresIterator</a>, which Bio.SeqIO exposes as the &quot;pdb-seqres&quot; format.<br><br></div>Hopefully that helps.<br></div>-Spencer<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sun, Oct 26, 2014 at 9:44 PM, João Rodrigues <span dir="ltr">&lt;<a href="mailto:anaryin@gmail.com" target="_blank">anaryin@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Sanjeev,<div><br></div><div>Check for breaks. As I told you, iterate over the amino acids and, for each consecutive pair (e.g. residues 1 and 2), check the distance between the &quot;C&quot; atom of 1 and the &quot;N&quot; atom of 2. This is a very well defined distance (the peptide bond, about 1.33Å). Alternatively, and more simply, check CA-CA distances (e.g. &gt;4Å usually means a gap).</div><div><br></div><div>Sometimes no chain identifier is assigned to a particular chain; 
check column 22 of the ATOM records in those PDB files.</div><div><br></div><div>Cheers,</div><div><br></div><div>João</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">2014-10-26 11:31 GMT-05:00 Sanjeev Sariya <span dir="ltr">&lt;<a href="mailto:s.sariya_work@ymail.com" target="_blank">s.sariya_work@ymail.com</a>&gt;</span>:<div><div class="h5"><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div style="color:#000;background-color:#fff;font-family:HelveticaNeue-Light,Helvetica Neue Light,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif;font-size:16px"><div><br></div><div dir="ltr">Hi Joao,</div><div dir="ltr">Thank you for the response.</div><div dir="ltr">If not all residues are resolved in the crystal, then extracting the sequence from the PDB file wouldn&#39;t be a good call.</div><div dir="ltr"><br> </div><div><div dir="ltr">I will be working with a lot of PDB files [~100s or 1000s] in the near future. Is there any way I can find the breaks in a PDB file?<br></div><div dir="ltr"><br></div><div dir="ltr">- Another question: when I print the chain IDs in my script, I often get chain &quot; &quot;, that is, a space. </div><div dir="ltr">In the script I sent, the code looks like:</div><div dir="ltr"><br></div><div dir="ltr">from Bio.PDB import PDBParser<br>st = PDBParser(QUIET=True).get_structure(&#39;X&#39;, i)  # i is the path to the PDB file<br>for chain in st.get_chains():<br>    print(repr(chain.id))  # repr makes a blank chain ID visible as &#39; &#39; </div><div dir="ltr"><br></div><div dir="ltr">Why is a space present as the chain name? 
<br></div><div dir="ltr"><br></div><div dir="ltr">Thanks.<br></div><br></div><div><div><div style="display:block"> <div style="font-family:HelveticaNeue-Light,Helvetica Neue Light,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif;font-size:16px"> <div style="font-family:HelveticaNeue,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif;font-size:16px"> <div dir="ltr"> <font face="Arial"> On Saturday, October 25, 2014 12:32 AM, João Rodrigues &lt;<a href="mailto:anaryin@gmail.com" target="_blank">anaryin@gmail.com</a>&gt; wrote:<br> </font> </div>  <br><br> <div><div><div><div dir="ltr">Hi there,<div><br clear="none"></div><div>The numbering in your PDB file is not continuous, and the jumps correspond to regions of the structure with missing residues. Open your PDB structure in PyMOL and you&#39;ll see. Alternatively, print the C-N distances (peptide bond) for consecutive residues; wherever the distance is larger than ~3Å, you have found a break. <br clear="none"></div><div><br clear="none"></div><div>As for the discrepancy between the sequences in the FASTA file and the PDB file, that&#39;s just because not all residues are resolved in the crystal structure.</div><div><br clear="none"></div><div>Cheers,</div><div><br clear="none"></div><div>João</div></div><div><br clear="none"><div>2014-10-24 13:10 GMT-05:00 Sanjeev Sariya <span dir="ltr">&lt;<a rel="nofollow" shape="rect" href="mailto:s.sariya_work@ymail.com" target="_blank">s.sariya_work@ymail.com</a>&gt;</span>:<br clear="none"><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div><div><div style="color:#000;background-color:#fff;font-family:HelveticaNeue-Light,Helvetica Neue Light,Helvetica Neue,Helvetica,Arial,Lucida Grande,sans-serif;font-size:16px"><div dir="ltr">Hi All,</div><div dir="ltr">I&#39;m having a hard time using and understanding Biopython&#39;s PDB module.</div><div dir="ltr">./read_pdb_file.py 3OE6.pdb</div><div dir="ltr"><br clear="none"></div><div 
dir="ltr">I&#39;m attaching the Python script, PDB file, FASTA file and output to this mail.</div><div dir="ltr">I have the following questions:</div><div dir="ltr">- When I print the sequence, I get it in broken pieces. Why?</div><div dir="ltr">- Also, the printed sequence doesn&#39;t match the FASTA file (attached).</div><div dir="ltr">- Am I making a silly mistake?</div><div dir="ltr"><br clear="none"></div><div dir="ltr">I am running the script as:<br clear="none"></div><div dir="ltr">python read_pdb_file.py 3OE6.pdb </div><div dir="ltr"><br clear="none"></div><div dir="ltr">Kindly help and guide.<br clear="none"></div><div dir="ltr"><br clear="none"></div></div></div></div><br clear="none">_______________________________________________<br clear="none">

Biopython mailing list  -  <a rel="nofollow" shape="rect" href="mailto:Biopython@mailman.open-bio.org" target="_blank">Biopython@mailman.open-bio.org</a><br clear="none">

<a rel="nofollow" shape="rect" href="http://mailman.open-bio.org/mailman/listinfo/biopython" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biopython</a><br clear="none"></blockquote></div><br clear="none"></div></div></div><br><br></div>  </div> </div>  </div> </div></div></div></div></blockquote></div></div></div><br></div>
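<div dir="ltr">A minimal sketch of the break check described in this thread: for consecutive residues in the same chain, measure the distance from the C atom of one to the N atom of the next and report pairs that are farther apart than a cutoff. It reads the fixed PDB columns with the standard library only, so it runs without Bio.PDB. The function names backbone_atoms and find_breaks and the 3.0 Å default cutoff are assumptions made for illustration; they are not part of Biopython.</div>
<pre>
```python
import math


def backbone_atoms(pdb_lines):
    """Collect C and N coordinates per residue from ATOM records.

    Returns a list of ((chain_id, resseq, icode), {"C": xyz, "N": xyz})
    in file order. Hypothetical helper, not a Biopython function.
    """
    residues = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name not in ("C", "N"):
            continue
        if line[16] not in (" ", "A"):
            continue  # skip alternate locations other than blank or A
        # Column 22 (index 21) holds the chain ID; it may be a blank.
        key = (line[21], int(line[22:26]), line[26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if not residues or residues[-1][0] != key:
            residues.append((key, {}))
        residues[-1][1].setdefault(name, xyz)
    return residues


def find_breaks(pdb_lines, cutoff=3.0):
    """Return (chain_id, resseq_before, resseq_after, distance) per break."""
    residues = backbone_atoms(pdb_lines)
    breaks = []
    for (key1, atoms1), (key2, atoms2) in zip(residues, residues[1:]):
        if key1[0] != key2[0]:
            continue  # never compare across different chains
        if "C" not in atoms1 or "N" not in atoms2:
            continue  # backbone atom missing, nothing to measure
        dist = math.dist(atoms1["C"], atoms2["N"])
        if dist > cutoff:
            breaks.append((key1[0], key1[1], key2[1], round(dist, 2)))
    return breaks
```
</pre>
<div dir="ltr">Usage would be along the lines of find_breaks(open(&quot;3OE6.pdb&quot;)). A chain with a blank in column 22 appears with chain ID &quot; &quot;. With Biopython itself, the same distance can be obtained by subtracting two atoms, e.g. residue[&quot;C&quot;] - next_residue[&quot;N&quot;].</div>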

</blockquote></div><br></div>
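<div dir="ltr">To illustrate the SEQRES suggestion at the top of this message, here is a rough standard-library sketch that reads the SEQRES records and builds one sequence per chain. The names THREE_TO_ONE and seqres_sequences are hypothetical, and mapping any non-standard residue to &quot;X&quot; is an assumption of this sketch. In Biopython, the equivalent should be SeqIO.parse(filename, &quot;pdb-seqres&quot;), which yields one record per chain.</div>
<pre>
```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def seqres_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} built from SEQRES records.

    Hypothetical helper; residues outside THREE_TO_ONE become "X".
    """
    chains = {}
    for line in pdb_lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]          # column 12 holds the chain ID
        names = line[19:70].split()  # residue names fill columns 20-70
        letters = [THREE_TO_ONE.get(name, "X") for name in names]
        chains.setdefault(chain_id, []).extend(letters)
    return {chain: "".join(seq) for chain, seq in chains.items()}
```
</pre>
<div dir="ltr">Comparing these sequences with the residues actually present in the ATOM records shows which parts of the construct were not resolved in the crystal.</div>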