<div dir="ltr"><div><div><div>Thanks for the clear description, Jose. My main fear was that some structures might combine several names under the same id, giving a many:many relationship that would require some group-level storage. This seems not to be the case, at least for wwPDB structures. For instance, waters do not get unified into a single id, but remain as separate chains based on their nearest polymer. I guess one could construct a valid cif file that would disobey this rule, but in that case they would just have to deal with a few incorrect names.<br></div><div><br></div>I do think that the current selection syntax should stay backwards compatible. Even RCSB is displaying chain names rather than switching to ids. I agree that pulling ligands by proximity is a bit confusing, but this is the most consistent way I can see to deal with them if we want to support ligands with residue selections. With this approach, "1QTY.X" refers to the polymer components with name X (mapping to a unique Chain object), plus any ligands that form contacts to that polymer. 
I would lean towards dropping waters from substructures completely (otherwise many waters will be included with multiple polymers), although in principle they could be treated the same way as ligands.<br><br></div>I think we're going to define or reuse a more powerful selection language soon, so we can break backwards compatibility at that point.<br><br></div>-S<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Oct 3, 2016 at 9:45 PM, Jose Duarte <span dir="ltr">&lt;<a href="mailto:jose.duarte@rcsb.org" target="_blank">jose.duarte@rcsb.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi Spencer</div><div><br></div>Some answers below<br><div class="gmail_extra"><br><div class="gmail_quote"><span class=""><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div><div>I have questions about the new chain id/name system post-#469.<br><br></div>- Are all Chains guaranteed to have an ID defined? What about a name?<br></div></div></div></div></div></div></blockquote><div><br></div></span><div>Yes, they should always have an id and a name.</div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div><div></div>- Are IDs set for structures loaded from PDB?<br></div></div></div></div></div></blockquote><div><br></div></span><div>Yes. The ids don't exist in PDB files but the parser goes through the file assigning ids with the same rules and conventions as the mmCIF files use (unique ids for every distinct polymer/non-polymer molecule in the AU). The results will not always be perfect, e.g. 
for files without TER records the separation of non-polymer from polymer chains won't work properly, and then no distinct id would be assigned to them.</div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div></div><div>- Are IDs guaranteed to be unique within a Structure?<br></div></div></div></div></div></blockquote><div><br></div></span><div>Yes, they are unique within the asymmetric unit (which is what a Structure is). If the Structure represents a bioassembly then they are still unique because the symmetry mates get new ids: &lt;original chain id&gt;_&lt;operator id&gt;</div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div></div>- Are all groups within a Chain object guaranteed to have the same ID?<br></div></div></div></div></blockquote><div><br></div></span><div>Yes, that's exactly the new definition of Chain in BioJava 5. All groups within a chain should have the same id (and also the same name).</div><div><br></div><div>Chain &lt;-&gt; id is a 1:1 relationship, whilst Chain &lt;-&gt; name is a many:1 relationship. </div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div></div>- How are cases where the id and name differ mapped to chain objects? Is a new chain object created for every tuple (id,name) that has groups defined in the file?<br></div></div></div></blockquote><div><br></div></span><div>The id is the primary key of the Chains within a Structure as explained above. 
Name is only a secondary identifier that may or may not coincide with the id.</div><span class=""><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div></div>- Should the chain selection syntax (e.g. "4hhb.A") refer to ID or name? Should it be specifically restricted to polymer chains (with ligands automatically added from all chains based on proximity)?</div></div></blockquote><div><br></div></span><div>This is something that can be discussed. There are two possibilities:</div><div><br></div><div>1) It refers to name: most backwards compatible. A selection would then pull both polymers and the non-polymers (ligands) associated with them, as annotated in the file. In this option, adding ligands based on proximity would be confusing in my opinion. </div><div><br></div><div>2) It refers to id: not backwards compatible, but less ambiguous. A selection then refers strictly to a single molecule (be it polymer or non-polymer). We could then have an extra switch in the syntax to also pull non-polymers by proximity. Pulling by proximity will not always result in the same selection as what's annotated in the file by using names (e.g. you might pull symmetry mates from the next cell too). A disadvantage of this option is that some other databases (e.g. SCOP) use names and not ids, so we'd need to convert between them.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>Jose<br></div><div><br></div></font></span></div></div></div>

</blockquote></div><br></div>
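<div>To make the id/name relationship discussed in this thread concrete, here is a minimal, self-contained sketch. It does not use the BioJava API: the <code>Chain</code> and <code>Structure</code> classes, their method names, and the example chain ids are all hypothetical stand-ins. It only illustrates the rules stated above, namely that the id is a unique key within a structure, that several chains may share one name, and that a name-based selection such as "1QTY.X" resolves to a single polymer chain plus the non-polymer chains annotated with that name.</div>

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT the BioJava classes: it only mirrors the id/name rules from the thread.
final class ChainSelectionSketch {

    /** Stand-in for a chain: the id is unique, the name may be shared. */
    static final class Chain {
        final String id;
        final String name;
        final boolean polymer;

        Chain(String id, String name, boolean polymer) {
            this.id = id;
            this.name = name;
            this.polymer = polymer;
        }
    }

    /** Stand-in for a structure: chains are keyed by id (Chain <-> id is 1:1). */
    static final class Structure {
        private final Map<String, Chain> chainsById = new LinkedHashMap<>();

        void addChain(Chain c) {
            // Enforce the "id is the primary key" rule.
            if (chainsById.containsKey(c.id)) {
                throw new IllegalArgumentException("duplicate chain id: " + c.id);
            }
            chainsById.put(c.id, c);
        }

        /** Id lookup returns at most one chain. */
        Chain getById(String id) {
            return chainsById.get(id);
        }

        /** Name lookup may return several chains (Chain <-> name is many:1). */
        List<Chain> getByName(String name) {
            List<Chain> out = new ArrayList<>();
            for (Chain c : chainsById.values()) {
                if (c.name.equals(name)) {
                    out.add(c);
                }
            }
            return out;
        }

        /** First polymer chain carrying this name, or null if there is none. */
        Chain getPolymerByName(String name) {
            for (Chain c : getByName(name)) {
                if (c.polymer) {
                    return c;
                }
            }
            return null;
        }
    }

    /** Invented example data: one polymer and two non-polymer chains sharing name "A". */
    static Structure example() {
        Structure s = new Structure();
        s.addChain(new Chain("A", "A", true));   // polymer
        s.addChain(new Chain("B", "A", false));  // ligand annotated with polymer A
        s.addChain(new Chain("C", "A", false));  // waters nearest to polymer A
        return s;
    }

    public static void main(String[] args) {
        Structure s = example();
        System.out.println("chains named A: " + s.getByName("A").size());
        System.out.println("polymer for name A has id: " + s.getPolymerByName("A").id);
    }
}
```

<div>Under option 1 in the quoted message, a name-based selection would correspond to <code>getByName</code>; under option 2 it would correspond to <code>getById</code>, with proximity-based ligand lookup left as a separate step that this sketch does not attempt.</div>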