<div dir="ltr">I've replied to some of this in issue 353 (<a href="https://github.com/biojava/biojava/issues/353" target="_blank">https://github.com/biojava/biojava/issues/353</a>). I think it'd be better to discuss things over there.<div><br></div><div>Jose<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Nov 23, 2015 at 4:08 PM, Steve Darnell <span dir="ltr">&lt;<a href="mailto:darnells@dnastar.com" target="_blank">darnells@dnastar.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Greetings,<br>
<br>
My company is preparing to submit a PR for Issue 353, "mmCIF parsing support for missing SEQRES information." The PR passes the existing integration tests, which expect an empty SEQRES component list when FileParsingParameters.headerOnly = true.<br>
<br>
I suggest that SEQRES (PDB format) and _pdbx_poly_seq_scheme (mmCIF format) be considered part of the header, which would allow a user to extract the chain sequences from a file without the full, heavyweight parsing of the atom coordinate records. This is a valuable computational saving for people who mine information from header records across the PDB. Examples include building custom sequence collections for PDB-based BLAST databases and quickly converting local PDB/mmCIF structure files to sequences for calculating multiple sequence alignments, among others.<br>
<br>
I am asking the BioJava community for their thoughts on these questions:<br>
<br>
1. Is it acceptable to elevate this sequence information to "the header"?<br>
2. If so, is it acceptable to include this feature as part of Issue 353?<br>
3. If not, is it acceptable to add a new FileParsingParameters option (e.g. "setParseSeqRes") that allows extracting the sequence information without the atomic coordinates?<br>
<br>
Best regards,<br>
Steve<br>
<br>
--<br>
Steve Darnell<br>
DNASTAR, Inc.<br>
Madison, WI USA<br>
<br>
</blockquote></div><br></div>