<html><body><div style="color:#000; background-color:#fff; font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px"><div id="yiv7786995846"><div id="yui_3_16_0_1_1423751048307_4733"><div style="background-color: rgb(255, 255, 255);" id="yui_3_16_0_1_1423751048307_4732"><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_271326" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_341567">Hi Paolo, Andreas,</span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_273231" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><br clear="none"><span></span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_273221" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_277904">I am sorry if I sounded disrespectful. <br clear="none"></span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_273223" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222"><br clear="none"></span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_273224" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222">I would like to point out that a new user gets confused by what has been published about the biojava library. 
There are several places where you strongly indicate that concatenated sequences are the intended design (see citations below).</span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_325090" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><br clear="none"></div><div class="yiv7786995846" dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_365226" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">I
interpreted the citations below to mean that the library is aimed at concatenated
sequences. However, the reader actually reads only one record. I thought it
would be very helpful to correct this discrepancy and submit a patch. Usually patches are reviewed by a maintainer who either accepts or rejects the pull request. I would also like to mention that I spent a
lot of time to understand and correct the issue.<br clear="none"></div><div class="yiv7786995846" dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_287657" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><br clear="none"></div><div class="yiv7786995846" dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_315739" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">I would be glad if you find a way how to include contributions.</div><div class="yiv7786995846" dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_315741" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><br clear="none"></div><div class="yiv7786995846" dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_343902" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">I would also like to mention, that there are errors during Genbank reading/writing. <span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_273222" style="">When I compare an original Genbank sequence to one which has been first read and then written, I can see that there are several differences between the two files. </span><span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_273222">The most urgent of which is that </span>the Location
start of each feature is incremented by
one for each read/write cycle. There are also some minor issues like: the version field is shortened, references and organism are dropped, keywords and source are not copied etc. So it seems you are in need of additional contributions.</div><div class="yiv7786995846" dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_365232" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><br clear="none"></div>citations:<br clear="none"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222"></span><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_273226" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222"><br clear="none"></span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_273228" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222">from the cookbook:</span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_275540" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><br clear="none"></div><pre class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_275594" style="color: rgb(0, 0, 0); font-family: monospace; font-size: 16px;">        <span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_282815" style="color:#666666;font-style:italic;">/*
         * Method 2: With the GenbankReaderHelper
         */</span>
        <span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_322764" style="color:#666666;font-style:italic;">//Try with the GenbankReaderHelper</span>
        <a rel="nofollow" shape="rect" class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_275593" style="" target="_blank" href="http://www.google.com/search?hl=en&q=allinurl%3Afile+java.sun.com&btnI=I%27m%20Feeling%20Lucky"><span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_275592" style="color:#003399;">File</span></a> dnaFile <span class="yiv7786995846" style="color:#339933;">=</span> <span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_322762" style="color:#000000;font-weight:bold;">new</span> <a rel="nofollow" shape="rect" class="yiv7786995846" style="" target="_blank" href="http://www.google.com/search?hl=en&q=allinurl%3Afile+java.sun.com&btnI=I%27m%20Feeling%20Lucky" id="yui_3_16_0_1_1423751048307_7303"><span class="yiv7786995846" style="color:#003399;" id="yui_3_16_0_1_1423751048307_7302">File</span></a><span class="yiv7786995846" style="color:#009900;">(</span><span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_372431" style="color:#0000ff;">"src/test/resources/NM_000266.gb"</span><span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_372429" style="color:#009900;">)</span><span class="yiv7786995846" style="color:#339933;">;</span>                
        <a rel="nofollow" shape="rect" class="yiv7786995846" style="" target="_blank" href="http://www.google.com/search?hl=en&q=allinurl%3Afile+java.sun.com&btnI=I%27m%20Feeling%20Lucky"><span class="yiv7786995846" style="color:#003399;">File</span></a> protFile <span class="yiv7786995846" style="color:#339933;">=</span> <span class="yiv7786995846" style="color:#000000;font-weight:bold;">new</span> <a rel="nofollow" shape="rect" class="yiv7786995846" style="" target="_blank" href="http://www.google.com/search?hl=en&q=allinurl%3Afile+java.sun.com&btnI=I%27m%20Feeling%20Lucky" id="yui_3_16_0_1_1423751048307_7301"><span class="yiv7786995846" style="color:#003399;" id="yui_3_16_0_1_1423751048307_7300">File</span></a><span class="yiv7786995846" style="color:#009900;" id="yui_3_16_0_1_1423751048307_7299">(</span><span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_372433" style="color:#0000ff;">"src/test/resources/BondFeature.gb"</span><span class="yiv7786995846" style="color:#009900;">)</span><span class="yiv7786995846" style="color:#339933;">;</span>
        LinkedHashMap<span class="yiv7786995846" style="color:#339933;"><</span>String, DNASequence<span class="yiv7786995846" style="color:#339933;">></span> dnaSequences <span class="yiv7786995846" style="color:#339933;">=</span> GenbankReaderHelper.<span class="yiv7786995846" style="color:#006633;">readGenbankDNASequence</span><span class="yiv7786995846" style="color:#009900;">(</span> dnaFile <span class="yiv7786995846" style="color:#009900;">)</span><span class="yiv7786995846" style="color:#339933;">;</span>
        <span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_370070" style="color:#000000;font-weight:bold;">for</span> <span class="yiv7786995846" id="yiv7786995846yui_3_16_0_1_1423576262019_372435" style="color:#009900;">(</span>DNASequence sequence <span class="yiv7786995846" style="color:#339933;">:</span> dnaSequences.<span class="yiv7786995846" style="color:#006633;">values</span><span class="yiv7786995846" style="color:#009900;">(</span><span class="yiv7786995846" style="color:#009900;">)</span><span class="yiv7786995846" style="color:#009900;">)</span> <span class="yiv7786995846" style="color:#009900;">{</span>
                 <a rel="nofollow" shape="rect" class="yiv7786995846" style="" target="_blank" href="http://www.google.com/search?hl=en&q=allinurl%3Asystem+java.sun.com&btnI=I%27m%20Feeling%20Lucky"><span class="yiv7786995846" style="color:#003399;">System</span></a>.<span class="yiv7786995846" style="color:#006633;">out</span>.<span class="yiv7786995846" style="color:#006633;">println</span><span class="yiv7786995846" style="color:#009900;">(</span> sequence.<span class="yiv7786995846" style="color:#006633;">getSequenceAsString</span><span class="yiv7786995846" style="color:#009900;">(</span><span class="yiv7786995846" style="color:#009900;">)</span> <span class="yiv7786995846" style="color:#009900;">)</span><span class="yiv7786995846" style="color:#339933;">;</span>
        <span class="yiv7786995846" style="color:#009900;">}</span>
</pre><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_275595" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222"> </span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_282817" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222">Without knowing the contents of 'NM_000266.gb', the reader must assume that there are several sequences in the file. First, </span><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222"></span><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222">the LinkedHashMap is called 'dnaSequences', with emphasis on the plural. Second, if you read only one DNASequence, why would you have a LinkedHashMap, and why would you loop over a single sequence? Correct me if I am wrong, but in my opinion the cookbook expects several concatenated sequences in a single file.</span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_282820" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><br clear="none"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222"></span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_282822" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><span id="yiv7786995846yui_3_16_0_1_1423576262019_273222">On the other hand, the fact that the method itself is named 'readGenbankDNASequence' (singular) speaks for non-concatenated sequences. 
So I looked into the method to gain more clarity.<br clear="none"></span></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_282824" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;"><br clear="none"></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688"><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">from the source code of GenbankReader:</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">/**</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">* This method tries to parse maximum <code>max</code> records from</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">* the open File or InputStream, and leaves the underlying resource open.<br></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">.</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">.</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="color: rgb(0, 0, 0); 
font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;">.</div><div class="" style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;" id="yui_3_16_0_1_1423751048307_7336"><br class="" style=""></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="">The introductory comment of the method clearly speaks of multiple records. The</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="">method is called with the parameter 'max=-1' to indicate that all records of the</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="">file should be read. Interestingly, the parameter max is not used anywhere in the code that follows </div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="">and is thus not implemented. </div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style=""><br class="" style=""></div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="">So would you not agree that the question of whether or not concatenated</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style="">sequence files are expected has not yet been settled in your library?</div><div dir="ltr" id="yiv7786995846yui_3_16_0_1_1423576262019_367688" class="" style=""><br></div><div style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;" class="" id="yui_3_16_0_1_1423751048307_7361" dir="ltr">Best regards</div><div style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;" class="" id="yui_3_16_0_1_1423751048307_7361" dir="ltr">Stefan</div><div style="color: rgb(0, 0, 0); font-family: HelveticaNeue, 
'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif; font-size: 16px;" class="" id="yui_3_16_0_1_1423751048307_7361" dir="ltr"><br></div></div><div class="yiv7786995846yahoo_quoted" style="display: block;"> <div style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div dir="ltr"> <font face="Arial" size="2"> Andreas Prlic &lt;andreas@sdsc.edu&gt; wrote at 9:10 on Thursday, 12 February 2015:<br clear="none"> </font> </div> <br clear="none"><br clear="none"> <div class="qtdSeparateBR"><br><br></div><div class="yiv7786995846yqt9048909126" id="yiv7786995846yqt10720"><div class="yiv7786995846y_msg_container"><div id="yiv7786995846"><div><div dir="ltr">Thanks, Paolo, I would like to second what you said. <div><br clear="none"></div><div>We don't have many rules at BioJava, but "respect the work of others" is one that we used to send out to developers in the past before we granted SVN write access. I would like to put up this rule somewhere where it can be noticed. Nobody gets paid to contribute to BioJava and we depend on contributions made on a volunteer basis. As such it is important to show respect for the work that people contributed previously. 
This does not mean we can't change things, it just means we do it in a way where everybody can feel respected.</div><div><br clear="none"></div><div>Andreas<br clear="none"><div><br clear="none"><div><br clear="none"></div><div><br clear="none"></div><div class="yiv7786995846yqt2519639975" id="yiv7786995846yqt36567"><div class="yiv7786995846gmail_extra"><br clear="none"><div class="yiv7786995846gmail_quote">On Wed, Feb 11, 2015 at 3:12 AM, Paolo Pavan <span dir="ltr"><<a rel="nofollow" shape="rect" ymailto="mailto:paolo.pavan@gmail.com" target="_blank" href="mailto:paolo.pavan@gmail.com">paolo.pavan@gmail.com</a>></span> wrote:<br clear="none"><blockquote class="yiv7786995846gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div dir="ltr"><div><div><div><div><div><div><div>Hi Stefan,<br clear="none"></div>I don't want to talk about the design of the system. I know what you mean because I have already had the need to work on that and, yes, there are some choices that I would not have made. But my choice, on entering a collaborative project, was to maintain the planned design so that I could add some extra features without running the risk of getting lost in huge modifications, to minimize the impact of modifications on the API, and moreover to show some kind of "respect" for the work of the people before me.<br clear="none">Unless the system works in reasonable time, in my opinion this is the important thing. <br clear="none"><br clear="none">Moreover, be aware that there is also a parallel GenbankProxyLoader system which, I'm pretty sure, is not receiving your patches (concatenated sequence loading). To be checked.<br clear="none"><br clear="none"></div>About section keys, unknown tags and collecting them in a list, I partially agree with you (more no than yes) because the Genbank format is highly specified and formal; please have a look at the link below. 
The best thing would be to throw an Exception in unsupported cases, but this would definitely limit the parser's usage.<br clear="none"><span><a rel="nofollow" shape="rect" target="_blank" href="http://www.insdc.org/files/feature_table.html">http://www.insdc.org/files/feature_table.html</a></span><br clear="none"><br clear="none"></div>About reference, authors and dblink, as previously mentioned, I have no objections to adding those properties to AbstractSequence, even if I feel they make little sense outside your specific use case.<br clear="none"></div>Andreas has the last word on this, however.<br clear="none"><br clear="none"></div>My suggestion is just to carefully check your newly added tags, since it seems to me that <span>SOURCE_TAG is the same as your </span><span>DBSOURCE</span><span>, but I know that this topic is also debated in the other bio* projects, so it is not trivial.<br clear="none"><br clear="none"></span></div><span>Happy biojava-ing,<br clear="none"></span></div><span>Paolo<br clear="none"></span><div><div><span><br clear="none"><br clear="none"></span> <span></span><div><div><div><div><br clear="none"><br clear="none"></div></div></div></div></div></div></div><div class="yiv7786995846HOEnZb"><div class="yiv7786995846h5"><div class="yiv7786995846gmail_extra"><br clear="none"><div class="yiv7786995846gmail_quote">2015-02-11 0:13 GMT+01:00 stefan harjes <span dir="ltr"><<a rel="nofollow" shape="rect" ymailto="mailto:stefanharjes@yahoo.de" target="_blank" href="mailto:stefanharjes@yahoo.de">stefanharjes@yahoo.de</a>></span>:<br clear="none"><blockquote class="yiv7786995846gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div><div style="color:#000;background-color:#fff;font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"><div dir="ltr"><span>Hi Paolo,</span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>Come on now, the reader returns a 
LinkedHashMap. If it read only one sequence, it could simply return the sequence. Also, the actual method call contains an integer parameter indicating how many sequences should be fetched. In my pull request I patched the reader so that it actually does that.</span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>The sectionKey tags I was talking about are actually 'PRIMARY', 'DBREFERENCE' and 'DBLINK'. I think it would be better not to check for every single key, but simply to collect all unknown keys together with their values and store them in a list. Then you could just read/write them without silently forgetting about them. Right now, if an unknown tag is read, it is simply recognized and then dropped, which I do not consider friendly.</span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>Cheers</span></div><div dir="ltr"><span>Stefan</span></div><div dir="ltr"><span><br clear="none"></span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span></span></div> <div><br clear="none"><br clear="none"></div><div style="display:block;"> <div style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div dir="ltr"> <font face="Arial"> Paolo Pavan <<a rel="nofollow" shape="rect" ymailto="mailto:paolo.pavan@gmail.com" target="_blank" href="mailto:paolo.pavan@gmail.com">paolo.pavan@gmail.com</a>> wrote at 23:56 on Tuesday, 10 February 2015:<br clear="none"> </font> </div><div><div> <br clear="none"><br clear="none"> <div><div><div><div dir="ltr"><div><div><div><div><div>Hi Stefan, thank you for the review.<br clear="none"></div>You are actually surprising me: while I'm not sure that the reader supports multiple concatenated Genbank files, I thought that all the information is now filled in in the sequence object. 
<br clear="none">There are just a few tags that are not imported (KEYWORDS_TAG, SOURCE_TAG, REFERENCE_TAG, BASE_COUNT_TAG); the documentation says that this is because they can be inferred from other fields anyway. I can add that this is also because, as in the case of the authors and reference tags, there is no such property in the AbstractSequence class, and I see little sense in having it, unless you are doing this sort of swapping job. Anyway, it could certainly be added. <br clear="none"><br clear="none"></div></div>About the writer and its failure to write the reported accession: even though I can't go into depth now, it may well be that it fails when writing InsdcLocations (also known as split locations, for example<span> <i>join(58474..59052,59052..59279)</i></span> reported in your genome file), since they have been used more systematically by the most recently updated Genbank reader. It may need a quick review along with db_xref qualifiers as well. <br clear="none"><br clear="none"></div>Finally, about alignment: if you are using NeedlemanWunsch or similar in the alignment package, be sure to load the sequences with the proper AmbiguityDNACompoundSet.<br clear="none"><br clear="none">Cheers,<br clear="none"></div>Paolo<br clear="none"></div><div><br clear="none"><div>2015-02-10 10:28 GMT+01:00 stefan harjes <span dir="ltr"><<a rel="nofollow" shape="rect" ymailto="mailto:stefanharjes@yahoo.de" target="_blank" href="mailto:stefanharjes@yahoo.de">stefanharjes@yahoo.de</a>></span>:<br clear="none"><div><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div><div style="color:#000;background-color:#fff;font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"><div dir="ltr"><span>Hi Paolo, biojava-dev<br clear="none"></span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>I had a look myself. 
First, I noticed that GenbankWriter was actually more sophisticated than the Reader, as it was able to write more than one sequence. I submitted a pull request to patch GenbankReader which enables reading more than one Genbank sequence from one file. When we speak of full Genbank reading capability, there are still at least 5 sectionKeys which are just ignored in the reader. I think there should be a way of simply storing them in a List instead of checking for each one of them; maybe I will look into that later.</span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>The writer is doing pretty well, but you should try to write 'NC_000913.gb', which crashed it in my case (writing nothing/no exception).</span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>I added two more test cases, but I think in order to really test the reader/writer capabilities we need a test where several sequences/proteins are read, merged into an array and written to a stream. 
Upon reading this stream again, we should check whether they are still identical.</span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>Also, I noticed that you cannot compare (align) a DNA sequence with non-ambiguous nucleotide compounds to a sequence with ambiguous nucleotide compounds, even though a matrix dedicated to exactly that comparison exists.<br clear="none"></span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>Cheers</span></div><div dir="ltr"><span>Stefan<br clear="none"> </span></div><div><br clear="none"><br clear="none"></div><div style="display:block;"> <div style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div dir="ltr"> <font face="Arial"> Paolo Pavan <<a rel="nofollow" shape="rect" ymailto="mailto:paolo.pavan@gmail.com" target="_blank" href="mailto:paolo.pavan@gmail.com">paolo.pavan@gmail.com</a>> wrote at 4:43 on Saturday, 7 February 2015:<br clear="none"> </font> </div><div><div> <br clear="none"><br clear="none"> <div><div><div><div dir="ltr"><div><div><div>Hi Stefan,<br clear="none"></div>I had a look at the GenbankWriter because I might also need it in the future. Can you please specify what issues you are encountering? I ran a few quick tests and everything seemed to work for me. <br clear="none"></div><br clear="none">Just in case, if you are reading then writing a Genbank file, are you using the latest release, biojava 4.0.0? 
This would explain empty genbank files in output (if I have understood correctly what you have done).<br clear="none"><br clear="none"></div>Paolo<br clear="none"></div><div><br clear="none"><div>2015-02-06 11:03 GMT+01:00 stefan harjes <span dir="ltr"><<a rel="nofollow" shape="rect" ymailto="mailto:stefanharjes@yahoo.de" target="_blank" href="mailto:stefanharjes@yahoo.de">stefanharjes@yahoo.de</a>></span>:<br clear="none"><blockquote style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div><div><div style="color:#000;background-color:#fff;font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"><div dir="ltr"><span>@Andreas: Yes, I understand, thanks anyhow. <br clear="none"></span></div><div dir="ltr"><span><br clear="none"></span></div><div dir="ltr"><span>@Paolo: I will have another look at </span>GenbankWriter if I find some time.</div><div dir="ltr"><br clear="none"></div><div dir="ltr">Cheers</div><div dir="ltr">Stefan</div><div dir="ltr"><br clear="none"></div> <div><br clear="none"><br clear="none"></div><div style="display:block;"> <div style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div style="font-family:HelveticaNeue, Helvetica Neue, Helvetica, Arial, Lucida Grande, sans-serif;font-size:16px;"> <div dir="ltr"> <font face="Arial"> Andreas Prlic <<a rel="nofollow" shape="rect" ymailto="mailto:andreas@sdsc.edu" target="_blank" href="mailto:andreas@sdsc.edu">andreas@sdsc.edu</a>> wrote at 7:01 on Friday, 6 February 2015:<br clear="none"> </font> </div><div><div> <br clear="none"><br clear="none"> <div><div><div><div dir="ltr">Hi Stefan,<div><br clear="none"></div><div>thanks for your reply. You are trying to use the code base in a way that has not been done before. 
While I share your desire that this should work in principle, I think it is also important to point out that we never promised that serialization would be a supported feature. We started a thread to add better support on this here: <a rel="nofollow" shape="rect" target="_blank" href="https://github.com/biojava/biojava/issues/249">https://github.com/biojava/biojava/issues/249</a> .</div><div><br clear="none"></div><div>Regarding your project: It seems it would make sense to split your array of sequences into two: DNA sequences and protein sequences. Dealing with each of those separately might be easier. </div><div><br clear="none"></div><div>Andreas</div><div><br clear="none"><div><br clear="none"><div><div>On Wed, Feb 4, 2015 at 3:42 PM, stefan harjes <span dir="ltr"><<a rel="nofollow" shape="rect" ymailto="mailto:stefanharjes@yahoo.de" target="_blank" href="mailto:stefanharjes@yahoo.de">stefanharjes@yahoo.de</a>></span> wrote:<br clear="none"><blockquote style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex;"><div><div style="color:rgb(0,0,0);font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;background-color:rgb(255,255,255);"><div><div><div style="color:rgb(0,0,0);font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;background-color:rgb(255,255,255);"><div><div><div style="color:rgb(0,0,0);font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;background-color:rgb(255,255,255);"><div dir="ltr"><span>Hi Andreas,</span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>yes I took a look at </span>FastaWriterHelper as well as GenbankWriter and they only seem to implement writing the name and sequence as fasta. Also they do not allow to read/write a mixed array of protein and DNA sequences. 
I asked myself what the point is of constructing a complicated sequence with annotations, features and links if I can only write FASTA. <br clear="none"></div><div dir="ltr"><br clear="none"></div><div dir="ltr">This led me to check why one of the most basic classes of biojava, Sequence (i.e. AbstractSequence), is not serializable. <br clear="none"></div><div dir="ltr"><div dir="ltr">(Isn't it like String for Java?)<br clear="none"></div><div><br clear="none"></div></div><div dir="ltr">The first thing I noticed is that for some reason every sequence has a proxy loader. As far as I understand, the proxy is implemented in order not to load the entire sequence in case it is very big. Sure, then you can load sequences of gigabase length. But in my 25 years of biochemistry I have never actually worked with a single sequence of > 1GB. While there are some plant chromosomes which might fit this description, I would argue that the vast majority of biological sequences are much smaller and thus do not need a proxy for a single sequence. Thus, I would conclude that only a small subset of ChromosomeSequence might need a proxy reader implementation.</div><div dir="ltr">Should it therefore not be implemented there rather than in the most basic class?</div><div dir="ltr"><br clear="none"></div><div dir="ltr"><div dir="ltr">The first class which prevents serialization is, as you mentioned, NucleotideCompound. I lack the biojava experience to say what is essential in NucleotideCompound and why it does not allow an empty constructor. But I saw, for example, in biojava 3.1 that compounds are allowed to have a flexible name length, which I have never seen in actual sequence data, where it is always one or three characters. Is it not a better strategy to keep basic classes such as Sequence and Compound more basic in order to allow serialization? Implementation of more complex features could then be moved to classes which extend the basic classes. 
<br clear="none"></div><div dir="ltr"><br clear="none"></div><div dir="ltr">In my humble opinion, one could instantiate a compound without a 'base' name, and check that it actually has a base name once the compound is added to the compound set.<br clear="none"></div></div><div dir="ltr"><div><br clear="none"></div><div dir="ltr">I do not want to sound like a know-it-all and am not trying to reinvent biojava. However, to be honest, the (unsuccessful) effort of trying to serialize an ArrayList<Sequence<?>>, whether to send it over TCP/IP, to JSON or to disk, has been so frustrating and time-consuming that I am actually considering switching to jython/biopython or biojavaX, or writing my own implementation.</div><div dir="ltr"><br clear="none"></div></div><div dir="ltr"><div dir="ltr">Cheers</div><div dir="ltr">Stefan</div><div dir="ltr"><br clear="none"></div><div><br clear="none"></div><div><br clear="none"></div></div><div dir="ltr"><br clear="none"></div> <div><br clear="none"><br clear="none"></div><div style="display:block;"> <div style="font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;"> <div style="font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;"> <div dir="ltr"> <font face="Arial"> Andreas Prlic <<a rel="nofollow" shape="rect" ymailto="mailto:andreas@sdsc.edu" target="_blank" href="mailto:andreas@sdsc.edu">andreas@sdsc.edu</a>> wrote at 4:32 on Thursday, 5 February 2015:<br clear="none"> </font> </div><div><div> <br clear="none"><br clear="none"> <div><br clear="none"><br clear="none"></div><div><div><div><div><div><div dir="ltr">Hi Stefan,<div><br clear="none"></div><div>just another quick follow-up. You took a look at FastaWriterHelper and it was not useful, right? 
You need to serialize some header information as well, or what was the problem with it?</div><div><br clear="none"></div><div><a rel="nofollow" shape="rect" target="_blank" href="http://www.biojava.org/docs/api/org/biojava/nbio/core/sequence/io/FastaWriterHelper.html">http://www.biojava.org/docs/api/org/biojava/nbio/core/sequence/io/FastaWriterHelper.html</a><br clear="none"></div><div><br clear="none"></div><div>Thanks,</div><div><br clear="none"></div><div>Andreas</div><div><br clear="none"></div></div><div><br clear="none"><div><div>On Wed, Feb 4, 2015 at 7:13 AM, Andreas Prlic <span dir="ltr"><<a rel="nofollow" shape="rect" ymailto="mailto:andreas@sdsc.edu" target="_blank" href="mailto:andreas@sdsc.edu">andreas@sdsc.edu</a>></span> wrote:<br clear="none"><blockquote style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex;"><div dir="ltr">Thanks for pointing this out, Stefan. The problem is that the NucleotideCompound class does not have a zero-args constructor. That means you need to tweak kryo a bit. Kryo can be configured to use an InstantiatorStrategy to handle creating instances of a class. <a rel="nofollow" shape="rect" target="_blank" href="https://github.com/EsotericSoftware/kryo/blob/master/README.md">https://github.com/EsotericSoftware/kryo/blob/master/README.md</a><div><br clear="none"></div><div>Having said that, we need to improve the API and make something like this easier. 
</div><span><font color="#888888"></font></span><div><br clear="none"></div><div>Andreas</div><div><div><div><br clear="none"><div><br clear="none"></div></div><div><br clear="none"><div>On Wed, Feb 4, 2015 at 2:54 AM, stefan harjes <span dir="ltr"><<a rel="nofollow" shape="rect" ymailto="mailto:stefanharjes@yahoo.de" target="_blank" href="mailto:stefanharjes@yahoo.de">stefanharjes@yahoo.de</a>></span> wrote:<br clear="none"><blockquote style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex;"><div><div style="color:rgb(0,0,0);font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;background-color:rgb(255,255,255);"><div dir="ltr"><span>I finally had some time to try the serialization/deserialization library (Kryo) you mentioned, but I do not seem to get it to work. I can not even save a DNASequence:</span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>void test() {<br clear="none"> Kryo kryo = new Kryo();<br clear="none"> DNASequence dna=null;<br clear="none"> try {<br clear="none"> dna = new DNASequence("AGCT");<br clear="none"> } catch (CompoundNotFoundException e1) {<br clear="none"> // TODO Auto-generated catch block<br clear="none"> e1.printStackTrace();<br clear="none"> }<br clear="none"> try {<br clear="none"> Output output = new Output(new FileOutputStream("test.ser"));<br clear="none"> kryo.writeObject(output, dna);<br clear="none"> output.close(); <br clear="none"> } catch (FileNotFoundException e) {<br clear="none"> // TODO Auto-generated catch block<br clear="none"> e.printStackTrace();<br clear="none"> }<br clear="none"> try {<br clear="none"> Input input = new Input(new FileInputStream("test.ser"));<br clear="none"> dna = kryo.readObject(input, DNASequence.class);<br clear="none"> input.close();<br clear="none"> } catch (FileNotFoundException e) {<br clear="none"> // TODO Auto-generated catch 
block<br clear="none"> System.out.println("file not found");<br clear="none"> e.printStackTrace();<br clear="none"> }<br clear="none">}<br clear="none"></span></div><div dir="ltr"><span>I tried several Kryo calls and also class registration, but I cannot get it to work... Any ideas?</span></div><div dir="ltr"><span><br clear="none"></span></div><div dir="ltr"><br clear="none"><span></span></div><div dir="ltr"><span>Cheers</span></div><div dir="ltr"><span>Stefan</span></div><div dir="ltr"><span></span></div> <div><br clear="none"><br clear="none"></div><div style="display:block;"> <div style="font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;"> <div style="font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;"> <div dir="ltr"> <font face="Arial"> Andreas Prlic <<a rel="nofollow" shape="rect" ymailto="mailto:andreas@sdsc.edu" target="_blank" href="mailto:andreas@sdsc.edu">andreas@sdsc.edu</a>> wrote at 3:47 on Saturday, 31 January 2015:<br clear="none"> </font> </div><div><div> <br clear="none"><br clear="none"> <div><div><div><div dir="ltr">Hi Stefan,<div><br clear="none"></div><div>for your use case (save and load at server start/stop) I'd recommend the Kryo library. It will store your data in a binary format. It should take only two lines of code each to persist and load the data. 
<a rel="nofollow" shape="rect" target="_blank" href="https://github.com/EsotericSoftware/kryo">https://github.com/EsotericSoftware/kryo</a></div><div><br clear="none"></div><div>You are right, writing is not very well developed, but then there are so many utility libraries in Java that can be used for efficient serialization/deserialization in many ways, once you have an object in memory.</div><div><br clear="none"></div><div>Andreas</div><div><br clear="none"></div><div><br clear="none"></div><div><div><br clear="none"><div>On Fri, Jan 30, 2015 at 3:01 AM, stefan harjes <span dir="ltr"><<a rel="nofollow" shape="rect" ymailto="mailto:stefanharjes@yahoo.de" target="_blank" href="mailto:stefanharjes@yahoo.de">stefanharjes@yahoo.de</a>></span> wrote:<br clear="none"><blockquote style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex;"><div style="color:rgb(0,0,0);font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;background-color:rgb(255,255,255);">Hi biojava-l<div><br clear="none"><br clear="none"></div><div style="display:block;"><div style="font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;"><div style="font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;"><div><div><div><div><div style="color:rgb(0,0,0);font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;background-color:rgb(255,255,255);"><div style="display:block;"><div style="font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;"><div style="font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', sans-serif;font-size:16px;"><div><div><div><div style="color:rgb(0,0,0);font-family:HelveticaNeue, 'Helvetica Neue', Helvetica, Arial, 'Lucida Grande', 
sans-serif;font-size:16px;background-color:rgb(255,255,255);"><div><br clear="none"></div><div dir="ltr">I have a huge number of small sequences in a list (ArrayList<Sequence<?>>) which I would like to store on disk across server stop and start. Unfortunately Sequence is not serializable, so I searched and found that GenbankWriterHelper.writeSequences(OutputStream os, Collection<Sequence<?>> seqs) should be able to do the job. <br clear="none"></div><div dir="ltr"><div>However, when looking at GenbankReaderHelper, I found no methods which correspond to the above writer method. Am I on the wrong track completely? <br clear="none"></div><div><br clear="none"></div><div dir="ltr">Regarding the writer/reader helpers, I think I remember reading that they are rudimentary and save only the sequence (FASTA)? I would expect that in such an advanced version of biojava (4.0 is being prepared?) there must be a standard way to serialize rich sequences, or arrays of them, in order to send them over streams, as JSON, etc.?<br clear="none"></div><div><br clear="none"></div><div>Any help would be appreciated</div></div><div dir="ltr"><br clear="none"></div><div dir="ltr">Cheers</div><span><font color="#888888"></font></span><div dir="ltr">Stefan</div><br clear="none"></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></blockquote></div><div><br clear="none"></div>
</div></div></div></div></div><br clear="none"><br clear="none"></div> </div></div></div> </div> </div> </div></div></blockquote></div><br clear="none"><div><br clear="none"></div>
</div></div></div></div>
</blockquote></div></div><br clear="none"><br clear="all"><div><br clear="none"></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></div></blockquote></div><div><div dir="ltr"><div><div><div><div><div><br clear="none"></div></div></div></div></div></div></div>
</div></div></div></div></div></div><br clear="none"><br clear="none"></div> </div></div></div> </div> </div> </div></div></div><br clear="none">_______________________________________________<br clear="none">
biojava-dev mailing list<br clear="none">
<a rel="nofollow" shape="rect" ymailto="mailto:biojava-dev@mailman.open-bio.org" target="_blank" href="mailto:biojava-dev@mailman.open-bio.org">biojava-dev@mailman.open-bio.org</a><br clear="none">
<a rel="nofollow" shape="rect" target="_blank" href="http://mailman.open-bio.org/mailman/listinfo/biojava-dev">http://mailman.open-bio.org/mailman/listinfo/biojava-dev</a><br clear="none"></blockquote></div><br clear="none"></div></div></div><br clear="none"><br clear="none"></div> </div></div></div> </div> </div> </div></div></blockquote></div></div><br clear="none"></div></div></div><br clear="none"><br clear="none"></div> </div></div></div> </div> </div> </div></div></blockquote></div><br clear="none"></div>
</div></div></blockquote></div><br clear="none"><br clear="all"><div><br clear="none"></div>
</div></div></div></div></div></div></div><br clear="none"><br clear="none"></div></div> </div> </div> </div> </div></div></div></div></body></html>