<p>Dear all,<br>
Thank you, Andreas, for your review; I think I understand the background now. Thanks also to Jacek for your proposal.<br>
If I may add my opinion: both APIs work, and both now benefit from the newly implemented parser, so there is no urgent need for any change.<br>
The catch is that they are not very intuitive, which might cause delays and discourage other developers from adding new features.</p>
<p>For the record, I will summarize here how the proxy system works.<br>
GenbankProxySequenceReader implements DatabaseReferenceInterface and FeatureKeywordInterface, which declare the getDatabaseReferences() and getKeyWords() methods. AbstractSequence checks for these interfaces, in our case at construction time through the proxyLoader argument (AbstractSequence(SequenceReader<C> proxyLoader, CompoundSet<C> compoundSet)), and uses them at that point to populate the newly created sequence object with database and keyword information.</p>
<p>In other words, the original idea was to declare a new interface for every category of property that has to populate a sequence object, and to put the AbstractSequence constructor that takes the proxy reader in charge of this logic. A developer who wants to load a new kind of property must add the loading code there.</p>
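<p>The mechanism described above can be sketched in a few lines of Java. This is not BioJava code: every type below (ReaderSketch, DatabaseReferenceSketch, FeatureKeywordSketch, GenbankProxyReaderSketch, SequenceSketch) is a hypothetical stand-in, the return types are simplified to List<String>, and the values are made up. It only shows the pattern of a constructor that checks which interfaces the loader implements.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProxyLoadingDemo {
    public static void main(String[] args) {
        SequenceSketch full = new SequenceSketch(new GenbankProxyReaderSketch());
        SequenceSketch plain = new SequenceSketch(new PlainReaderSketch());
        System.out.println(full.getDatabaseReferences() + " " + full.getKeywords());
        System.out.println(plain.getDatabaseReferences() + " " + plain.getKeywords());
    }
}

// Hypothetical stand-ins for the BioJava types named above; the real
// signatures and return types differ and are simplified to List<String> here.
interface ReaderSketch {
    String getSequenceAsString();
}

interface DatabaseReferenceSketch {
    List<String> getDatabaseReferences();
}

interface FeatureKeywordSketch {
    List<String> getKeyWords();
}

// Plays the role of GenbankProxySequenceReader: a reader that also
// implements one interface per category of property it can supply.
class GenbankProxyReaderSketch
        implements ReaderSketch, DatabaseReferenceSketch, FeatureKeywordSketch {
    @Override public String getSequenceAsString() { return "ATGC"; }
    // Made-up values, for illustration only.
    @Override public List<String> getDatabaseReferences() { return Arrays.asList("GI:12345"); }
    @Override public List<String> getKeyWords() { return Arrays.asList("example"); }
}

// A reader that supplies residues only.
class PlainReaderSketch implements ReaderSketch {
    @Override public String getSequenceAsString() { return "ATGC"; }
}

// Plays the role of AbstractSequence: the constructor inspects the loader.
class SequenceSketch {
    private final String residues;
    private final List<String> databaseReferences = new ArrayList<>();
    private final List<String> keywords = new ArrayList<>();

    SequenceSketch(ReaderSketch proxyLoader) {
        this.residues = proxyLoader.getSequenceAsString();
        // Each new category of property needs a new interface and a new branch here.
        if (proxyLoader instanceof DatabaseReferenceSketch) {
            databaseReferences.addAll(((DatabaseReferenceSketch) proxyLoader).getDatabaseReferences());
        }
        if (proxyLoader instanceof FeatureKeywordSketch) {
            keywords.addAll(((FeatureKeywordSketch) proxyLoader).getKeyWords());
        }
    }

    String getResidues() { return residues; }
    List<String> getDatabaseReferences() { return databaseReferences; }
    List<String> getKeywords() { return keywords; }
}
```

<p>With this shape, a loader that implements none of the extra interfaces still produces a valid sequence, just without references or keywords. The cost is that the sequence constructor has to know about every property category.</p>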
<p>That said, everything works and the current code is high-level. If we want to take up Andreas' cleanup proposal, I think the only effort worth making is a new, simpler and plainer redesign of the data IO API, rather than adding new interfaces to the already overcrowded current system. I know this is the long and hard way, but in my opinion it is the only real improvement we could make at this point.</p>
<p>I have some ideas and some experience with this. I am imagining an API that is easy to extend: a developer who wants to add a new parser should only have to write the parser and plug it into the system.<br>
I would reduce the sequence class to the role of a pure data structure (the most important one in bioinformatics, along with the alignment). The only methods allowed would be those that manipulate the sequence representation.</p>
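<p>As a rough illustration of that idea, here is one possible shape in Java. It is only a sketch under assumptions: PlainSequence, SequenceParser, ParserRegistry and ToyFastaParser are hypothetical names that do not exist in BioJava, and the FASTA parser is a toy that handles a single record. It shows a sequence that is only a data holder and a registry where a new parser is plugged in with a single call.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class PluggableIoDemo {
    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        registry.register("fasta", new ToyFastaParser()); // plugging in a parser is one call
        PlainSequence seq = registry.read("fasta", new StringReader(">demo record\nATGC\nGGTA\n"));
        System.out.println(seq.getAnnotations().get("description") + ": " + seq.getResidues());
    }
}

// The sequence is only a data holder; it knows nothing about readers or formats.
final class PlainSequence {
    private final String residues;
    private final Map<String, String> annotations;

    PlainSequence(String residues, Map<String, String> annotations) {
        this.residues = residues;
        this.annotations = Collections.unmodifiableMap(new HashMap<>(annotations));
    }

    String getResidues() { return residues; }
    Map<String, String> getAnnotations() { return annotations; }

    // A method that only manipulates the representation (zero-based, end exclusive).
    PlainSequence subSequence(int start, int end) {
        return new PlainSequence(residues.substring(start, end), annotations);
    }
}

// The single contract a new parser has to fulfil.
interface SequenceParser {
    PlainSequence parse(Reader in) throws IOException;
}

// Maps a format name to the parser that handles it.
final class ParserRegistry {
    private final Map<String, SequenceParser> parsers = new HashMap<>();

    void register(String format, SequenceParser parser) {
        parsers.put(format, parser);
    }

    PlainSequence read(String format, Reader in) {
        SequenceParser parser = parsers.get(format);
        if (parser == null) {
            throw new IllegalArgumentException("No parser registered for format: " + format);
        }
        try {
            return parser.parse(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

// Toy parser for a single-record FASTA input; not a complete FASTA implementation.
final class ToyFastaParser implements SequenceParser {
    @Override
    public PlainSequence parse(Reader in) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        String header = reader.readLine();
        if (header == null || !header.startsWith(">")) {
            throw new IOException("Expected a FASTA header line starting with '>'");
        }
        StringBuilder residues = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            residues.append(line.trim());
        }
        Map<String, String> annotations = new HashMap<>();
        annotations.put("description", header.substring(1).trim());
        return new PlainSequence(residues.toString(), annotations);
    }
}
```

<p>In this arrangement a GenBank parser would be one more SequenceParser registered under its own format name, and the sequence class would not need to change.</p>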
<p>But anyway, I don't really know whether we want to start down such a long and difficult road, and it certainly cannot involve just two developers. It's a feature for biojava5, perhaps ;-) </p>
<p>Greetings!<br>
Paolo<br></p>
<p>On 3 Dec 2014 at 13:13, "Jacek Grzebyta" <<a href="mailto:grzebyta.dev@gmail.com">grzebyta.dev@gmail.com</a>> wrote:<br>
><br>
> Hi,<br>
><br>
> If that is how it looks, then I suggest changing the proxy interface. It could have a getter for a data source instance from org.biojava3.core.sequence.DataSource. Then we create an abstract proxy instance that maps a data source to the relevant URI. But we need to take into account that each data source (or most of them) would require its own API to proxy the data anyway. A long time ago I tried to do this but gave up after I discovered RDF and the semantic web. Anyway, I will make the changes and submit them to my branch repository.<br>
><br>
><br>
> Regards,<br>
><br>
> Jacek<br>
</p>
<div class="gmail_quote">On 25 Nov 2014 at 05:15, "Andreas Prlic" <<a href="mailto:andreas@sdsc.edu" target="_blank">andreas@sdsc.edu</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Paolo,<div><br></div><div>I don't remember the full history of this, but after having reviewed the code I think the story is like this:</div><div><br></div><div>The "proxy" means that an entry can be fetched from an external DB based on a reference ID.</div><div><br></div><div>Then there is another requirement to read a single record from a file containing many entries (hence the differences between InputStream and BufferedReader), which might explain the different approaches.</div><div><br></div><div>Having said that, I do think the API is inconsistent and could benefit from some cleanup, and we also need better documentation for this. Any pull requests are welcome!</div><div><br></div><div>Andreas</div><div><br></div><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Nov 22, 2014 at 12:10 PM, Paolo Pavan <span dir="ltr"><<a href="mailto:paolo.pavan@gmail.com" target="_blank">paolo.pavan@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div><div><div><div>Dear all,<br></div>Jacek Grzebyta and I have added support for reading features, qualifiers and nested locations with "split" indications in GenBank files, and we hope this feature will be included in the next 4.0 release.<br></div><br>However, we are faced with<span lang="EN-US"> two
ways to parse a GenBank file: via GenbankProxySequenceReader and via GenbankReader. </span><span lang="EN-US">Both use the same underlying GenbankSequenceParser, now updated, but in different ways.<br><br>Is there a reason
I am missing for this two-way design, or is it just the result of the efforts of two
independent working groups? The “proxy” naming suggests to me that it is meant to add
something to the standard GenbankReader. Is one of them recommended?
One difference is that one uses an InputStream, the other a BufferedReader.<br><br></span></div><span lang="EN-US">Can any of the original authors comment on that?<br><br></span></div><span lang="EN-US">Thank you very much,<br>Paolo<br></span> </div>
<br>_______________________________________________<br>
biojava-dev mailing list<br>
<a href="mailto:biojava-dev@mailman.open-bio.org" target="_blank">biojava-dev@mailman.open-bio.org</a><br>
<a href="http://mailman.open-bio.org/mailman/listinfo/biojava-dev" target="_blank">http://mailman.open-bio.org/mailman/listinfo/biojava-dev</a><br></blockquote></div><br><br clear="all"><div><br></div>
</div></div>
</blockquote></div>