[Biojava-l] New idea for alignment parsing and Re: Parser for MrBayes output

Pola Kyzioł pola.kyziol at gmail.com
Thu Nov 6 22:38:31 UTC 2014


Thank you for your answers. I've looked through the sources of forester and
found some parsers (org/forester/io/parsers/nexus). At first sight they could be
what you talked about, but I'm not sure.

Cheers,
Pola

2014-11-06 13:49 GMT+01:00 Ben Stöver <benstoever at uni-muenster.de>:

>
>
> Spencer Bliven wrote on 2014-11-06:
> > Ben,
>
> > This sounds like a great idea and a really useful addition to
> > BioJava! I would lean towards only parsing the consensus tree, as the
> > other formats are pretty specific use cases. We're sure forester doesn't
> > provide Nexus parsing, right? The documentation isn't particularly
> > complete, but it's already a phylo dependency, so we should avoid
> > duplicating any features.
>
> No, I'm personally not 100 % sure whether any Nexus features are implemented
> in forester, but I thought they were not, because otherwise there would have
> been no need for a Nexus parsing system in BioJava 1.x?
>
>
> > As to your second suggestion, it sounds very similar to how FastaReader
> > currently works, with the user providing a SequenceCreator which
> > instantiates whatever Sequence implementation you want to use. Mutable
> > sequences can lead to a host of additional problems, which is why the
> > sequences are currently generated atomically. Or am I misunderstanding
> > your suggestion?
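To make the pattern under discussion concrete, here is a minimal Java sketch of a reader that delegates instantiation to a caller-supplied factory and creates each sequence atomically from a fully buffered string. All names (SimpleSequence, SequenceFactory, StringSequence, BufferingFastaReader) are hypothetical and only mimic the idea of FastaReader plus SequenceCreator; they are not the BioJava API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal sequence type; NOT the BioJava Sequence interface.
interface SimpleSequence {
    String residues();
}

// Hypothetical factory in the spirit of SequenceCreator: the caller decides
// which SimpleSequence implementation the reader instantiates.
interface SequenceFactory<S extends SimpleSequence> {
    S create(String residues);
}

// One possible implementation a caller could plug in.
final class StringSequence implements SimpleSequence {
    private final String residues;

    StringSequence(String residues) {
        this.residues = residues;
    }

    @Override
    public String residues() {
        return residues;
    }
}

// Reader that buffers each complete record before handing it to the factory,
// i.e. every sequence is created atomically from a fully loaded String.
final class BufferingFastaReader<S extends SimpleSequence> {
    private final SequenceFactory<S> factory;

    BufferingFastaReader(SequenceFactory<S> factory) {
        this.factory = factory;
    }

    Map<String, S> process(String fastaText) {
        Map<String, S> result = new LinkedHashMap<>();
        String header = null;
        StringBuilder buffer = new StringBuilder(); // whole sequence held here
        for (String raw : fastaText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                if (header != null) {
                    result.put(header, factory.create(buffer.toString()));
                }
                header = line.substring(1).trim();
                buffer.setLength(0);
            } else if (!line.isEmpty()) {
                buffer.append(line);
            }
        }
        if (header != null) {
            result.put(header, factory.create(buffer.toString()));
        }
        return result;
    }
}
```

A caller would write something like `new BufferingFastaReader<StringSequence>(StringSequence::new).process(text)` and get back whichever implementation the factory produces.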
>
> I just looked at the code
> (https://github.com/biojava/biojava/blob/master/biojava3-core/src/main/java/org/biojava3/core/sequence/io/FastaReader.java),
> and SequenceCreator does not do exactly what I meant: in the process()
> method of FastaReader, the whole sequence is first loaded into a
> StringBuilder and only afterwards passed to the SequenceCreator, which means
> there is no compression during loading. So SequenceCreator does part of what
> I was thinking of, but it would not work for very large sequences. (Although
> I can't find it now, I think I read a similar statement somewhere in the
> JavaDocs of the compressed Sequence implementation.)
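By contrast, the incremental loading described here could look roughly like the following sketch, in which the parser hands residues to a builder chunk by chunk (e.g. line by line) and the builder compresses them immediately. Run-length encoding is used only as a simple stand-in for whatever compression a real compressed Sequence implementation uses, and the class name RunLengthSequenceBuilder is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical incremental builder: residues are appended chunk by chunk and
// run-length encoded on the fly, so no uncompressed copy of the whole
// sequence is ever built.
final class RunLengthSequenceBuilder {
    private final StringBuilder symbols = new StringBuilder(); // one char per run
    private final List<Integer> counts = new ArrayList<>();    // run lengths
    private long length = 0;

    RunLengthSequenceBuilder append(CharSequence chunk) {
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            int last = symbols.length() - 1;
            if (last >= 0 && symbols.charAt(last) == c) {
                counts.set(last, counts.get(last) + 1); // extend current run
            } else {
                symbols.append(c); // start a new run
                counts.add(1);
            }
            length++;
        }
        return this;
    }

    long length() {
        return length;
    }

    int runCount() {
        return symbols.length();
    }

    // Linear scan over the runs; a real implementation would index run offsets.
    char charAt(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        long offset = index;
        for (int i = 0; i < symbols.length(); i++) {
            int run = counts.get(i);
            if (offset < run) {
                return symbols.charAt(i);
            }
            offset -= run;
        }
        throw new IllegalStateException("unreachable");
    }
}
```

Run-length encoding only pays off for long homopolymer or gap runs; the point of the sketch is merely that compression happens while appending rather than after the full sequence has been buffered.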
>
> The main benefits I still see for the idea would be, first, the abstract
> strategy pattern for alignment parsers, which would allow writing code
> independent of the format used (which is not possible e.g. with the current
> FASTA reader), and second, editable sequences, which would of course be
> usable in use cases you cannot really solve with the current sequence model
> (e.g. using them as the data backend for an alignment editor or the GUI
> components I have in LibrAlign).
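The strategy-pattern idea can be sketched as a single parser interface with interchangeable format implementations; client code such as columnCount only ever sees the interface. The interface AlignmentParser, both parsers, and the simplified sequential PHYLIP reading (a header line, then one "name sequence" line per taxon) are assumptions made for illustration, not existing BioJava classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical strategy interface: client code depends only on this type,
// not on a concrete file format.
interface AlignmentParser {
    Map<String, String> parse(String text);
}

final class FastaAlignmentParser implements AlignmentParser {
    @Override
    public Map<String, String> parse(String text) {
        Map<String, String> rows = new LinkedHashMap<>();
        String name = null;
        StringBuilder seq = new StringBuilder();
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                if (name != null) rows.put(name, seq.toString());
                name = line.substring(1).trim();
                seq.setLength(0);
            } else if (!line.isEmpty()) {
                seq.append(line);
            }
        }
        if (name != null) rows.put(name, seq.toString());
        return rows;
    }
}

// Simplified sequential PHYLIP: "ntax nchar" header line, then one
// "name sequence" line per taxon (no interleaving, no strict 10-char names).
final class SequentialPhylipParser implements AlignmentParser {
    @Override
    public Map<String, String> parse(String text) {
        Map<String, String> rows = new LinkedHashMap<>();
        String[] lines = text.trim().split("\\r?\\n");
        for (int i = 1; i < lines.length; i++) { // skip the header line
            String[] parts = lines[i].trim().split("\\s+", 2);
            if (parts.length == 2) rows.put(parts[0], parts[1].replaceAll("\\s+", ""));
        }
        return rows;
    }
}

final class AlignmentStats {
    // Format-independent client code: works with any AlignmentParser.
    static int columnCount(AlignmentParser parser, String text) {
        Map<String, String> rows = parser.parse(text);
        return rows.isEmpty() ? 0 : rows.values().iterator().next().length();
    }
}
```

Supporting another format would then mean adding one more AlignmentParser implementation, without touching code like columnCount.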
>
> I'm not sure which problems you mean that would arise from having mutable
> sequences (remember: the idea was not to replace the current implementations
> of the Sequence interface, but to add additional mutable versions). Maybe you
> could give some examples? (Are you thinking about the need for change
> listeners or similar things?)
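A mutable sequence with change listeners, as hinted at in the question above, might look like the following minimal sketch. MutableSequence and SequenceChangeListener are hypothetical names; the class is meant to sit alongside immutable implementations rather than replace them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener: called after every edit with the affected range.
@FunctionalInterface
interface SequenceChangeListener {
    void sequenceChanged(MutableSequence source, int index, int removed, int inserted);
}

// Hypothetical mutable sequence, intended to exist alongside (not replace)
// immutable Sequence implementations.
final class MutableSequence {
    private final StringBuilder residues;
    private final List<SequenceChangeListener> listeners = new ArrayList<>();

    MutableSequence(String initial) {
        residues = new StringBuilder(initial);
    }

    void addListener(SequenceChangeListener listener) {
        listeners.add(listener);
    }

    void insert(int index, String tokens) {
        if (index < 0 || index > residues.length()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        residues.insert(index, tokens);
        fire(index, 0, tokens.length());
    }

    void remove(int index, int count) {
        if (index < 0 || count < 0 || index + count > residues.length()) {
            throw new IndexOutOfBoundsException("range " + index + "+" + count);
        }
        residues.delete(index, index + count);
        fire(index, count, 0);
    }

    int length() {
        return residues.length();
    }

    // Iterate over a copy so listeners may (un)register during notification.
    private void fire(int index, int removed, int inserted) {
        for (SequenceChangeListener l : new ArrayList<>(listeners)) {
            l.sequenceChanged(this, index, removed, inserted);
        }
    }

    @Override
    public String toString() {
        return residues.toString();
    }
}
```

Notifying over a copy of the listener list keeps registration during callbacks safe; thread safety and the invalidation of objects derived from the sequence are not addressed in this sketch.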
>
> Anyway, it was only an idea for discussion; I'm really not saying that we
> definitely need to go in that direction. (For my own projects I already have
> a mutable sequence model with bridges to the current BioJava model, so I
> would be fine there.) Maybe there really are problems coming with this idea
> that I currently do not see? In that case we could of course also think about
> just adding an interface for sequence parsers that allows using them in an
> abstract strategy pattern. (That would then really be a slight API change,
> if the existing readers and writers implemented such an interface, but it
> might be possible, since there is a version 4 coming anyway?)
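If changing the existing readers is undesirable, an adapter is one way to get the common interface without that API change: the existing class stays as it is and a thin wrapper implements the new interface. SequenceParser, LegacyFastaReader, LegacyFastaParserAdapter and FormatIndependentClient are hypothetical stand-ins, not BioJava classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical common interface for sequence parsers.
interface SequenceParser {
    Map<String, String> read(String text);
}

// Stand-in for an existing reader with its own format-specific API
// (input passed to the constructor, records returned by process()).
final class LegacyFastaReader {
    private final String text;

    LegacyFastaReader(String text) {
        this.text = text;
    }

    LinkedHashMap<String, String> process() {
        LinkedHashMap<String, String> records = new LinkedHashMap<>();
        String header = null;
        for (String raw : text.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.startsWith(">")) {
                header = line.substring(1).trim();
                records.put(header, "");
            } else if (header != null && !line.isEmpty()) {
                records.merge(header, line, String::concat); // append line
            }
        }
        return records;
    }
}

// Adapter: exposes the unchanged legacy reader through SequenceParser.
final class LegacyFastaParserAdapter implements SequenceParser {
    @Override
    public Map<String, String> read(String text) {
        return new LegacyFastaReader(text).process();
    }
}

final class FormatIndependentClient {
    // Depends only on SequenceParser, so any adapted reader can be passed in.
    static int totalResidues(SequenceParser parser, String text) {
        int total = 0;
        for (String seq : parser.read(text).values()) {
            total += seq.length();
        }
        return total;
    }
}
```

Having the existing readers implement the interface directly avoids the extra wrapper class but changes their public API; the adapter trades one small class per format for leaving that API untouched.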
>
> Best
> Ben
>
>
> > It would be fantastic to have some additional development of multiple
> > alignments and the phylo package! Thanks for the offer to contribute!
>
> > -Spencer
>
> > On Thu, Nov 6, 2014 at 12:19 PM, Jose Manuel Duarte <jose.duarte at psi.ch>
> > wrote:
>
> > > Hi Ben
>
> > > Thanks a lot for all the insights. I am really not the most appropriate
> > > person to comment on all the BioJava phylogeny and sequence related
> > > things, but anyway, below are some of my opinions.
>
>
> > > On 05/11/14 17:22, Ben Stöver wrote:
>
>
>
> > >> The more interesting/urgent thing, though, might be parsing the
> > >> consensus tree, which is in Nexus format (or writing the input files
> > >> for MrBayes). Although the Nexus format is not really state of the art
> > >> anymore, and replacements such as NeXML (http://nexml.org/), which
> > >> overcome its limitations, should be preferred if you implement new
> > >> software, the Nexus format is still widely used, and supporting it in
> > >> BioJava 3 (or 4) would surely be a good idea. There was an extensible
> > >> Nexus parser in BioJava 1.x
> > >> (http://www.biojava.org/docs/api1.9.1/org/biojavax/bio/phylo/io/nexus/package-summary.html)
> > >> which could be ported to BioJava 3 (4). (This has never been done
> > >> until now, has it?)
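For the consensus-tree use case, a very small sketch of pulling tree statements out of a Nexus TREES block follows. It is not a port of the BioJava 1.x parser linked above, and the class name NexusTreeExtractor is hypothetical: it assumes "tree <name> = <newick>;" commands, strips all [...] comments (including annotations such as [&U]), and does not handle quoted labels, nested comments or a TRANSLATE table, so numeric taxon labels are returned unchanged.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified sketch: extracts "tree <name> = <newick>;" commands from TREES
// blocks. Ignores quoted labels, nested comments and the TRANSLATE table.
final class NexusTreeExtractor {
    private static final Pattern TREES_BLOCK = Pattern.compile(
            "begin\\s+trees\\s*;(.*?)\\bend(?:block)?\\s*;",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern TREE_COMMAND = Pattern.compile(
            "\\s*tree\\s+([^\\s=]+)\\s*=\\s*(.+)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static Map<String, String> extractTrees(String nexus) {
        // Remove [...] comments, e.g. [&U] or per-node annotations.
        String text = nexus.replaceAll("\\[[^\\]]*\\]", "");
        Map<String, String> trees = new LinkedHashMap<>();
        Matcher block = TREES_BLOCK.matcher(text);
        while (block.find()) {
            // Nexus commands end with ';', so split the block into commands.
            for (String command : block.group(1).split(";")) {
                Matcher tree = TREE_COMMAND.matcher(command);
                if (tree.matches()) {
                    String newick = tree.group(2).replaceAll("\\s+", "");
                    trees.put(tree.group(1), newick + ";");
                }
            }
        }
        return trees;
    }
}
```

The returned Newick strings could then be handed to an existing Newick reader; a real implementation would also apply the TRANSLATE table and keep the node annotations instead of discarding them.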
>
>
> > > If I understand it properly, they were not ported to 3 yet because of
> > > lack of time, so I think porting the Nexus stuff would be a great thing.
> > > +1 to that.
>
>
>
> > >> Therefore I would offer to implement such functionality for BioJava,
> > >> but before making a pull request or anything, I wanted to ask the
> > >> community for its opinion on that idea, and also whether I might have
> > >> missed concepts in BioJava that would already allow doing something
> > >> similar.
>
>
> > > To me the whole idea sounds great, especially if it can be made
> > > compatible with the existing BioJava interfaces. If I understand what
> > > you propose, you would only introduce a new way of parsing things, which
> > > could even live alongside the current parsers. It could even go into its
> > > own package (sequence.nio?). For me this is a +1 too.
>
> > > Cheers
>
> > > Jose
>
> > > _______________________________________________
> > > Biojava-l mailing list  -  Biojava-l at mailman.open-bio.org
> > > http://mailman.open-bio.org/mailman/listinfo/biojava-l
>