[Biojava-dev] Parser backwards compatibility

Sun Apr 15 15:45:08 UTC 2012

> I think it is more important to handle large files in the way
> that we support indexing of fasta files.

So essentially what you are looking for is something like this:
http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc58

Should I add this to my proposal? I had figured that writing
and organizing parsers was my main goal, but if this is
important enough I can definitely add it.
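
To make sure I understand the indexing idea, here is a minimal sketch of
the two-step approach: one pass over the file records the byte offset of
each header line, and a sequence is only read from disk when it is asked
for. The class name FastaOffsetIndex and its methods are hypothetical and
are not part of BioJava; a real version would return sequence objects
rather than strings and would use buffered reads.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of an offset index over a FASTA file (not BioJava API). */
final class FastaOffsetIndex {
    private final String path;
    // Maps each record id to the byte offset of its '>' header line.
    private final Map<String, Long> offsets = new LinkedHashMap<>();

    /** First pass: scan the file once and remember where each record starts. */
    FastaOffsetIndex(String path) throws IOException {
        this.path = path;
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long lineStart = raf.getFilePointer();
            String line;
            // RandomAccessFile.readLine is unbuffered and slow; fine for a sketch.
            while ((line = raf.readLine()) != null) {
                if (line.startsWith(">")) {
                    // Assumption: the id is the first whitespace-delimited token.
                    String id = line.substring(1).trim().split("\\s+")[0];
                    offsets.put(id, lineStart);
                }
                lineStart = raf.getFilePointer();
            }
        }
    }

    /** Ids in file order. */
    Set<String> ids() {
        return offsets.keySet();
    }

    /** Later access: seek to the stored offset and read only that record. */
    String fetch(String id) throws IOException {
        Long offset = offsets.get(id);
        if (offset == null) {
            return null; // unknown id
        }
        StringBuilder sequence = new StringBuilder();
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(offset);
            raf.readLine(); // skip the header line
            String line;
            while ((line = raf.readLine()) != null && !line.startsWith(">")) {
                sequence.append(line.trim());
            }
        }
        return sequence.toString();
    }
}
```

With something like this, new FastaOffsetIndex("example.fasta") only holds
ids and offsets in memory, and fetch("someId") touches the disk for a
single record.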

David
On Apr 14, 2012 10:48 AM, "Scooter Willis" <HWillis at scripps.edu> wrote:

> We still have the reality that per data set you typically have one data
> format. Since you are going after a specific return type for the actual
> data I don't think you gain very much by abstracting out file parsing. I
> think it is more important to handle large files in the way that we
> support indexing of fasta files. First pass find all the index positions
> in the file and return the appropriate sequence object. At some point in
> the future if you need the sequence the underlying storage mechanism knows
> how to retrieve the data from disk quickly. Same concept for databases or
> remote web services.
>
> On 4/14/12 10:35 AM, "P. Troshin" <to.petr at gmail.com> wrote:
>
> >> So what you're looking for is something like this?
> >> FastaParser fasta = ParserFactory.fasta("example.fasta");
> >> FastqParser fastq = ParserFactory.fastq("example.fastq");
> >
> >Yes, only that I expect to construct a parser from an InputStream as well.
> >I agree with Hannes that with a factory you could guess the input
> >format and instantiate an appropriate parser. However, I do not see
> >this as a particularly important feature, because in real life you
> >usually know which format you are working with.
> >
> >Regards,
> >Peter
> >
> >
> >On 14 April 2012 14:50, David Felty <davfelty at gmail.com> wrote:
> >> Michael Heuer wrote:
> >>> Open source projects should provide room for
> >>> both evolutionary and revolutionary changes
> >>
> >> Thanks for all the info, very useful!
> >>
> >>
> >> P. Troshin wrote:
> >>> I think you just need to make a common entry point for them.
> >>> E.g a factory class which would contain functions to
> >>> instantiate various parsers.
> >>
> >> So what you're looking for is something like this?
> >> FastaParser fasta = ParserFactory.fasta("example.fasta");
> >> FastqParser fastq = ParserFactory.fastq("example.fastq");
> >>
> >> Scooter Willis wrote:
> >>> Can you give some examples of what you are trying to do for
> >>> the common set of interfaces?
> >>
> >> I gave this example in my proposal at
> >>
> >>
> >> http://www.google-melange.com/gsoc/proposal/review/google/gsoc2012/dfelt/2001
> >>
> >> for (BasicSequence seq : SeqIO.parse(inStream, SeqFormat.FASTA)) {
> >>     System.out.println(seq.getSequenceAsString());
> >> }
> >>
> >> But I think Troshin's idea would be easier to implement, given
> >> the current BioJava parsers.
> >>
> >> On Apr 13, 2012 1:31 PM, "P. Troshin" <to.petr at gmail.com> wrote:
> >>>
> >>> Hi David,
> >>>
> >>> > In order to fit BioJava's parsers into a shared API, I would like to
> >>> > wrap them under a common set of interfaces.
> >>>
> >>> I think you just need to make a common entry point for them. E.g. a
> >>> factory class which would contain functions to instantiate various
> >>> parsers.
> >>> You only need a common interface for the same parsers, e.g. Fasta
> >>> parsers. However, I'd be inclined to converge all Fasta parsers in
> >>> BioJava to one parser. So I am not sure you'd need a common interface
> >>> in the end.
> >>>
> >>> >However, I foresee that
> >>> > some of the parsers will resist being wrapped, and will need to
> >>> > either be modified or rewritten.
> >>>
> >>> You'll need to choose the best parser and implement features that are
> >>> lacking from it. Other parsers can then be retired.
> >>>
> >>>
> >>> >However, this would mean that two different
> >>> > copies of the same parsers would exist in BioJava, which I think is
> >>> > kind of ugly.
> >>>
> >>> Yes, that would be scary for languages like Perl or Python, but
> >>> Java is a compiled language, so you'll see most of the problems as
> >>> compilation errors. You will also need to write unit tests for
> >>> existing parsers and then for your new parser to make sure that
> >>> the rewrite was successful.
> >>>
> >>> >However, this would mean that two different
> >>> > copies of the same parsers would exist in BioJava, which I think is
> >>> > kind of ugly.
> >>>
> >>> The whole idea of this project is to get rid of this ugliness, and
> >>> provide a streamlined API for the users as well as powerful
> >>> parsers.
> >>>
> >>> Hope that helps.
> >>> Regards,
> >>> Peter
> >>>
> >>>
> >>> On 13 April 2012 14:47, David Felty <davfelty at gmail.com> wrote:
> >>> > In order to fit BioJava's parsers into a shared API, I would like to
> >>> > wrap them under a common set of interfaces. However, I foresee that
> >>> > some of the parsers will resist being wrapped, and will need to
> >>> > either be modified or rewritten. So my question is, should I keep the
> >>> > original versions of these problematic parsers around for backwards
> >>> > compatibility, or can I freely modify them to fit into the new API?
> >>> > I'm afraid that the latter would break existing code, so I'm more
> >>> > inclined to do the former. However, this would mean that two
> >>> > different copies of the same parsers would exist in BioJava, which
> >>> > I think is kind of ugly. Any thoughts?
> >>> >
> >>> > Thanks,
> >>> > David
> >>> > _______________________________________________
> >>> > biojava-dev mailing list
> >>> > biojava-dev at lists.open-bio.org
> >>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >