[Biojava-l] GSCO 2012: New File Parsers for BioJava

P. Troshin to.petr at gmail.com
Tue Mar 27 20:50:22 UTC 2012


Hi Nick,

I agree with Andreas (thank you for coming in!), just a few additions below:

> - 1 year and a half experience with Java; it became my first choice in
coding; currently I do all my tasks and homework in Java, also developing a
bot for aichallenge [1] in Java as a university project. And a little
personal project I'm working at, a memory test game, also written in Java.
> - 5 years of C/C++
> - web: HTML, PHP, CSS, MySQL - made a module for my school's website

Great. Sound Java knowledge will help you a lot on this project.

>
> Some thoughts and questions about the project
>
> - I took a look at your sources and saw you already have parsers for a
lot of files like: FASTA, FASTQ, PDB, mmcif etc. What are the priorities
for the new parsers, which is needed most ?

You are right, there are many parsers in BioJava; too many, actually. We
only need one parser per file format, but that is not currently the case:
there are 2 or 3 FASTA parsers, for example. They are all subtly different,
so the task would be to unify them so that one parser can be used in all
cases.

> - Should we choose only one parser to work on for this project, or the
expectations are to implement more than one ?

It depends on the parser and on your own abilities. However, if you can
only make one FASTA parser in 3 months, then your application is unlikely
to be competitive.

> Questions  about the "Coding exercise"
>
> - About the "ambiguous characters", let's say we have ambiguous DNA. For
these two sequences: "ACTATATCGG" and "ATGKMCGW" we should have in one
FASTA output file the sequence "ACTATATCGG" and in another one "ATGKMCGW"?

Correct
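
To make the split concrete, here is a minimal sketch of how it could be
done. It assumes that "unambiguous" means the sequence contains only A, C,
G and T, and that anything else (K, M, W and the other IUPAC codes) counts
as ambiguous. The class name, the method names and the headers "seq1" and
"seq2" are made up for illustration; they are not part of the exercise or
of BioJava.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

class AmbiguitySplitSketch {

    /** True if the sequence contains only A, C, G or T (case-insensitive). */
    static boolean isUnambiguousDna(String seq) {
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toUpperCase(seq.charAt(i));
            if (c != 'A' && c != 'C' && c != 'G' && c != 'T') {
                return false;
            }
        }
        return true;
    }

    /** Writes one FASTA record to the output that matches its alphabet. */
    static void route(String header, String seq, Writer plain, Writer ambiguous)
            throws IOException {
        Writer out = isUnambiguousDna(seq) ? plain : ambiguous;
        out.write(">" + header + "\n" + seq + "\n");
    }

    public static void main(String[] args) throws IOException {
        // In a real run these would be two FileWriters, one per output file.
        StringWriter plain = new StringWriter();
        StringWriter ambiguous = new StringWriter();

        route("seq1", "ACTATATCGG", plain, ambiguous); // hypothetical header
        route("seq2", "ATGKMCGW", plain, ambiguous);   // hypothetical header

        System.out.print("plain:\n" + plain);
        System.out.print("ambiguous:\n" + ambiguous);
    }
}
```

With the two sequences from the question, "ACTATATCGG" ends up in the first
output and "ATGKMCGW" in the second.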

>
> - What do you mean by large, “be capable of reading large files”, because
afterwards under “Submission” it says “the test data file named data.fasta
up to 10Kb in size”? Should I understand that 10Kb is the limit for a
“large file”?

For this exercise, assume that a large file is one that does not fit into
the computer's RAM. For a Java program, you can substitute the amount of
memory available to the JVM for the computer's RAM. So let's say that your
parser should be able to work with a 512 MB file under the JVM setting
-Xmx256M. And yes, you do not have to email this file to me.
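
As an illustration of what reading a file larger than the heap can look
like, here is a rough sketch of a streaming reader. It reads line by line
and keeps only the current record in memory, so it assumes that each
individual record is small even when the whole file is not. The class and
method names are invented for this example and are not an existing BioJava
API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.function.BiConsumer;

class StreamingFastaSketch {

    /**
     * Reads FASTA records one at a time and passes (header, sequence) to the
     * handler. Only the record currently being read is held in memory.
     */
    static void forEachRecord(BufferedReader in, BiConsumer<String, String> handler)
            throws IOException {
        String header = null;
        StringBuilder seq = new StringBuilder();
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            if (line.charAt(0) == '>') {
                // A new header closes the previous record, if there was one.
                if (header != null) {
                    handler.accept(header, seq.toString());
                }
                header = line.substring(1);
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line); // sequence may span several lines
            }
        }
        if (header != null) {
            handler.accept(header, seq.toString()); // last record
        }
    }

    public static void main(String[] args) throws IOException {
        // A small in-memory stand-in; a real run would wrap a FileReader.
        String data = ">seq1\nACTATATCGG\n>seq2\nATGK\nMCGW\n";
        try (BufferedReader in = new BufferedReader(new StringReader(data))) {
            forEachRecord(in, (h, s) -> System.out.println(h + " " + s.length()));
        }
    }
}
```

To try it against the memory limit, point the reader at a large file and
start the JVM with -Xmx256M. The handler decides what to do with each
record, for example writing it straight to an output file instead of
collecting records in a list.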

I hope that helps.

Good luck with your application.

Regards,
Peter



