[Biojava-l] GSoC 2012: New File Parsers for BioJava

Tue Mar 27 18:00:23 UTC 2012

Hi Nicolae,

> - I took a look at your sources and saw you already have parsers for many file formats, such as FASTA, FASTQ, PDB, mmCIF, etc. What are the priorities for the new parsers? Which is needed most?

We are keeping the answer to this question intentionally open and want
students to pick a topic they are interested in. For a list of
requests that we have received from users, please take a look at
http://biojava.org/wiki/BioJava3_Feature_Requests. We welcome other
suggestions as well.

> - Should we choose only one parser to work on for this project, or is the expectation to implement more than one?

To start, focus on one parser and make sure it integrates well with
the rest of BioJava. Only propose more than one if you think you can
easily do that in the time available. We are looking for realistic
student proposals, so make sure you come up with a good, achievable
plan. We are happy to discuss proposals before they are submitted to
Google and will give feedback on how to improve them.

> Questions about the "Coding exercise"

Peter, do you want to answer those?

Thanks,

Andreas

>
> - About the "ambiguous characters": let's say we have ambiguous DNA. For these two sequences, "ACTATATCGG" and "ATGKMCGW", should we write the sequence "ACTATATCGG" to one FASTA output file and "ATGKMCGW" to another?
>
> - What do you mean by "large" in “be capable of reading large files”? Afterwards, under “Submission”, it says “the test data file named data.fasta up to 10Kb in size”. Should I understand that 10Kb is the limit for a “large file”?
>
> Best regards,
> Nicolae