[Biojava-dev] Biojava.util package?
Andreas Prlic
andreas at sdsc.edu
Thu Mar 29 14:39:39 UTC 2012
Hi David,
so far it still feels like a wrapper for what is already there. Try to
take it to the next level. Why does the user still need to provide the
type of file, can't this be auto-detected? What is the behaviour for
non-fasta files, what can be supported and where are the limits, etc.
Andreas
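
On the auto-detection question: a minimal sketch, assuming well-formed
input where FASTA records begin with '>' and FASTQ records begin with '@'.
The names FormatSniffer, Format and detect are hypothetical and not part of
BioJava. The idea is to peek at the first non-whitespace byte, then reset
the stream so that whichever parser is chosen still reads from the start.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of BioJava: guesses a sequence file format. */
public class FormatSniffer {

    /** Hypothetical format tags used only by this sketch. */
    public enum Format { FASTA, FASTQ, UNKNOWN }

    /** Maximum number of bytes inspected before giving up. */
    private static final int PEEK_LIMIT = 1024;

    /**
     * Looks at the first non-whitespace byte of the stream and resets the
     * stream afterwards, so the caller can hand it to a parser unchanged.
     */
    public static Format detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PEEK_LIMIT);
        try {
            for (int i = 0; i < PEEK_LIMIT; i++) {
                int b = in.read();
                if (b == -1) {
                    return Format.UNKNOWN;   // empty or whitespace-only input
                }
                if (Character.isWhitespace(b)) {
                    continue;                // skip leading blank lines
                }
                if (b == '>') {
                    return Format.FASTA;     // FASTA header line
                }
                if (b == '@') {
                    return Format.FASTQ;     // FASTQ record header
                }
                return Format.UNKNOWN;       // anything else is not handled here
            }
            return Format.UNKNOWN;
        } finally {
            in.reset();                      // rewind to where we started
        }
    }
}
```

A plain FileInputStream does not support mark/reset, so it would need to be
wrapped in a BufferedInputStream before being passed to detect(). This only
covers the two formats mentioned in this thread; telling DNA from protein, or
recognising other formats, would need more than a one-byte check.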
On Thu, Mar 29, 2012 at 6:55 AM, David Felty <davfelty at gmail.com> wrote:
> I've actually been working on something like this for my GSoC proposal,
> here's what I came up with:
>
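> public class SeqIO {
>     public static final int FASTA = 0;
>     public static final int FASTQ = 1;
>     public static final Class<DNASequence> DNA = DNASequence.class;
>     public static final Class<ProteinSequence> PROTEIN = ProteinSequence.class;
>
>     @SuppressWarnings("unchecked")
>     public static <S extends Sequence> Iterable<S> parse(InputStream is,
>             int fileFormat, Class<S> seqType) throws Exception {
>         switch (fileFormat) {
>         case FASTA:
>             if (seqType == DNA) {
>                 // readFastaDNASequence returns a map keyed by header,
>                 // so iterate over its values rather than casting the map.
>                 return (Iterable<S>)
>                         FastaReaderHelper.readFastaDNASequence(is).values();
>             } else if (seqType == PROTEIN) {
>                 // etc...
>             }
>             break;
>         case FASTQ:
>             // etc...
>             break;
>         }
>         // Every path must return or throw, otherwise this does not compile.
>         throw new IllegalArgumentException("Unsupported format or sequence type");
>     }
> }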
>
> It would be used like so:
>
> InputStream is = ...
> Iterable<DNASequence> seqs = SeqIO.parse(is, SeqIO.FASTA, SeqIO.DNA);
> for (DNASequence s : seqs) {
>     // do something
> }
>
> Obviously it's not the prettiest and a lot could be changed, but that's my
> initial design. I tried to base it off Biopython's SeqIO, but static typing
> and the variety of Sequence types forced me to put in some nasty generics.
> Any tips would be appreciated!
>
> David
>
> On Thu, Mar 29, 2012 at 4:27 AM, Hannes Brandstätter-Müller <
> biojava at hannes.oib.com> wrote:
>
>> Yes, something like a simplifying and unifying wrapper would be what I
>> am thinking of.
>>
>> Hannes
>>
>> On Thu, Mar 29, 2012 at 05:55, Andreas Prlic <andreas at sdsc.edu> wrote:
>> > Hi Hannes,
>> >
>> > I guess this is pretty similar to:
>> >
>> > http://biojava.org/wiki/BioJava:CookBook:Core:FastaReadWrite
>> >
>> > we have also been using "proxy" objects to fetch sequence data on the fly
>> >
>> > http://biojava.org/wiki/BioJava:CookBook:Core:Sequences
>> >
>> > As such I think we should discuss this a bit more. If we can find a
>> > common api that is simple and works with both local files as well as
>> > remote proxy objects, that would be nice. There should be no need to
>> > change much of the existing code, but perhaps there could be a
>> > simplified wrapper for what is already there.
>> >
>> > Andreas
>> >
>> > On Wed, Mar 28, 2012 at 12:04 PM, Hannes Brandstätter-Müller
>> > <biojava at hannes.oib.com> wrote:
>> >> Hi,
>> >>
>> >> I browsed around in the sister projects Biopython and Bioperl a bit,
>> >> and noticed that much of the user interaction with the code goes
>> >> through classes like SeqIO, SearchIO, AlignIO...
>> >>
>> >> So that got me thinking: how about we create similar "Interface"
>> >> classes in Biojava?
>> >>
>> >> PROS:
>> >>
>> >> - easy change for programmers who switch languages
>> >> - easy base interface that can be used in cookbook examples
>> >> - makes code more readable if designed properly
>> >> - easy access to features that are spread over the whole codebase but
>> >> are connected anyway, like all file parsers
>> >>
>> >> CONS:
>> >>
>> >> - another thing to maintain
>> >> - creates possible cross-dependencies (but if you don't want that,
>> >> just use the existing classes directly)
>> >>
>> >>
>> >> What are your thoughts?
>> >>
>> >> python from http://biopython.org/wiki/SeqIO:
>> >>
>> >> from Bio import SeqIO
>> >> handle = open("example.fasta", "r")
>> >> for record in SeqIO.parse(handle, "fasta"):
>> >>     print(record.id)
>> >> handle.close()
>> >>
>> >> possible equivalent in biojava (support for streaming API, Iterators, etc.?):
>> >>
>> >> import java.io.File;
>> >> import org.biojava3.util.SeqIO;
>> >>
>> >> File file = new File("example.fasta");
>> >> SeqIO seqIO = new SeqIO(file, SeqIO.FASTA);
>> >> while (seqIO.hasNext()) {
>> >>     System.out.println(seqIO.next());
>> >> }
>> >> // java.io.File has no close(); the proposed SeqIO would own the handle
>> >> seqIO.close();
>> >>
>> >> Hannes
>> >> _______________________________________________
>> >> biojava-dev mailing list
>> >> biojava-dev at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> >
>> >
>> >
>> > --
>> > -----------------------------------------------------------------------
>> > Dr. Andreas Prlic
>> > Senior Scientist, RCSB PDB Protein Data Bank
>> > University of California, San Diego
>> > (+1) 858.246.0526
>> > -----------------------------------------------------------------------
>>