[Biojava-dev] concatenated

stefan harjes stefanharjes at yahoo.de
Fri Feb 13 06:53:55 UTC 2015


Hi Peter
You are right, of course: we were not talking about concatenated sequences (Strings), but about sequence records, which are usually concatenated into one file for large libraries. I am also used to parsing such files with Biopython.
The current discussion may stem from a slightly ambiguous naming scheme. The sequence records in BioJava are derived from the class AbstractSequence and are therefore called DNASequence etc. So when we spoke of sequences, we were actually speaking of sequence records.
The different expectations that Paolo and I had may also have been caused by the fact that it is hard to distinguish between actual sequence readers/proxies (which would read exactly one sequence) and sequence record readers/parsers (which would read as many records as the file contains). In an ideal world this could be resolved into:

-Sequences which only hold the actual sequence
    --SequenceFactory which contains the sequence readers/writers
-SequenceRecord which holds the sequence and all features
    --SequenceRecordFactory which contains the sequence record readers/writers
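
To make the split above more concrete, here is a rough sketch of how the four types could look. Everything in it is an assumption for illustration: the class and method names are hypothetical, none of this is existing BioJava API, features are reduced to plain strings, and only reading of a simple FASTA-like text is shown (writers are left out).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// All names below are hypothetical; this is not BioJava's actual API.

/** A bare sequence: residues only, no identifier or annotation. */
final class Sequence {
    private final String residues;

    Sequence(String residues) {
        this.residues = residues;
    }

    String residues() {
        return residues;
    }
}

/** A sequence record: one sequence plus its identifier and features. */
final class SequenceRecord {
    private final String id;
    private final Sequence sequence;
    private final List<String> features;

    SequenceRecord(String id, Sequence sequence, List<String> features) {
        this.id = id;
        this.sequence = sequence;
        this.features = Collections.unmodifiableList(new ArrayList<String>(features));
    }

    String id() { return id; }
    Sequence sequence() { return sequence; }
    List<String> features() { return features; }
}

/** Reader for exactly one bare sequence. */
final class SequenceFactory {
    static Sequence read(String text) {
        // Drop all whitespace so that wrapped lines become one residue string.
        return new Sequence(text.replaceAll("\\s+", ""));
    }
}

/** Reader for zero or more records in FASTA-like text. */
final class SequenceRecordFactory {
    static List<SequenceRecord> readAll(String text) {
        List<SequenceRecord> records = new ArrayList<SequenceRecord>();
        String id = null;
        StringBuilder residues = new StringBuilder();
        for (String line : text.split("\r?\n")) {
            if (line.startsWith(">")) {
                // A new header closes the previous record, if there was one.
                if (id != null) {
                    records.add(toRecord(id, residues));
                }
                id = line.substring(1).trim();
                residues.setLength(0);
            } else if (id != null) {
                residues.append(line);
            }
        }
        if (id != null) {
            records.add(toRecord(id, residues));
        }
        return records;
    }

    // FASTA carries no features, so the feature list is left empty here.
    private static SequenceRecord toRecord(String id, StringBuilder residues) {
        return new SequenceRecord(id, SequenceFactory.read(residues.toString()),
                Collections.<String>emptyList());
    }
}
```

With this layout, SequenceRecordFactory.readAll(...) returns a list with one entry per record and never joins the residues of separate records, while SequenceFactory.read(...) deals with exactly one sequence.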

Such an organization would certainly improve the readability and understandability of the library. Unfortunately, it probably represents an unspecified amount of coding...

Cheers,
Stefan

 





I would be very surprised if BioJava's earlier contributors would
*concatenate* multiple separate sequences in a file into one
long sequence.

Rather, much like the BioPerl and Biopython SeqIO libraries, surely
the design is to allow multiple separate sequence records to be
parsed from most sequence file formats? e.g. FASTA, FASTQ,
GenBank, etc can all hold zero or more records.

For Biopython at least, we have an iterator approach for all our
SeqIO sequence parsers, with a helper function for when you
expect the file to contain one and only one record (in which case
the for loop style used with an iterator is cumbersome).
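
In Biopython these two entry points are SeqIO.parse (the iterator) and SeqIO.read (the one-record helper). A Java analogue of the same pattern might look roughly like the sketch below. The names FastaRecord, FastaRecordIterator and readSingle are made up for illustration and are not part of BioJava or Biopython, and the parsing is deliberately minimal (plain FASTA only).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical types for illustration; not BioJava's or Biopython's API.

/** One parsed record: identifier line plus residues. */
final class FastaRecord {
    final String id;
    final String residues;

    FastaRecord(String id, String residues) {
        this.id = id;
        this.residues = residues;
    }
}

/** Lazily yields one record at a time from FASTA-formatted input. */
final class FastaRecordIterator implements Iterator<FastaRecord> {
    private final BufferedReader in;
    private String pendingHeader; // header of the next record, or null at end of input

    FastaRecordIterator(BufferedReader in) {
        this.in = in;
        this.pendingHeader = firstHeader();
    }

    // Skip anything before the first '>' line; return null if there is none.
    private String firstHeader() {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return pendingHeader != null;
    }

    @Override
    public FastaRecord next() {
        if (pendingHeader == null) {
            throw new NoSuchElementException();
        }
        String id = pendingHeader.substring(1).trim();
        StringBuilder residues = new StringBuilder();
        pendingHeader = null;
        try {
            String line;
            // Collect residue lines until the next header or the end of input.
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line;
                    break;
                }
                residues.append(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new FastaRecord(id, residues.toString());
    }

    /** Helper for input expected to hold one and only one record. */
    static FastaRecord readSingle(BufferedReader in) {
        FastaRecordIterator it = new FastaRecordIterator(in);
        if (!it.hasNext()) {
            throw new IllegalArgumentException("no records found");
        }
        FastaRecord record = it.next();
        if (it.hasNext()) {
            throw new IllegalArgumentException("more than one record found");
        }
        return record;
    }
}
```

The iterator keeps every record separate, and readSingle fails loudly on zero or several records instead of silently merging them.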

Perhaps we are understanding different meanings from "concatenate"?

Regards,

Peter