[Biojava-dev] The future of BioJava

Andy Yates ayates at ebi.ac.uk
Thu Sep 20 08:55:13 UTC 2007


Hi,

I would say yes to this as well. It is very important to know what 
people are attempting to do with BioJava rather than us assuming that we 
know :). In some parts of BioJava the code is not flexible enough for 
people who want to build on the code base, and in other areas it is too 
flexible.

I've talked to quite a few people over the years who have used biojava 
for simple & complex applications and they all seem to come back round 
to a few key problems:

* Sequence & SymbolLists are strange and why can't I use a String - All 
of this makes a lot more sense if you know about the flyweight pattern; 
if not it just seems very strange.

* I have a format that's EMBL like. Can I parse it using Biojava?

* How do I read in a FASTA file?

* How can I get X from this chromatogram & can I parse my specific trace 
format into a BioJava object?
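
For anyone who has not met the flyweight pattern, here is a minimal 
sketch in plain Java. It is not BioJava's actual code; the names 
NucleotideSymbol and SymbolPool are made up for illustration. The idea is 
that each distinct residue exists as one shared object, and a sequence is 
a list of references to those shared objects:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names for illustration only; not the BioJava API.
final class NucleotideSymbol {
    private final char token;

    NucleotideSymbol(char token) {
        this.token = token;
    }

    char token() {
        return token;
    }
}

final class SymbolPool {
    // One shared instance per distinct (lower-cased) character.
    private static final Map<Character, NucleotideSymbol> POOL = new HashMap<>();

    // Returns the shared instance for a character, creating it on first use.
    static synchronized NucleotideSymbol get(char c) {
        return POOL.computeIfAbsent(Character.toLowerCase(c),
                                    k -> new NucleotideSymbol(k));
    }

    // A "symbol list" is then a list of references to the shared symbols.
    static List<NucleotideSymbol> parse(String dna) {
        List<NucleotideSymbol> symbols = new ArrayList<>(dna.length());
        for (char c : dna.toCharArray()) {
            symbols.add(get(c));
        }
        return symbols;
    }
}
```

With this, every 'a' in SymbolPool.parse("gattaca") is the same object, 
so symbols can be compared by identity. It also shows why the result does 
not behave like a String.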

As Andreas said, it's the occurrence of the category A problems that is 
the most worrying. In terms of sequences I think I can see why people 
have a problem with them.

Take this as an example:

If I have my DNA sequence in a String I can substring it, perform a 
regular expression over it, replace sections, pad it out, format it & so 
on. If I have a Sequence object I can perform most of these actions but 
the interface to them seems unintuitive, for example calling seqString() 
to get the String back out of a sequence rather than calling toString(). 
Also, let's say I want to use a sequence as a key in a hash map or ask if 
two sequences are equal (using the old sequence objects) ... at the 
moment I'd have to convert Sequence -> String to perform the comparison 
(and that doesn't include checking a Sequence for alphabet equality).
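
To make the wish concrete, here is a rough sketch of the behaviour I 
mean. SimpleSequence is a hypothetical class written for this mail, not 
anything in BioJava, and folding case and holding the alphabet as a plain 
name are my own simplifying assumptions. It shows toString() returning 
the residues, and equals()/hashCode() taking the alphabet into account so 
the object can be used directly as a map key:

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical class for illustration only; not part of BioJava.
final class SimpleSequence {
    private final String alphabet; // assumption: alphabet identified by name, e.g. "DNA"
    private final String residues;

    SimpleSequence(String alphabet, String residues) {
        this.alphabet = Objects.requireNonNull(alphabet);
        // Assumption: case is not significant, so it is folded on construction.
        this.residues = Objects.requireNonNull(residues).toLowerCase(Locale.ROOT);
    }

    // 1-based, inclusive coordinates.
    SimpleSequence subSequence(int start, int end) {
        return new SimpleSequence(alphabet, residues.substring(start - 1, end));
    }

    @Override
    public String toString() {
        return residues;
    }

    // Two sequences are equal only if both alphabet and residues match.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SimpleSequence)) {
            return false;
        }
        SimpleSequence other = (SimpleSequence) o;
        return alphabet.equals(other.alphabet) && residues.equals(other.residues);
    }

    @Override
    public int hashCode() {
        return Objects.hash(alphabet, residues);
    }
}
```

Here new SimpleSequence("DNA", "GATTACA") and new SimpleSequence("DNA", 
"gattaca") are equal and hash alike, while the same letters under a 
"PROTEIN" alphabet are a different key.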

I know this sounds like nit-picking & for people who have used biojava 
extensively a lot of this makes sense. For someone new to the project it 
seems like we've done something just for the sake of it and we need to 
get rid of that feeling which I'm sure will happen if we address the 
category A problem. The rest will fall into place :)

Andy

Richard Holland wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> I totally agree.
> 
> Can you post a short summary of this to the Wiki page?
> 
> Not all aspects of BioJava are documented, leading people either to give
> up, consult the JavaDocs online, or post a message to biojava-l or
> biojava-dev.
> 
> Is it possible to get similar stats to the ones you have calculated for
> the JavaDoc pages on our website?
> 
> Also, is it possible to build some kind of index over the mailing list
> archives to pull out the most frequently used terms?
> 
> cheers,
> Richard
> 
> Andreas Prlic wrote:
>> Hi,
>>
>> A question related to the discussion of how to design a future BioJava
>> is to have a look
>> at which parts of BioJava are being actively used and how to improve these.
>>
>> So what are the most frequently used bits of BioJava? One way to look at
>> this is to go to the
>> web-stats and see how many hits we have got on our documentation web pages.
>>
>> In an ideal world BioJava would be so simple to use that nobody needs
>> to read any documentation.
>> Unfortunately we are far away from this, so actually looking at these
>> stats gives an impression of
>>
>> * topics / functionality which are of particular interest to the community
>> * topics / functionality which might not be straightforward to use,
>> therefore there are many hits on these pages.
>>
>> A look at the web stats from the last couple of months gives these top 10
>> Cookbook pages that
>> have been accessed frequently. This list is ordered by number of page views.
>>
>> 1. /wiki/BioJava:Cookbook:Alphabets
>> 2. /wiki/BioJava:CookBook:Blast:Parser
>> 3. /wiki/BioJava:Cookbook:SeqIO:ReadFasta
>> 4. /wiki/BioJava:Cookbook:SeqIO:ReadGES
>> 5. /wiki/BioJava:CookBook:DP:PairWise2
>> 6. /wiki/BioJava:CookBook:PDB:read
>> 7. /wiki/BioJava:Cookbook:Sequence
>> 8. /wiki/BioJava:Cookbook:SeqIO:WriteInFasta
>> 9. /wiki/BioJava:CookBook:Interfaces:ViewInGUI
>> 10. /wiki/BioJava:CookBook:Fasta:Parse
>>
>> I would group these pages into 2 groups.
>> A) How to work with core concepts of BioJava
>> B) How to use a functionality of BioJava to achieve a certain goal
>>
>> The "conceptual" pages (A) I would identify as
>> * How to get an Alphabet
>> * How to make a Sequence Object from a String or make a Sequence Object
>> back into a String
>>
>> The "functionality"  pages (B) I would summarize as
>> * How to parse a Blast output
>> * How to read sequences from a Fasta file
>> * How to read a GenBank, SwissProt or EMBL file
>> * How to generate a global or local alignment with the Needleman-Wunsch-
>> or the Smith-Waterman-algorithm
>> * How to read a protein structure - PDB file
>> * How to export a sequence to fasta
>> * How to view a sequence in a gui
>> * How to parse a Fasta database search output file
>>
>>
>> As a conclusion I would suggest that BioJava should have the goal to
>> provide easy access to the
>> core "functionalities" (group B).  I believe that we should try to keep
>> the "concepts" that are being used to
>> achieve these functionalities as simple as possible. In this sense, I
>> feel that we have too many hits on the group A pages.
>>
>> Andreas
>>
>> -----------------------------------------------------------------------
>>
>> Andreas Prlic      Wellcome Trust Sanger Institute
>>                               Hinxton, Cambridge CB10 1SA, UK
>>              +44 (0) 1223 49 6891
>>
>> -----------------------------------------------------------------------
>>
>>
>>
>> -- The Wellcome Trust Sanger Institute is operated by Genome
>> Research Limited, a charity registered in England with number 1021457 and
>> a company registered in England with number 2742969, whose
>> registered office is 215 Euston Road, London, NW1 2BE.
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.2.2 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
> 
> iD8DBQFG8if94C5LeMEKA/QRAkZ7AJ0a2xaU717XFfrX4eCc/wmPN/OL2ACfZMHi
> U21o+ZfVD5XOqT1mR7STp6Q=
> =dct8
> -----END PGP SIGNATURE-----
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


