[Biojava-dev] The future of BioJava

Andreas Prlic ap3 at sanger.ac.uk
Wed Sep 19 17:33:57 UTC 2007


Hi,

A question related to the discussion of how to design a future  
BioJava is to have a look
at which parts of BioJava are being actively used and how to improve  
these.

So what are the most frequently used bits of BioJava? One way to look  
at this is to go to the
web-stats and see how many hits we have got on our documentation web  
pages.

In an ideal world BioJava would be so simple to use, that nobody  
needs to read any docu.
Unfortunately we are far away from this, so actually looking at these  
stats gives an impression
on

* topics / functionality which are of particular interest to the  
community
* topics / functionality which might not be straightforward to use,  
therefore there are many hits on these pages.

A look at the webstats from the last couple of months gives these top  
10 Cookbook pages that
have been accessed frequently. This list is ordered by nr. of  pageviews

1. /wiki/BioJava:Cookbook:Alphabets
2. /wiki/BioJava:CookBook:Blast:Parser
3. /wiki/BioJava:Cookbook:SeqIO:ReadFasta
4. /wiki/BioJava:Cookbook:SeqIO:ReadGES
5. /wiki/BioJava:CookBook:DP:PairWise2
6. /wiki/BioJava:CookBook:PDB:read
7. /wiki/BioJava:Cookbook:Sequence
8. /wiki/BioJava:Cookbook:SeqIO:WriteInFasta
9. /wiki/BioJava:CookBook:Interfaces:ViewInGUI
10. /wiki/BioJava:CookBook:Fasta:Parse

I would group these pages into 2 groups.
A) How to work with core concepts of BioJava
B) How to use a functionality of BioJava to achieve a certain goal

The "conceptual" pages (A) I would identify as
* How to get an Alphabet
* How to make a Sequence Object from a String or make a Sequence  
Object back into a String

The "functionality"  pages (B) I would summarize as
* How to parse a Blast output
* How to read sequences from a Fasta file
* How to read a GenBank, SwissProt or EMBL file
* How to generate a global or local alignment with the Needleman- 
Wunsch- or the Smith-Waterman-algorithm
* How to read a protein structure - PDB file
* How to export a sequence to fasta
* How to view a sequence in a gui
* How to parse a Fasta database search output file


As a conclusion I would suggest that BioJava should have the goal to  
provide easy access to the
core "functionalities" (group B).  I believe that we should try to  
keep the "concepts" that are being used to
achieve these functionalities as simple as possible. In this sense, I  
feel that we have too many hits on the group A pages.

Andreas

-----------------------------------------------------------------------

Andreas Prlic      Wellcome Trust Sanger Institute
                               Hinxton, Cambridge CB10 1SA, UK
			 +44 (0) 1223 49 6891

-----------------------------------------------------------------------



-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 



More information about the biojava-dev mailing list