[Biojava-dev] bjv2 alpha 2

Matthew Pocock matthew.pocock at ncl.ac.uk
Fri May 14 11:20:02 EDT 2004


Hi,

I've just rolled out bjv2 alpha 2 - shelob. This is both a feature and 
performance enhancement release. Get it from subversion at:

http://www.derkholm.net/svn/repos/bjv2/branches/shelob

As always, the development version is at:

http://www.derkholm.net/svn/repos/bjv2/trunk

features:

  * gff support
  * guts for allowing both gff and embl to be viewed as a stream of
    features or sequences
  * schema support on queryables

performance:

  * elide unnecessary object creation - 10x speed improvement

  * adaptive indexing - data sources work out what questions you ask &
    build indexes

  * starting to do query optimization through the query and integration
    layers - 320 sec query down to 6 sec!! I think this has replaced an
    n*n scaling with a log(n) scaling. Pure objects would be
    constant-time though - still some way to go
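
To illustrate the adaptive indexing idea, here is a minimal sketch. It is not bjv2 code: AdaptiveFeatureStore, Feature, featuresOn and the threshold of 3 queries are all made-up names and values for illustration. The store answers lookups by linear scan until it has seen the same kind of question a few times, then builds a sorted index so later lookups are logarithmic in the number of distinct keys.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of adaptive indexing; not taken from bjv2. */
class AdaptiveFeatureStore {
  /** Minimal stand-in for a feature: the sequence it lies on plus a label. */
  static final class Feature {
    final String seqId;
    final String label;
    Feature(String seqId, String label) {
      this.seqId = seqId;
      this.label = label;
    }
  }

  // Assumed cut-off: build the index on the third query by sequence id.
  private static final int INDEX_THRESHOLD = 3;

  private final List<Feature> features = new ArrayList<>();
  private Map<String, List<Feature>> bySeqId; // null until the index is built
  private int seqIdQueries = 0;

  void add(Feature f) {
    features.add(f);
    if (bySeqId != null) {
      // keep an existing index in step with the data
      bySeqId.computeIfAbsent(f.seqId, k -> new ArrayList<>()).add(f);
    }
  }

  boolean isIndexed() {
    return bySeqId != null;
  }

  /** All features on the given sequence; scans until the index exists. */
  List<Feature> featuresOn(String seqId) {
    seqIdQueries++;
    if (bySeqId == null && seqIdQueries >= INDEX_THRESHOLD) {
      buildIndex();
    }
    if (bySeqId != null) {
      List<Feature> hit = bySeqId.get(seqId); // sorted-map lookup
      return hit == null ? new ArrayList<>() : new ArrayList<>(hit);
    }
    List<Feature> hits = new ArrayList<>(); // fallback: linear scan
    for (Feature f : features) {
      if (f.seqId.equals(seqId)) {
        hits.add(f);
      }
    }
    return hits;
  }

  private void buildIndex() {
    bySeqId = new TreeMap<>();
    for (Feature f : features) {
      bySeqId.computeIfAbsent(f.seqId, k -> new ArrayList<>()).add(f);
    }
  }
}
```

Counting features per sequence over n sequences with the scan costs a pass over all features per sequence. Once the index is built, each lookup is a single sorted-map search. A hash map would make lookups expected constant-time instead; the TreeMap is used here only because it gives the logarithmic behaviour mentioned above.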

miscellanea:

  * now requires the /newest/ javac - the javac bundled with java 1.5
    beta1 was buggy
  * more documentation - design and user docs

An example script:

import org.bjv2.seq.Sequence;
import org.bjv2.seq.Sequences;
import org.bjv2.seq.io.IO;
import org.bjv2.symbol.SymbolList;
import org.bjv2.gql.Queryable;

import java.io.File;

/**
 * Demonstration of integrating multiple files.
 * <p/>
 * Use: <pre>IntegrateSequences seqFile1 seqFile2 ...</pre>
 * <p/>
 * The files can be any biological sequence/feature format files -
 * currently gff & embl are supported.
 * The output will be a list of all sequences, and the number of
 * features on the sequence, regardless of whether the feature was
 * annotated in the same file that the sequence was defined in.
 *
 * @author Matthew Pocock
 */
public class IntegrateSequences
{
  public static void main(String[] args)
          throws Throwable
  {
    // load all the data in
    for(String arg: args) {
      File seqFile = new File(arg);
      System.out.println("Loading: " + seqFile);
      IO.loadSequence(seqFile);
    }

    System.out.println("All sequences: ");

    // get the queryable with all the sequence data in
    // this will be made prettier for alpha3
    Queryable<Sequence> allSeqs = (Queryable<Sequence>)
            Sequences.defaultContext().getMapping().image(
                    Sequences.getIdentifiers().get(Sequences.Domains.SEQUENCE));
    System.out.println("\t" + allSeqs);

    // loop over all sequences, printing out the sequence length & the
    // number of features
    for(Sequence seq: allSeqs) {
      System.out.println("\t" + seq.getIdentifier());
      SymbolList symL = seq.getSymbolList();
      if(symL != null) {
        System.out.println("\t\tlength: " + symL.length());
      }
      System.out.println("\t\tfeatures: " + seq.getFeatures().size());
    }
  }
}


