[Biojava-l] Ensembl export data

Matthew Pocock matthew_pocock@yahoo.co.uk
Thu, 21 Nov 2002 13:22:17 +0000 (GMT)


Hi Stein Aerts,

You can do this using BioJava. You will require
ensembl-j.jar (the Ensembl object-relational layer) and
bj-ensembl.jar (the Ensembl <-> BioJava bridge). These
may be available for download, but you can definitely
build them from source (the ensembl and
biojava-ensembl CVS packages respectively). You will
also need to put the MySQL JDBC driver on your classpath
and follow whatever instructions it gives for loading
the driver class.
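
If the connection fails, the first thing to rule out is that the driver jar is missing from the classpath. The sketch below is plain Java with no BioJava or Ensembl dependency. The class name DriverCheck is made up for illustration, and com.mysql.jdbc.Driver is an assumed driver class name (older MM.MySQL releases use a different one), so check your driver's own instructions.

```java
// Minimal check that a JDBC driver class can be loaded from the classpath.
// DriverCheck is a hypothetical helper, not part of BioJava or Ensembl.
final class DriverCheck {

    // Assumed driver class name for MySQL Connector/J. Older MM.MySQL
    // releases use a different class, so check your driver's documentation.
    static final String ASSUMED_MYSQL_DRIVER = "com.mysql.jdbc.Driver";

    /** Returns true if the named class can be loaded, false otherwise. */
    static boolean isLoadable(String className) {
        try {
            // Loading the class runs its static initialiser, which is where
            // a JDBC driver registers itself with java.sql.DriverManager.
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isLoadable(ASSUMED_MYSQL_DRIVER)) {
            System.out.println("MySQL driver found");
        } else {
            System.out.println("MySQL driver missing: add its jar to the classpath");
        }
    }
}
```

Running this with the same classpath as your real program tells you whether the driver is visible before any Ensembl code gets involved.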

Once you have these jars and biojava.jar on your
classpath, you can write code something like this,
replacing my hacky values with the proper ones (such as
the db URL, user name and password, and the transcript
you are interested in):


DatabaseAdaptor dba = SQLDatabaseAdaptor.connectSQL(
  "jdbc:mysql://kaka.sanger.ac.uk/homo_sapiens_core_8_30c",
  "username",
  "password" );

Ensembl ensembl = new Ensembl(dba);

SeqDB chromosomes = ensembl.getChromosomes();

// dump out a region of chrom 7
// (start and end are int coordinates that you define)
Sequence chrom7 = chromosomes.getSequence("7");
Sequence ourBit = SequenceTools.subSequence(
  chrom7, start, end, "our name for this bit");
SeqIOTools.writeEmbl(
  new FileOutputStream("file.embl"), ourBit);

// oops - we wanted the other strand
ourBit = SequenceTools.reverseComplement(ourBit);
SeqIOTools.writeEmbl(
  new FileOutputStream("fileReverse.embl"), ourBit);


// now we have a transcript ID, and want to write
// that region out as genbank
FeatureHolder ts = chromosomes.filter(
  FilterUtils.hasAnnotation("ensembl.id", "ENST00012345") );
if(ts.countFeatures() == 0) {
  throw new NoSuchElementException(
    "Could not find transcript");
}

StrandedFeature trans = (StrandedFeature)
  ts.features().next();
Location transLoc = trans.getLocation();
// the transcript plus 100 bases of flank on each side
Location toDump = new RangeLocation(
  transLoc.getMin() - 100,
  transLoc.getMax() + 100 );
// the sequence, annotation and strand belong to the
// feature, not to its location
Sequence seq = SequenceTools.subSequence(
  trans.getSequence(),
  toDump.getMin(),
  toDump.getMax(),
  trans.getAnnotation().getProperty("ensembl.id").toString()
);
if(trans.getStrand() == StrandedFeature.NEGATIVE) {
  seq = SequenceTools.reverseComplement(seq);
}

SeqIOTools.writeGenbank(
  new FileOutputStream(seq.getName() + ".gb"),
  seq );

I (as usual) haven't run any of this code through a
compiler, so have fun debugging it. It /should/ do
enough of what you want to allow you to fix it up.
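
One thing that will probably need fixing up is the flank arithmetic: transLoc.getMin() - 100 drops below 1 for a transcript near the start of a chromosome, and the upper bound can run past the end. Here is a rough sketch of clamping the range in plain Java, with no BioJava classes. FlankRange and withFlank are names made up for illustration, and it assumes 1-based inclusive coordinates, as BioJava locations use.

```java
// Hypothetical helper: extends a feature's range by a flank on each side and
// clamps the result to the sequence. Coordinates are 1-based and inclusive.
final class FlankRange {
    final int min;
    final int max;

    FlankRange(int min, int max) {
        this.min = min;
        this.max = max;
    }

    /**
     * Returns [featureMin - flank, featureMax + flank], clamped so that it
     * never starts before 1 or ends after seqLength.
     */
    static FlankRange withFlank(int featureMin, int featureMax,
                                int flank, int seqLength) {
        if (flank < 0 || featureMin < 1 || featureMin > featureMax
                || featureMax > seqLength) {
            throw new IllegalArgumentException("feature or flank out of range");
        }
        // long arithmetic so that a very large flank cannot overflow int
        int min = (int) Math.max(1L, (long) featureMin - flank);
        int max = (int) Math.min((long) seqLength, (long) featureMax + flank);
        return new FlankRange(min, max);
    }

    public static void main(String[] args) {
        // a feature 50 bases from the start cannot take a full 100-base flank
        FlankRange r = withFlank(50, 200, 100, 1000);
        System.out.println(r.min + ".." + r.max);
    }
}
```

You would then build the RangeLocation from the clamped min and max, using the length of the chromosome sequence as seqLength.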

If you need any further help getting the CVS modules
down, getting your script to behave, or just
figuring out which import statements are needed, then
post back to the list.

Best,

Matthew

 --- Stein Aerts <stein.aerts@tijd.com> wrote:
> Hi, I have another question related to the previous one.
>
> Would somebody have a code example of how to use BioJava and/or the
> BioJava packages of Ensembl, to perform an "export data" function (based
> only on the ensembl id, with flanking base pairs), by directly using the
> kaka.sanger.ac.uk database?
> 
> Thank you!
> Stein Aerts.
