[Biojava-l] RichSequence annotations...
Jolyon Holdstock
jolyon.holdstock at ogt.co.uk
Fri Mar 24 11:26:44 UTC 2006
Hi,
I use the following code to extract all the genes from a sequence file.
I load the sequence, filter it to keep only the CDS features, and then
iterate through these to get the gene annotation for each feature:
//==============================================================================
Sequence seq = null;
String fileName = "C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";
try {
    seq = SeqIOTools.readEmbl(new BufferedReader(new FileReader(fileName))).nextSequence();
}
catch (IOException IOE) {
    System.out.println("IOException " + IOE);
}
catch (BioException BIOE) {
    System.out.println("BioException " + BIOE);
}
//Create a feature filter for CDS features only
FeatureFilter ff = new FeatureFilter.ByType("CDS");
//Get the filtered Features
FeatureHolder fh = seq.filter(ff);
//Iterate over the Features in fh
//(hash is a java.util.Map declared elsewhere, not shown here)
for (Iterator i = fh.features(); i.hasNext(); ) {
    Feature f = (Feature) i.next();
    Annotation annotation = f.getAnnotation();
    Object key = "gene";
    hash.put(annotation.getProperty(key), f);
}
//==============================================================================
I am now trying to do the same with the new BioJavaX classes, but I
cannot get it to work. Does anyone have any pointers for this?
I need the sequence data, so I have to use a RichSequence rather than a
BioEntry:
//==============================================================================
RichSequence richSeq = null;
String fileName =
    "C:/Scripts/Java/BioJava/BioJavaX/biojava-live/demos/seq/AL121903.embl";
try {
    richSeq = RichSequence.IOTools.readEMBLDNA(
        new BufferedReader(new FileReader(fileName)), null).nextRichSequence();
}
catch (IOException IOE) {
    System.out.println("IOException " + IOE);
}
catch (BioException BIOE) {
    System.out.println("BioException " + BIOE);
}
//Create a feature filter for CDS features only
FeatureFilter ff = new FeatureFilter.ByType("CDS");
//Get the filtered Features
FeatureHolder fh = richSeq.filter(ff);
//Iterate through the features
for (Iterator i = fh.features(); i.hasNext(); ) {
    RichFeature rf = (RichFeature) i.next();
    System.out.println("RichFeature: " + rf.toString());
    RichAnnotation ra = (RichAnnotation) rf.getAnnotation();
    System.out.println("RichAnnotation: " + ra.toString());
}
//==============================================================================
The output shows that the CDS features have been filtered successfully
and that the gene name is in the annotation:
RichFeature: (#1) lcl:HSDJ155G6/AL121903.13:CDS,EMBL(biojavax:join:[<5642..5793,10804..10976,12496..12656,14136..14266])
RichAnnotation: [(#2) biojavax:clone_lib: RPCI-1"
14403..14532,16852..16987,17821..17959,18068..18122,
19456..19570,23623..23753,25885..26053,29102..29240,
32621..32738,33595..33771],[(#3) biojavax:codon_start: 1],[(#4)
biojavax:evidence: NOT_EXPERIMENTAL],[(#5) biojavax:note: match:
proteins: Tr:Q9Y6D5 Tr:O46382 Tr:Q9Y6D6],[(#6) biojavax:gene:
dJ155G6.1],[(#7) biojavax:product: dJ155G6.1 (brefeldin A-inhibited
guanine
nucleotide-exchange protein 2)],[(#8) biojavax:protein_id: CAB86643.1]
If I add the following then I can see what keys are in the annotation
//==============================================================================
Set keySet = ra.keys();
for (Iterator it = keySet.iterator(); it.hasNext(); ) {
    String key = it.next().toString();
    System.out.println("Key: " + key);
}
//==============================================================================
The output shows that there is a gene key:
Key: biojavax:clone_lib
Key: biojavax:codon_start
Key: biojavax:evidence
Key: biojavax:gene
Key: biojavax:note
Key: biojavax:product
Key: biojavax:protein_id
My understanding is that I need to use a ComparableTerm to access the
value, but when I use it to look up the property I get a
NoSuchElementException:
ComparableTerm gene = RichObjectFactory.getDefaultOntology().getOrCreateTerm("gene");
System.out.println("Gene: " + ra.getProperty(gene));
java.util.NoSuchElementException: No such property: biojavax:gene, rank 0
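One possible reading of the exception, offered only as a guess: the message mentions "rank 0", while the printed annotation lists the gene note as (#6), so the lookup may be matching on both term and rank. The sketch below illustrates that idea in plain Java. RankedNotes, Note, getExact and getFirst are hypothetical stand-ins invented for illustration; they are not BioJavaX classes or methods, and the real behaviour of getProperty has not been confirmed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for a set of ranked annotation notes (not BioJavaX). */
public class RankedNotes {

    /** One note: a term name, a value, and the rank printed as (#n). */
    public static final class Note {
        final String term;
        final String value;
        final int rank;

        Note(String term, String value, int rank) {
            this.term = term;
            this.value = value;
            this.rank = rank;
        }
    }

    private final List<Note> notes = new ArrayList<>();

    public void add(String term, String value, int rank) {
        notes.add(new Note(term, value, rank));
    }

    /** Exact lookup: succeeds only when both the term and the rank match. */
    public String getExact(String term, int rank) {
        for (Note n : notes) {
            if (n.term.equals(term) && n.rank == rank) {
                return n.value;
            }
        }
        throw new NoSuchElementException(
            "No such property: " + term + ", rank " + rank);
    }

    /** Rank-independent lookup: returns the first note carrying the term. */
    public String getFirst(String term) {
        for (Note n : notes) {
            if (n.term.equals(term)) {
                return n.value;
            }
        }
        throw new NoSuchElementException("No such property: " + term);
    }

    public static void main(String[] args) {
        RankedNotes ra = new RankedNotes();
        // Values and ranks copied from the annotation output above.
        ra.add("biojavax:codon_start", "1", 3);
        ra.add("biojavax:gene", "dJ155G6.1", 6);
        // getExact("biojavax:gene", 0) would throw, since the note has rank 6.
        System.out.println("Gene: " + ra.getFirst("biojavax:gene"));
    }
}
```

If this reading is right, the equivalent step with the real classes would presumably be to walk the notes held by the RichAnnotation and compare each note's term against the ComparableTerm, rather than relying on a rank-specific lookup. The RichAnnotation Javadoc is the place to check which accessors are available.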
cheers,
Jolyon
Jolyon Holdstock Ph.D.
Senior Computational Biologist,
Oxford Gene Technology (Ops) Ltd.
Begbroke Business and Science Park
Sandy Lane, Yarnton
Oxford, OX5 1PF
Tel: 01865 309699
Fax: 01865 842116