[Biojava-l] Re: [Biojava-dev] Initial impressions...

Schreiber, Mark mark.schreiber at agresearch.co.nz
Tue Jul 8 12:30:28 EDT 2003


Hi Len,
 
Glad to hear you are finding BioJava and BJIA useful. I will put up a tutorial on converting characters to Symbols shortly. In the meantime, have a look at the forSymbol() and dnaToken() methods of DNATools, which are convenience methods for tokenizing DNA.
 
Biologists tend to use lower case for DNA and upper case for protein. BioJava is case insensitive (at least for DNA and RNA and, I think, protein). You could modify your AlphabetManager.xml and it would probably work (because DNA tokenization is case insensitive), but I wouldn't recommend it: strange things may happen, if not now then possibly later, especially if you try to work across a remote connection. The best thing to do might be to write your own tokenizer and use that when writing DNA. The only downside is that you won't be able to use some of the convenience methods from the tools classes, because they use the default tokenizers. You could always write your own convenience methods, though (MySeqIOTools, for example).
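To make the "write your own tokenizer" suggestion a bit more concrete, here is a minimal plain-Java sketch of the idea. It deliberately uses no BioJava classes: UpperCaseDnaTokenizer and its Base enum are hypothetical stand-ins for a tokenization and the DNA Symbols, so treat it as an illustration of the behaviour (read either case, always write upper case) rather than as BioJava's API. A real version would presumably wrap BioJava's own tokenization classes, such as the CharacterTokenization mentioned in Len's message.

```java
/**
 * Hypothetical sketch (not a BioJava class): a DNA tokenizer that accepts
 * either case on input but always writes upper case on output.
 */
final class UpperCaseDnaTokenizer {
    // Stand-in for BioJava's DNA Symbols: the four nucleotides plus N (any base).
    enum Base { A, C, G, T, N }

    private UpperCaseDnaTokenizer() {}

    /** Parses one character case-insensitively; rejects anything unknown. */
    static Base parseBase(char c) {
        switch (Character.toLowerCase(c)) {
            case 'a': return Base.A;
            case 'c': return Base.C;
            case 'g': return Base.G;
            case 't': return Base.T;
            case 'n': return Base.N;
            default:
                throw new IllegalArgumentException("Not a DNA token: '" + c + "'");
        }
    }

    /** Writes a base as an upper-case character (enum names are upper case). */
    static char writeBase(Base b) {
        return b.name().charAt(0);
    }

    /** Parses each character of seq, then re-emits the whole thing in upper case. */
    static String toUpperCaseDna(String seq) {
        StringBuilder out = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            out.append(writeBase(parseBase(seq.charAt(i))));
        }
        return out.toString();
    }
}
```

For example, toUpperCaseDna("acgtn") gives "ACGTN", and any character other than a, c, g, t or n (in either case) is rejected with an IllegalArgumentException. Keeping parsing lenient and output fixed is the same split that lets case-insensitive reading coexist with a preferred writing style.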
 
The BioSQL schema in its latest incarnation (BioSQL 1.0, or the Singapore schema) should be able to handle taxonomy data. This schema is supported in biojava-live; the older schema is supported by biojava 1.30, and I don't know how well that handled Taxon data (not well, as I recall).
 
- Mark
 

	-----Original Message----- 
	From: Len Trigg [mailto:len at reeltwo.com] 
	Sent: Tue 8/07/2003 9:17 a.m. 
	To: biojava-l at biojava.org 
	Cc: 
	Subject: [Biojava-l] Re: [Biojava-dev] Initial impressions...
	
	


	Matthew Pocock wrote: 
	> We need to make this process much easier. Unfortunately, getAsChar() 
	> doesn't really work for us because we can have symbols for things that 
	> don't have a single char representation, such as codons. However, you 
	> shouldn't have to end up going through 20 function calls either. 
	> 
	> Is there a BioJava in Anger example of getting letters from symbols? 

	Nope, not that I could see. BTW, BioJava in Anger is a very 
	helpful document, I've been consulting it often :-). Sounds like this 
	would make a good addition to the "how do I get between strings and 
	symbols" section. 

	On a related note, biojava seems to always use lowercase when writing 
	out DNA sequences. Is there an officially endorsed method for 
	switching to upper case? Should I modify my AlphabetManager.xml, or 
	should I re-register a new CharacterTokenization with the name "token" 
	so that it overrides the default one and gets picked up by the various 
	output formats? 


	> > Parsing a BLAST output file was also easy; however, I had to use 
	> > "lazy" mode to work with our files (from NCBI BLAST 2.2.1), and I have 
	> > not yet figured out how to extract a) the length of the query 
	> > sequence, and b) the frame of the hits. Any suggestions here? 
	> 
	> Is that information in the annotation attached to the 
	> SeqSimilaritySearchSubHit or the SeqSimilaritySearchResult? 

	When I print out all the annotations (basically using the BIA example 
	BlastParser.java, modified to include sub hit information), I see that 
	the queryFrame is present, but the query length information is not. 



	> Good luck with BioSQL and GFF. These are parts of the library that I use 
	> daily. Oh, and for the GFF, start off by using GFFTools. 

	I've written some sequences, annotated from GFF files to a mysql 
	database using BioSQL, and it worked great! Does the BioJava code 
	support writing taxonomy information to the database, so I can link my 
	sequences to species? 


	(I've moved this to biojava-l, since this seems more of a biojava-l 
	question than biojava-dev question, although with open source class 
	libraries, the line often seems to get blurred :-)) 

	Cheers, 
	Len. 
	_______________________________________________ 
	Biojava-l mailing list  -  Biojava-l at biojava.org 
	http://biojava.org/mailman/listinfo/biojava-l 




