From mark.schreiber at novartis.com Fri Jul 1 04:21:50 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Fri Jul 1 04:14:08 2005
Subject: [Biojava-l] New look for BJIA
Message-ID:

Hello -

I'm experimenting with a new style sheet for BioJava in Anger. Currently the home page (http://www.biojava.org/docs/bj_in_anger/index.htm) and one example (http://www.biojava.org/docs/bj_in_anger/blastecho.htm) use the style sheet.

The look of the main page is essentially the same (but the HTML is much nicer than it was). The example looks a bit different; the HTML for the code was generated by NetBeans.

It looks reasonable on Firefox and IE5. Please let me know if it is unreadable on other browsers.

Over the next few months (years?) the rest of the pages will be updated as I have time and feel motivated. Help from volunteers would be greatly appreciated :)

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)

Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com

phone +65 6722 2973
fax +65 6722 2910

From kvddrift at earthlink.net Fri Jul 1 06:40:26 2005
From: kvddrift at earthlink.net (Koen van der Drift)
Date: Fri Jul 1 06:32:31 2005
Subject: [Biojava-l] New look for BJIA
In-Reply-To:
References:
Message-ID: <46b6bbb35a390fdff68326010d3f4df9@earthlink.net>

On Jul 1, 2005, at 4:21 AM, mark.schreiber@novartis.com wrote:

> It looks reasonable on Firefox and IE5. Please let me know if it is
> unreadable on other browsers.
>

Looks good on Safari (Mac OS X) too. I highly recommend adding a doctype line at the top of the HTML document to tell browsers which form of HTML they are dealing with. For instance: . With proper HTML, browsers are less likely to render your page differently. See also

- Koen.
From foisys at sympatico.ca Fri Jul 1 11:04:10 2005
From: foisys at sympatico.ca (Sylvain Foisy)
Date: Fri Jul 1 10:54:04 2005
Subject: [Biojava-l] New look for BJIA
Message-ID:

Hi Mark,

How about the L&F of the BioJava site? I ported the French version some time ago and I do have some of the English pages ported too. If there is a volunteer to help me out, I could finish this by end of July.

Best regards

Sylvain

===================================================================
Sylvain Foisy, Ph. D.
Directeur - operations / Project Manager
BioneQ - Reseau quebecois de bio-informatique
U. de Montreal / Genome-Quebec

Adresse postale:

Departement de biochimie
Pavillon principal
2900, boul. Édouard-Montpetit
Montréal (Québec) H3T 1J4

Tel: (514) 343-6111 x.2545
Fax: (514) 343-7759
Courriel: sylvain.foisy@bioneq.qc.ca
===================================================================

From mark.schreiber at novartis.com Sun Jul 3 21:04:39 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Jul 3 20:55:46 2005
Subject: [Biojava-l] New look for BJIA
Message-ID:

The problem I have with the L&F of the BioJava site is that the 'side bar' appears to be hard coded into the pages. This was bad enough when I tried to update recently and had to change 8 pages. I wouldn't want to do all of BJIA as well. It appears server side includes are not enabled on portal.open-bio.org.

You seem to be using the same HTML code generation (NetBeans?).

One other design question....

Do people find the line numbers a help or a hindrance? On the upside you can say "line 8 has a bug in it", but the big drawback is that you cannot easily cut and paste the examples into your IDE or Emacs/Vi. Should I keep numbers or drop them?

- Mark

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From hollandr at gis.a-star.edu.sg Sun Jul 3 23:11:10 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jul 3 23:03:36 2005
Subject: [Biojava-l] New look for BJIA
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB0C3@BIONIC.biopolis.one-north.com>

I'd prefer it without line numbers as I do a lot of cut-and-pasting. Maybe there is some way of adding them without getting them included in the actual text (as images, for example). Or simply alternating background colours on alternate lines of code would have a similar effect for legibility.

Richard Holland
Bioinformatics Specialist
GIS extension 8199
---------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you.
---------------------------------------------

From mark.schreiber at novartis.com Sun Jul 3 23:18:29 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Sun Jul 3 23:09:49 2005
Subject: [Biojava-l] New look for BJIA
Message-ID:

I thought about doing the line numbers in another frame but getting them to align is not always reliable across browsers.

I also favour cutting and pasting over line numbers so I may go ahead and drop them.

- Mark

From hollandr at gis.a-star.edu.sg Sun Jul 3 23:39:36 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jul 3 23:31:49 2005
Subject: [Biojava-l] New look for BJIA
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB0C8@BIONIC.biopolis.one-north.com>

Sneaky trick that might just work: make the code listing a numbered list, then use CSS to format it so that the numbers are nicely justified. When you copy-paste a numbered list the numbers don't get included!

Richard Holland
Bioinformatics Specialist
GIS extension 8199

From hollandr at gis.a-star.edu.sg Mon Jul 4 01:33:41 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Jul 4 01:26:09 2005
Subject: [Biojava-l] memory leak while reading nr.fasta
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB0DE@BIONIC.biopolis.one-north.com>

This is one big problem, and I've come across it before. SeqIOTools.fileToBiojava reads the whole file in at once and stores everything in memory as Sequence objects in a virtual sequence database. For a file the size of nr, this is simply impossible on most machines, and causes out-of-memory exceptions.

What is required for files this size is a SeqIOTools parser that reads sequence objects _on demand_ as requested by the iterator, rather than reading the whole lot at once. This way it can drop sequence objects once they have been passed over by the iterator, freeing up memory for subsequent ones (assuming the client app keeps no references to them either).

How this fits in with BioJava's "everything is a sequence database" philosophy or not I don't know, as essentially it breaks it by defining a file to be a sequential-access sequence database, rather than a random-access one.
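
The on-demand reading described above can be sketched in plain Java, independently of how SeqIOTools is actually implemented. This is a minimal illustration only: `LazyFastaReader`, `Record` and `count` are hypothetical names, not BioJava classes, and the parser assumes simple FASTA input with `>` header lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical sketch, not part of BioJava: parses one FASTA record per call to next(). */
final class LazyFastaReader implements Iterator<LazyFastaReader.Record> {

    /** One FASTA entry: the header text after '>' and the concatenated residues. */
    static final class Record {
        final String name;
        final String residues;

        Record(String name, String residues) {
            this.name = name;
            this.residues = residues;
        }
    }

    private final BufferedReader in;
    private String nextHeader; // header of the record not yet returned, or null at end of input

    LazyFastaReader(BufferedReader in) {
        this.in = in;
        this.nextHeader = readUntilHeader(null); // skip anything before the first '>'
    }

    @Override
    public boolean hasNext() {
        return nextHeader != null;
    }

    @Override
    public Record next() {
        if (nextHeader == null) {
            throw new NoSuchElementException();
        }
        String name = nextHeader;
        StringBuilder residues = new StringBuilder();
        // Read only as far as the next header; later records stay on disk.
        nextHeader = readUntilHeader(residues);
        return new Record(name, residues.toString());
    }

    /** Appends sequence lines to sink (if non-null) and returns the next header, or null at EOF. */
    private String readUntilHeader(StringBuilder sink) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    return line.substring(1).trim();
                }
                if (sink != null) {
                    sink.append(line.trim());
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts records; each Record becomes garbage as soon as the loop moves on. */
    static int count(BufferedReader in) {
        int n = 0;
        for (LazyFastaReader r = new LazyFastaReader(in); r.hasNext(); r.next()) {
            n++;
        }
        return n;
    }
}
```

Because the iterator holds only the pending header line, memory use stays bounded by the largest single record as long as the caller does not keep references to earlier ones.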

Can someone clarify if a lazy-loading parser/database implementation already exists for situations like this, or does one need to be written?

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199

> -----Original Message-----
> From: biojava-l-bounces@portal.open-bio.org
> [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of Gem Yang
> Sent: Friday, July 01, 2005 2:30 AM
> To: biojava-l@biojava.org
> Subject: [Biojava-l] memory leak while reading nr.fasta
>
> Hi,
>
> I am new to Biojava.
> I have the following program, which is copied from ReadFaster2 in the
> cookbook.
>
> public static void main(String[] args) {
>     try {
>         // args[0] is nr.fasta
>         BufferedReader br = new BufferedReader(new FileReader(args[0]));
>
>         String format = "FASTA";
>         String alphabet = "PROTEIN";
>
>         SequenceIterator iter =
>             (SequenceIterator) SeqIOTools.fileToBiojava(format, alphabet, br);
>
>         int count = 0;
>         long start = System.currentTimeMillis();
>         while (iter.hasNext()) {
>             Sequence s = iter.nextSequence();
>             String name = s.getName();
>
>             //System.out.println(name);
>             s.getAnnotation();
>             //System.out.println(s.seqString());
>             count++;
>             System.out.println(count);
>         }
>         long end = System.currentTimeMillis();
>         System.out.println("number of sequence " + count);
>         System.out.println("time used" + (end - start) / 1000 + "seconds");
>         System.out.println((end - start) / 1000 / 60 + "minutes");
>     } catch (FileNotFoundException ex) {
>         // can't find file specified by args[0]
>         ex.printStackTrace();
>     } catch (BioException ex) {
>         // error parsing requested format
>         ex.printStackTrace();
>     }
> }
>
> When running this code, I got an out-of-memory error in about half an
> hour, with 1.5GB of memory allocated. My workstation is a Windows XP
> machine with 2 GB of memory. My BioJava version is 1.3. My JRE is the
> one that came with WebSphere application developer.
>
> Thanks.
> Gem

From mark.schreiber at novartis.com Mon Jul 4 01:46:55 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jul 4 01:38:05 2005
Subject: [Biojava-l] memory leak while reading nr.fasta
Message-ID:

It is supposed to only read on demand. Are you sure it isn't?? As long as you don't keep references to the individual sequences they should be destroyed by the garbage collector.

If there is a real memory leak something must be keeping references to them, but this is not the intended behaviour. This would be a serious bug. A while back there was a problem with change listeners not getting disposed of. I thought this was resolved but possibly it was not. Would need an example to track this down.

- Mark

From darling at cs.wisc.edu Mon Jul 4 02:35:17 2005
From: darling at cs.wisc.edu (Aaron Darling)
Date: Mon Jul 4 02:26:49 2005
Subject: [Biojava-l] Dealing with huge sequences (was: "memory leak while reading nr.fasta")
In-Reply-To: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB0DE@BIONIC.biopolis.one-north.com>
References: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB0DE@BIONIC.biopolis.one-north.com>
Message-ID: <42C8D8A5.6020706@cs.wisc.edu>

Richard HOLLAND wrote:

>What is required for files this size is a SeqIOTools parser that reads
>sequence objects _on demand_ as requested by the iterator, rather than
>reading the whole lot at once.
>

This brings up a related issue that I'm grappling with at the moment... I would like to have BioJava parse a large sequence file and then periodically extract arbitrary subsequences. As currently implemented, it seems that in order to extract a subsequence, the entire sequence entry must be loaded from the GenBank/FastA/whatever file into memory. This becomes a problem when dealing with large chromosomal data sets of the type displayed in the Mauve alignment viewer. Yes, I'm aware of the PackedSymbolList. Unfortunately, mammalian genomes are around 3 gigabases, requiring around 700MB each using a 2 bits per base encoding.

Given that it won't be practical to store the entire sequence in memory, the next best solution would be keeping an in-memory index of relevant sequence file offsets. Enter BioJava's IndexStore. Unless I've misunderstood the documentation, the IndexStore family of classes index sequence files on a per-contig/per-entry basis. Such a scheme creates rather sparse indexes for chromosomes that can be > 100MB in length.
What seems ideal would be an implementation of SeqIOTools that could read a GenBank/FastA file and construct a Sequence-derivative object with lazy references to the data. The Sequence-derived class would also need mappings of sequence coordinates to file offsets so that reading a 10 character subsequence n...n+10 doesn't require also reading subsequence 1...n-1. I implemented a similar scheme in a small C++ library called libGenome years ago and it makes manipulating large data sets a breeze.

Echoing Richard's question for this slightly different problem:

>Can someone clarify if a lazy-loading parser/database implementation
>already exists for situations like this, or does one need to be written?
>

Thanks for BioJava, and thanks for any feedback
-Aaron

btw: I also brought this up at the BOSC BioJava BOF but we were rather abruptly ushered out of the meeting room by an anxious hotel staffer prior to reaching a conclusion.

From mark.schreiber at novartis.com Mon Jul 4 02:49:35 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Mon Jul 4 02:40:44 2005
Subject: [Biojava-l] Dealing with huge sequences (was: "memory leak while reading nr.fasta")
Message-ID:

I think this would be easily doable with BioJava. It would require a custom implementation of Sequence and, due to the beauty of interfaces, you probably wouldn't even know you were dealing with an assembly (except sometimes it might be a bit slow while collecting data).

Like you say you could use IndexStore. It might also be worth looking at how Dazzle deals with DAS to see if you can steal anything from there.

Ideally the SequenceBuilders called (eventually) by SeqIOTools should decide what kind of Sequence implementation you get back. For example, small sequences get SimpleSequence, mid-sized ones get PackedSymbolList, and really large ones get some kind of lazy loaded sequence.
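
The coordinate-to-offset mapping Aaron asks for above can be sketched for the simplest case. This is an illustration under stated assumptions, not BioJava's IndexStore or any existing class: `FastaWindow` is a hypothetical helper, and it assumes a single-record FASTA file with one byte per residue, a known header length, and sequence lines of fixed width (apart from the last one).

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch, not a BioJava class: maps sequence coordinates to byte offsets. */
final class FastaWindow {
    private final long dataStart;   // byte offset of the first residue (just after the header line)
    private final int basesPerLine; // residues on every full sequence line
    private final int bytesPerLine; // residues plus line terminator bytes

    FastaWindow(long dataStart, int basesPerLine, int newlineBytes) {
        if (basesPerLine <= 0 || newlineBytes <= 0) {
            throw new IllegalArgumentException("line geometry must be positive");
        }
        this.dataStart = dataStart;
        this.basesPerLine = basesPerLine;
        this.bytesPerLine = basesPerLine + newlineBytes;
    }

    /** Byte offset of the residue at 1-based sequence coordinate pos. */
    long offsetOf(long pos) {
        long zeroBased = pos - 1;
        long fullLines = zeroBased / basesPerLine;
        return dataStart + fullLines * bytesPerLine + zeroBased % basesPerLine;
    }

    /** Reads residues start..end (1-based, inclusive) without reading anything before start. */
    String read(RandomAccessFile file, long start, long end) throws IOException {
        long from = offsetOf(start);
        long to = offsetOf(end);
        // The int cast limits one window to under 2 GB, which is fine for a sketch.
        byte[] raw = new byte[(int) (to - from + 1)];
        file.seek(from);
        file.readFully(raw);
        StringBuilder residues = new StringBuilder(raw.length);
        for (byte b : raw) {
            if (b != '\n' && b != '\r') { // drop line terminators inside the window
                residues.append((char) b);
            }
        }
        return residues.toString();
    }
}
```

With a 6-byte header line and 4 residues per line, for example, coordinate 5 maps to byte offset 11, so a short window deep inside a chromosome costs one seek and one small read. Multi-record files or variable line widths would need a per-entry or per-line index instead of this arithmetic.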

Before diving in it would be interesting to know if it is the big sequence or the thousands of features that cause large sequences to be problematic. If it's features you would need to lazy load those as well (which could be problematic).

- Mark

From hollandr at gis.a-star.edu.sg Mon Jul 4 02:50:09 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Mon Jul 4 02:42:20 2005
Subject: [Biojava-l] Dealing with huge sequences (was: "memory leak while reading nr.fasta")
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB106@BIONIC.biopolis.one-north.com>

I should probably mention some comments Mark made to me privately here - that the current fileToBiojava method _does_ read on demand, and sequentially, as opposed to buffered random access as I originally thought it did. The memory leak is in fact a mystery - I can't find any trace in the code to suggest that BioJava is holding internal references to Sequence objects read by fileToBiojava. The BJIA example _should_ work without any problems even on large files such as nr. Mark suggested a profiler would be useful. Does somebody have access to one?

Apologies if I misled anyone.

Anyhow, on to Aaron's points...

A lazy loading sequence object shouldn't be too much trouble at initial glance.
It would (a) have to be aware of the file it came from, and (b) aware of the format of that file. It would also have to (c) store in memory each part that was loaded as we went along, unless otherwise told not to, to prevent duplicate reads where multiple accesses take place. This however is fundamentally different to the way files are currently parsed in BioJava. Not sure how it would actually work in reality. Any takers? Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of > Aaron Darling > Sent: Monday, July 04, 2005 2:35 PM > To: biojava-l@biojava.org; Paul Infield-Harm > Subject: Re: [Biojava-l] Dealing with huge sequences (was: > "memory leak whilereading nr.fasta") > > > Richard HOLLAND wrote: > > >What is required for files this size is a SeqIOTools parser > that reads > >sequence objects _on demand_ as requested by the iterator, > rather than > >reading the whole lot at once. > > > This brings up a related issue that I'm grappling with at the > moment... > I would like to have biojava parse a large sequence file and then > periodically extract arbitrary subsequences. As currently > implemented, > it seems that in order to extract a subsequence, the entire sequence > entry must be loaded from the GenBank/FastA/whatever file > into memory. > This becomes a problem when dealing with large chromosomal > data sets of > the type displayed in the Mauve alignment viewer. Yes, I'm > aware of the > PackedSymbolList. 
Unfortunately, mammalian genomes are around 3 > gigabases, requiring around 700MB each using a 2 bits per > base encoding. > > Given that it won't be practical to store the entire sequence > in memory, > the next best solution would be keeping an in-memory index of > relevant > sequence file offsets. Enter BioJava's IndexStore. Unless I've > misunderstood the documentation, the IndexStore family of > classes index > sequence files on a per-contig/per-entry basis. Such a > scheme creates > rather sparse indexes for chromosomes that can be > 100MB in length. > What seems ideal would be an implementation of SeqIOTools that could > read a GenBank/FastA file and construct a Sequence-derivative object > with lazy references to the data. The Sequence-derived class > would also > need mappings of sequence coordinates to file offsets so that > reading a > 10 character subsequence n...n+10 doesn't require also reading > subsequence 1...n-1. I implemented a similar scheme in a small c++ > library called libGenome years ago and it makes manipulating > large data > sets a breeze. > > Echoing Richard's question for this slightly different problem: > > >Can someone clarify if a lazy-loading parser/database implementation > >already exists for situations like this, or does one need to > be written? > > > > > > > Thanks for Biojava, and thanks for any feedback > -Aaron > > btw: I also brought this up at the BOSC biojava BOF but we > were rather > abruptly ushered out of the meeting room by an anxious hotel staffer > prior to reaching a conclusion. 
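The coordinate-to-offset mapping Aaron describes can be sketched roughly as follows. This is only an illustration, not BioJava code: the class LazyFastaSlice and its methods are hypothetical names, and the sketch assumes a single-record FASTA file in a single-byte encoding whose sequence lines all have the same width (except possibly the last). Under those assumptions the byte offset of any residue is plain arithmetic, so reading positions n...n+10 never touches positions 1...n-1.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch: random access into a single-record, fixed-width FASTA file. */
public class LazyFastaSlice {
    private final Path file;
    private final long dataStart; // byte offset of the first residue
    private final int lineWidth;  // residues per full sequence line
    private final int lineBytes;  // lineWidth plus the length of the line terminator

    public LazyFastaSlice(Path file) throws IOException {
        long start;
        int width;
        int bytes;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.readLine();                 // skip the '>' header line
            start = raf.getFilePointer();
            String first = raf.readLine();  // the first sequence line fixes the width
            if (first == null || first.isEmpty()) {
                throw new IOException("no sequence data in " + file);
            }
            width = first.length();
            bytes = (int) (raf.getFilePointer() - start);
        }
        this.file = file;
        this.dataStart = start;
        this.lineWidth = width;
        this.lineBytes = bytes;
    }

    /** Byte offset of the residue at 1-based position pos. */
    private long offsetOf(long pos) {
        long idx = pos - 1;
        return dataStart + (idx / lineWidth) * lineBytes + (idx % lineWidth);
    }

    /** Returns residues start..end (1-based, inclusive), reading only the bytes that span them. */
    public String subSequence(long start, long end) throws IOException {
        long from = offsetOf(start);
        byte[] buf = new byte[(int) (offsetOf(end) - from + 1)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(from);
            raf.readFully(buf);
        }
        StringBuilder sb = new StringBuilder(buf.length);
        for (byte b : buf) {
            if (b != '\n' && b != '\r') {
                sb.append((char) b); // drop line terminators inside the range
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path demo = Files.createTempFile("demo", ".fa");
        Files.write(demo, ">chr\nACGTACGTAC\nGTTTGACCAA\nGG\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(new LazyFastaSlice(demo).subSequence(9, 13)); // prints ACGTT
        Files.delete(demo);
    }
}
```

A multi-record file, or one with irregular line widths, would need a stored table of per-line offsets instead of the arithmetic above; that is closer to the per-entry indexing Aaron mentions for IndexStore, only at a finer grain.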
> _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From sasata2 at yahoo.co.jp Tue Jul 5 12:24:49 2005 From: sasata2 at yahoo.co.jp (Takeshi Sasayama) Date: Tue Jul 5 12:17:38 2005 Subject: [Biojava-l] New look for BJIA Message-ID: <20050705162449.25044.qmail@web1808.mail.yahoo.co.jp> Hello Mark, How about putting a java source file and add a link to it? I saw some websites doing that. It would be easier if you have java source files and in case you don't need to drop line numbers. Takeshi Sasayama > Date: Mon, 4 Jul 2005 11:18:29 +0800 > From: mark.schreiber@novartis.com > Subject: RE: [Biojava-l] New look for BJIA > To: "Richard HOLLAND" > Cc: biojava-l@open-bio.org, biojava-l-bounces@portal.open-bio.org > Message-ID: > > > Content-Type: text/plain; charset="iso-8859-1" > > I thought about doing the line numbers in another frame but getting them > to align is not always reliable across browsers. > > I also favour cutting and pasting over line numbers so I may go ahead and > drop them. > > - Mark From bradford.powell at gmail.com Tue Jul 5 15:33:10 2005 From: bradford.powell at gmail.com (bradford powell) Date: Tue Jul 5 15:26:16 2005 Subject: [Biojava-l] field names in term_synonym Message-ID: <5418df3e050705123351d87be8@mail.gmail.com> org.biojava.bio.seq.db.biosql.OntologySQL refers to the 'name' field of the table 'term_synonym'. The field is only called 'name' in the hsqldb schema; it is called 'synonym' in the other schemas (postgresql, mysql, oracle). It seems that the references to 'name' should be changed to 'synonym' in lines 305 and 577 of OntologySQL, and that the hsqldb schema be corrected to follow the naming schema of the others. 
-- Bradford Powell From hollandr at gis.a-star.edu.sg Tue Jul 5 23:11:34 2005 From: hollandr at gis.a-star.edu.sg (Richard HOLLAND) Date: Tue Jul 5 23:03:45 2005 Subject: [Biojava-l] field names in term_synonym Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB220@BIONIC.biopolis.one-north.com> Hi - thanks for pointing that out. The field is also called 'name' in the Oracle schema as well as HSQLDB. Basically in Oracle you can't call a field 'synonym' because it is a reserved keyword. I'm thinking this might need to be a special-case where the SQL statement should be moved to the database-specific DBHelper class (HypersonicDBHelper, MySQLDBHelper, OracleDBHelper, etc.), as is done with references to the 'seq' column of the 'biosequence' table for instance. Richard Holland Bioinformatics Specialist GIS extension 8199 --------------------------------------------- This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its content to any other person. Thank you. --------------------------------------------- > -----Original Message----- > From: biojava-l-bounces@portal.open-bio.org > [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of > bradford powell > Sent: Wednesday, July 06, 2005 3:33 AM > To: biojava-l@biojava.org > Subject: [Biojava-l] field names in term_synonym > > > org.biojava.bio.seq.db.biosql.OntologySQL refers to the 'name' field > of the table 'term_synonym'. The field is only called 'name' in the > hsqldb schema; it is called 'synonym' in the other schemas > (postgresql, mysql, oracle). > > It seems that the references to 'name' should be changed to 'synonym' > in lines 305 and 577 of OntologySQL, and that the hsqldb schema be > corrected to follow the naming schema of the others. 
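Richard's suggestion of moving the column name into database-specific code might look roughly like the sketch below. The enum is a hypothetical stand-in and not the real DBHelper API; the column names follow the thread (Oracle and HSQLDB use 'name', MySQL and PostgreSQL use 'synonym'), and the term_id column in the query is an assumption about the BioSQL schema.

```java
/** Hypothetical sketch: per-dialect name of the synonym column in term_synonym. */
public enum SynonymDialect {
    MYSQL("synonym"),
    POSTGRESQL("synonym"),
    ORACLE("name"),  // 'synonym' is a reserved word in Oracle, per the thread
    HSQLDB("name");

    private final String synonymColumn;

    SynonymDialect(String synonymColumn) {
        this.synonymColumn = synonymColumn;
    }

    /** Column holding the synonym text for this dialect. */
    public String synonymColumn() {
        return synonymColumn;
    }

    /** Builds the lookup query with the dialect's column name (term_id is assumed). */
    public String selectSynonymsSql() {
        return "SELECT " + synonymColumn + " FROM term_synonym WHERE term_id = ?";
    }

    public static void main(String[] args) {
        for (SynonymDialect d : values()) {
            System.out.println(d + ": " + d.selectSynonymsSql());
        }
    }
}
```

Keeping the name in one place per dialect means the shared ontology code never hard-codes either spelling.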
> > -- Bradford Powell > > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From aijaz_bio at yahoo.com Wed Jul 6 17:20:33 2005 From: aijaz_bio at yahoo.com (Aijazuddin Syed) Date: Wed Jul 6 17:12:55 2005 Subject: [Biojava-l] HMMER Message-ID: <20050706212033.67198.qmail@web30303.mail.mud.yahoo.com> Dear all, I was just wondering if I can perform HMMer searches through BioJava. I did try on www to find out but could not figure out. Actually I want to write a Java class to perform HMMer searches. Kind Regards, Aijaz. ____________________________________________________ Sell on Yahoo! Auctions ? no fees. Bid on great items. http://auctions.yahoo.com/ From felipe.albrecht at gmail.com Wed Jul 6 18:27:07 2005 From: felipe.albrecht at gmail.com (Felipe Albrecht) Date: Wed Jul 6 18:19:14 2005 Subject: [Biojava-l] Blast Format Writer Message-ID: There was some class in biojava that writes informations in ASN.1 format? Thanks. Felipe Albrecht From mark.schreiber at novartis.com Wed Jul 6 21:00:46 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Jul 6 20:51:47 2005 Subject: [Biojava-l] HMMER Message-ID: Do you mean you want to make an HMM like HMMER or you want to launch HMMER processes from your JVM? - Mark Aijazuddin Syed Sent by: biojava-l-bounces@portal.open-bio.org 07/07/2005 05:20 AM Please respond to aijaz_bio To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] HMMER Dear all, I was just wondering if I can perform HMMer searches through BioJava. I did try on www to find out but could not figure out. Actually I want to write a Java class to perform HMMer searches. Kind Regards, Aijaz. ____________________________________________________ Sell on Yahoo! Auctions ? no fees. Bid on great items. 
http://auctions.yahoo.com/ _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From mark.schreiber at novartis.com Wed Jul 6 21:04:19 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Wed Jul 6 20:55:21 2005 Subject: [Biojava-l] Blast Format Writer Message-ID: No - The closest approximation would be GenBankXML (biojava can read it, but I don't think it can write it). Apparently GenBankXML is an abstraction of NCBIs ASN.1 If you want to do it you would need to implement SequenceFormat to read and write ASN.1 - Mark Felipe Albrecht Sent by: biojava-l-bounces@portal.open-bio.org 07/07/2005 01:27 AM Please respond to Felipe Albrecht To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Blast Format Writer There was some class in biojava that writes informations in ASN.1 format? Thanks. Felipe Albrecht _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From voisingreg at yahoo.fr Thu Jul 7 13:37:37 2005 From: voisingreg at yahoo.fr (gregory voisin) Date: Thu Jul 7 13:28:42 2005 Subject: [Biojava-l] application to manipulate sequence and annotation Message-ID: <20050707173737.44488.qmail@web25704.mail.ukl.yahoo.com> hi all biojavabien, My question is simple : did somebody developped a biojava application to manipulate sequence , create annotation , use colors for specific sequence...? thanks Greg \\|// (o o) -. .-. .-oOOo~(_)~oOOo-. .-. .-. ||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ |/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X ' `-' `-' `-' `-' `-' `-' `-' VOISIN greg. Bioinformaticien. Centre de recherche du CHUM. MONTREAL --------------------------------- Appel audio GRATUIT partout dans le monde avec le nouveau Yahoo! Messenger T?l?chargez le ici ! 
From mark.schreiber at novartis.com Thu Jul 7 20:55:58 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Thu Jul 7 20:46:59 2005 Subject: [Biojava-l] application to manipulate sequence and annotation Message-ID: There are but you would need to be a bit more specific about your needs. Alternatively you could search google scholar with the word biojava. That usually turns up some interesting results. - Mark gregory voisin Sent by: biojava-l-bounces@portal.open-bio.org 07/08/2005 01:37 AM To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] application to manipulate sequence and annotation hi all biojavabien, My question is simple : did somebody developped a biojava application to manipulate sequence , create annotation , use colors for specific sequence...? thanks Greg \\|// (o o) -. .-. .-oOOo~(_)~oOOo-. .-. .-. ||X|||\ /|||X|||\ /|||X|||\ /|||X|||\ |/ \|||X|||/ \|||X|||/ \|||X|||/ \|||X ' `-' `-' `-' `-' `-' `-' `-' VOISIN greg. Bioinformaticien. Centre de recherche du CHUM. MONTREAL --------------------------------- Appel audio GRATUIT partout dans le monde avec le nouveau Yahoo! Messenger T?l?chargez le ici ! _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From tcw102 at york.ac.uk Fri Jul 8 10:28:40 2005 From: tcw102 at york.ac.uk (Williamson, TC) Date: Fri Jul 8 10:19:44 2005 Subject: [Biojava-l] FASTA Parser problems Message-ID: <42CE8D98.4000408@york.ac.uk> Hello all. I'm having trouble with the FASTA Parser presented in BioJava in Anger. I just don't seem to be able to get any results from my file. 
Here is the script:

import java.io.*;
import java.util.*;

import org.biojava.bio.program.sax.*;
import org.biojava.bio.program.ssbind.*;
import org.biojava.bio.search.*;
import org.biojava.bio.seq.db.*;
import org.xml.sax.*;
import org.biojava.bio.*;

public class BlastParser {
  /**
   * String location is the full path of a FASTA output file
   */
  public static void main(String[] args) {
    try {
      //get the Blast input as a Stream
      String location = "Full//path//of//file";
      InputStream is = new FileInputStream(location);

      //make a FastaSearchSAXParser
      FastaSearchSAXParser parser = new FastaSearchSAXParser();

      //make the SAX event adapter that will pass events to a Handler.
      SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();

      //set the parser's SAX event adapter
      parser.setContentHandler(adapter);

      //The list to hold the SeqSimilaritySearchResults
      List results = new ArrayList();

      //create the SearchContentHandler that will build SeqSimilaritySearchResults
      //in the results List
      SearchContentHandler builder = new BlastLikeSearchBuilder(results,
          new DummySequenceDB("queries"), new DummySequenceDBInstallation());

      //register builder with adapter
      adapter.setSearchContentHandler(builder);

      //parse the file, after this the result List will be populated with
      //SeqSimilaritySearchResults
      parser.parse(new InputSource(is));

      //output some blast details
      for (Iterator i = results.iterator(); i.hasNext(); ) {
        SeqSimilaritySearchResult result = (SeqSimilaritySearchResult)i.next();

        Annotation anno = result.getAnnotation();

        for (Iterator j = anno.keys().iterator(); j.hasNext(); ) {
          Object key = j.next();
          Object property = anno.getProperty(key);
          System.out.println(key+" : "+property);
        }
        System.out.println("Hits: ");

        //list the hits
        for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
          SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
          System.out.print("\tmatch: "+hit.getSubjectID());
          System.out.println("\te score: "+hit.getEValue());
        }

        System.out.println("\n");
      }

    }
    catch (SAXException ex) {
      //XML problem
      ex.printStackTrace();
    }catch (IOException ex) {
      //IO problem, possibly file not found
      ex.printStackTrace();
    }
  }
}

From hollandr at gis.a-star.edu.sg Sun Jul 10 21:10:39 2005
From: hollandr at gis.a-star.edu.sg (Richard HOLLAND)
Date: Sun Jul 10 21:03:26 2005
Subject: [Biojava-l] FASTA Parser problems
Message-ID: <6D9E9B9DF347EF4385F6271C64FB8D5601DCB3D4@BIONIC.biopolis.one-north.com>

Could you send me the input file you are using as an attachment (directly, not via the mailing list, as it will get removed)? I can then run the script and compare input to output. Also, which BioJava version are you using?

cheers,
Richard

Richard Holland
Bioinformatics Specialist
GIS extension 8199

> -----Original Message-----
> From: biojava-l-bounces@portal.open-bio.org
> [mailto:biojava-l-bounces@portal.open-bio.org] On Behalf Of
> Williamson, TC
> Sent: Friday, July 08, 2005 10:29 PM
> To: biojava-l@biojava.org
> Subject: [Biojava-l] FASTA Parser problems
>
>
> Hello all.
>
> I'm having trouble with the FASTA Parser presented in BioJava
> in Anger.
>
> I just don't seem to be able to get any results from my file.
> Here is > the script: > > import java.io.*; > import java.util.*; > > > import org.biojava.bio.program.sax.*; > import org.biojava.bio.program.ssbind.*; > import org.biojava.bio.search.*; > import org.biojava.bio.seq.db.*; > import org.xml.sax.*; > import org.biojava.bio.*; > > public class BlastParser { > /** > * String location is the full path of a FASTA output file > */ > public static void main(String[] args) { > try { > //get the Blast input as a Stream > String location = "Full//path//of//file"; > InputStream is = new FileInputStream(location); > > //make a FastaSearchSAXParser > FastaSearchSAXParser parser = new FastaSearchSAXParser(); > > //make the SAX event adapter that will pass events to > a Handler. > SeqSimilarityAdapter adapter = new SeqSimilarityAdapter(); > > //set the parsers SAX event adapter > parser.setContentHandler(adapter); > > //The list to hold the SeqSimilaritySearchResults > List results = new ArrayList(); > > //create the SearchContentHandler that will build > SeqSimilaritySearchResults > //in the results List > SearchContentHandler builder = new > BlastLikeSearchBuilder(results, > new DummySequenceDB("queries"), new > DummySequenceDBInstallation()); > > //register builder with adapter > adapter.setSearchContentHandler(builder); > > //parse the file, after this the result List will be > populated with > //SeqSimilaritySearchResults > parser.parse(new InputSource(is)); > > //output some blast details > for (Iterator i = results.iterator(); i.hasNext(); ) { > SeqSimilaritySearchResult result = > (SeqSimilaritySearchResult)i.next(); > > Annotation anno = result.getAnnotation(); > > for (Iterator j = anno.keys().iterator(); j.hasNext(); ) { > Object key = j.next(); > Object property = anno.getProperty(key); > System.out.println(key+" : "+property); > } > System.out.println("Hits: "); > > //list the hits > for (Iterator k = result.getHits().iterator(); > k.hasNext(); ) { > SeqSimilaritySearchHit hit = > (SeqSimilaritySearchHit)k.next(); > 
System.out.print("\tmatch: "+hit.getSubjectID()); > System.out.println("\te score: "+hit.getEValue()); > } > > System.out.println("\n"); > } > > } > catch (SAXException ex) { > //XML problem > ex.printStackTrace(); > }catch (IOException ex) { > //IO problem, possibly file not found > ex.printStackTrace(); > } > } > } > _______________________________________________ > Biojava-l mailing list - Biojava-l@biojava.org > http://biojava.org/mailman/listinfo/biojava-l > From mark.schreiber at novartis.com Mon Jul 11 03:13:03 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Jul 11 03:05:25 2005 Subject: [Biojava-l] pairwise alignments Message-ID: Hello - I have added another tutorial to the Biojava in anger pages. This one explains how to generate a pair-wise alignment between two sequences. The solution also demonstrates a lot of how HMMs work in biojava. You can find it under the Weight Matrix and Dynamic Programming section. http://www.biojava.org/docs/bj_in_anger/ http://www.biojava.org/docs/bj_in_anger/PairAlign.htm Enjoy! Mark Schreiber Principal Scientist (Bioinformatics) Novartis Institute for Tropical Diseases (NITD) 10 Biopolis Road #05-01 Chromos Singapore 138670 www.nitd.novartis.com phone +65 6722 2973 fax +65 6722 2910 From mark.schreiber at novartis.com Mon Jul 11 05:35:45 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Mon Jul 11 05:26:56 2005 Subject: [Biojava-l] Announce: Biojava1.4 released Message-ID: BioJava 1.4 has been officially released. This represents a major new step in biojava's development. It has been about two years in the making and offers considerably more functionality and stability over the previous official release (biojava 1.3). We highly recommend you upgrade as soon as possible. Thanks to the entire biojava community for making this possible! 
Mark Schreiber
Principal Scientist (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

From mark.schreiber at novartis.com Wed Jul 13 02:34:22 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jul 13 02:25:26 2005
Subject: [Biojava-l] New Biojava in Anger examples
Message-ID:

Hello -

I have recently added a new example to biojava in anger (www.biojava.org/docs/bj_in_anger/) that explains a little about how the biojava sequence I/O system works and how you can customize it. If you find the methods of SeqIOTools a little restrictive then this will be of interest to you.

http://www.biojava.org/docs/bj_in_anger/seqioecho.html

I have also completely rewritten the Fasta parser example.

http://www.biojava.org/docs/bj_in_anger/FastaParser.htm

Enjoy!

- Mark

Mark Schreiber
Principal Scientist (Bioinformatics)
Novartis Institute for Tropical Diseases (NITD)
10 Biopolis Road
#05-01 Chromos
Singapore 138670
www.nitd.novartis.com
phone +65 6722 2973
fax +65 6722 2910

From osanchez at fis.upv.es Thu Jul 14 10:37:18 2005
From: osanchez at fis.upv.es ("Óscar D. Sánchez Jiménez")
Date: Thu Jul 14 10:28:07 2005
Subject: [Biojava-l] Trouble with GOParser
Message-ID: <42D6789E.5030608@fis.upv.es>

Hello,

I am having some trouble with the GOParser class. I would like to use it in order to load an ontology and then display this ontology in a JTree. This is the code I am testing:

try {
  BufferedReader file = new BufferedReader(new FileReader("po_anatomy.ontology"));
  OntologyFactory factory = OntoTools.getDefaultFactory();
  GOParser parser = new GOParser();
  Ontology onto = parser.parseGO(file, "trait", "description", factory);
} catch (Exception e) {System.out.println(e);};

When I execute it, I get a java.lang.NullPointerException. The line causing the exception is the parseGO(...) one. Do you have any idea? I am able to read the ontology perfectly with Dag-Edit.

Thanks in advance,
Óscar D. Sánchez

--
Óscar David Sánchez Jiménez
Telecommunications Engineer - Computer Science Ph.D. student
Grupo de Informática Médica
ITACA - BET (Bioingeniería, Electrónica y Telemedicina)
Universidad Politécnica de Valencia
Camino de Vera s/n
E-46022 Valencia (Spain)

From mark.schreiber at novartis.com Thu Jul 14 21:44:08 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Thu Jul 14 21:35:04 2005
Subject: [Biojava-l] Trouble with GOParser
Message-ID:

Hi -

Can you post the entire stack trace and the version of biojava you are using to the list (not direct to me as I'm not an expert on this API)?

- Mark

"Óscar D. Sánchez Jiménez"
Sent by: biojava-l-bounces@portal.open-bio.org
07/14/2005 10:37 PM

To: biojava-l@biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Trouble with GOParser

Hello,

I am having some trouble with the GOParser class. I would like to use it in order to load an ontology and then display this ontology in a JTree. This is the code I am testing:

try {
  BufferedReader file = new BufferedReader(new FileReader("po_anatomy.ontology"));
  OntologyFactory factory = OntoTools.getDefaultFactory();
  GOParser parser = new GOParser();
  Ontology onto = parser.parseGO(file, "trait", "description", factory);
} catch (Exception e) {System.out.println(e);};

When I execute it, I get a java.lang.NullPointerException. The line causing the exception is the parseGO(...) one. Do you have any idea? I am able to read the ontology perfectly with Dag-Edit.

Thanks in advance,
Óscar D. Sánchez

--
Óscar David Sánchez Jiménez
Telecommunications Engineer - Computer Science Ph.D. student
Grupo de Informática Médica
ITACA - BET (Bioingeniería, Electrónica y Telemedicina)
Universidad Politécnica de Valencia
Camino de Vera s/n
E-46022 Valencia (Spain)

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l

From S2100086 at student.rmit.edu.au Wed Jul 20 06:38:50 2005
From: S2100086 at student.rmit.edu.au (Daniel Park)
Date: Wed Jul 20 06:39:16 2005
Subject: [Biojava-l] (no subject)
Message-ID: <1121855930.9783b3bcS2100086@student.rmit.edu.au>

Hi all,

I'm doing a report about some bio-informatics tools and their uses; one of the tools I'm considering reviewing is bio-java.

I was wondering if anybody could help me by answering some of the following questions about bio-java.

- are there any research papers that I could get access to where bio-java was used for analysis? and where could I find them?
- what environments are people using it in? eg commercial, academic
- why do people use Bio-Java as opposed to other applications?

I would also be happy to hear personal stories/opinions from people using bio-java, and any other bio-java related information would be helpful.

Any assistance would be much appreciated.

Cheers,
Daniel Park

From mark.schreiber at novartis.com Wed Jul 20 22:48:24 2005
From: mark.schreiber at novartis.com (mark.schreiber@novartis.com)
Date: Wed Jul 20 22:38:55 2005
Subject: [Biojava-l] (no subject)
Message-ID:

>Hi all,
>
>I'm doing a report about some bio-informatics tools and their uses, one of the tools I'm considering reviewing is
>bio-java.
>
>I was wondering if anybody could help me by answering some of the following questions about bio-java.
>
>- are there any research papers that I could get access to where bio-java was used for analysis? and where could I
>find them?

The best way to find this out is to search Google Scholar with the term biojava. The last time I looked there were a lot.

>- what environments are people using it in?
eg commercial, academic Hard to know exactly. According to www.biojava.org/usage/ about 50% of traffic to the site is from .com addresses. It's hard to interpret web traffic as usage and a reasonable amount of the 50% will be from web-crawlers indexing the site. It is definitely used in both commercial and academic settings. >- why do people use Bio-Java as opposed to other applictions. BioJava is not really an application. It is a Java programming library for bioinformatics. I guess they use it cause it's free and it (mostly) works. >I would also be happy to hear personal stories/opinions from people using bio-java. >or anyother bio-java related information would be helpful. >Any assistance would be much appreciated. > >Cheers, >Daniel Park You might want to put a survey form somewhere on the web. We have never done a survey of biojava users and the results might be interesting and help the direction of the project. - Mark From tblum at andrew.cmu.edu Sat Jul 23 22:14:38 2005 From: tblum at andrew.cmu.edu (Tal Blum) Date: Sat Jul 23 22:14:53 2005 Subject: [Biojava-l] Protein CharacterTokenization Message-ID: <200507240214.j6O2EeRP020937@smtp.andrew.cmu.edu> Hi, There is something strange with the protein alphabet CharacterTokenization. It knows how to parse the ambiguity symbol 'X', but it does not contain the other way around mapping of the protein alphabet ambiguity symbol to 'X'. Is that the way it should be or is that a bug? Can someone suggest a way I can correct it? The Alphabet is wrapped in a WellKnownTokenizationWrapper in AlphabetManager, so I can't simply add a Symbol to it. Thanks, tal From mark.schreiber at novartis.com Sun Jul 24 21:42:32 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Sun Jul 24 21:33:05 2005 Subject: [Biojava-l] Protein CharacterTokenization Message-ID: Hello - Can you provide some example code? Any protein ambiguity should map to X. Unlike DNA which has lots of ambiguity codes with different meanings. 
BioJava can support all kinds of protein ambiguity but when they are tokenized they should all end up as X. - Mark "Tal Blum" Sent by: biojava-l-bounces@portal.open-bio.org 07/24/2005 10:14 AM To: cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Protein CharacterTokenization Hi, There is something strange with the protein alphabet CharacterTokenization. It knows how to parse the ambiguity symbol 'X', but it does not contain the other way around mapping of the protein alphabet ambiguity symbol to 'X'. Is that the way it should be or is that a bug? Can someone suggest a way I can correct it? The Alphabet is wrapped in a WellKnownTokenizationWrapper in AlphabetManager, so I can't simply add a Symbol to it. Thanks, tal _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l From rohdester at gmail.com Sun Jul 31 08:36:36 2005 From: rohdester at gmail.com (Jacob Rohde) Date: Sun Jul 31 08:27:58 2005 Subject: [Biojava-l] Cast to Sequence Message-ID: Hi, I'm having a problem with a JTree and rendering of Sequences. Since the JTree calls toString() on objects when displaying them, I had to make my own TreeCellRenderer because the Sequence toString method prints out debug info. My code looks like this: public class FeatureTreeCellRenderer extends DefaultTreeCellRenderer { public Component getTreeCellRendererComponent(JTree tree, Object value, boolean sel, boolean expanded, boolean leaf, int row, boolean hasFocus) { JLabel l = (JLabel) super.getTreeCellRendererComponent(tree, value, sel, expanded, leaf, row, hasFocus); if(leaf) { System.out.println(value); Sequence s = (Sequence) value; l.setText(s.getName()); } return l; } } My problem is that I always get a ClassCastException. The weird thing is that the println method call above the cast clearly shows that the cast should be possible. 
This is the output of the pint statement: org.biojava.bio.seq.impl.SimpleSequence@2ba11b name: DNA seq 0 And this is the exception: Exception in thread "AWT-EventQueue-0" java.lang.ClassCastException: javax.swing.tree.DefaultMutableTreeNode . etc. etc. I'm so confused. Any ideas? Thanks in advance, Jacob Rohde From tmo at ebi.ac.uk Sun Jul 31 09:22:15 2005 From: tmo at ebi.ac.uk (Tom Oinn) Date: Sun Jul 31 09:11:01 2005 Subject: [Biojava-l] Cast to Sequence In-Reply-To: References: Message-ID: <42ECD087.5010900@ebi.ac.uk> Jacob Rohde wrote: > Hi, > > I'm having a problem with a JTree and rendering of Sequences. > > Since the JTree calls toString() on objects when displaying them, I > had to make my own TreeCellRenderer because the Sequence toString > method prints out debug info. > > My code looks like this: > > public class FeatureTreeCellRenderer extends DefaultTreeCellRenderer > { > public Component getTreeCellRendererComponent(JTree tree, Object value, > boolean sel, boolean expanded, boolean leaf, int row, boolean hasFocus) > { > JLabel l = (JLabel) super.getTreeCellRendererComponent(tree, > value, sel, expanded, leaf, row, hasFocus); > > if(leaf) > { > System.out.println(value); > Sequence s = (Sequence) value; > l.setText(s.getName()); > } > return l; > } > } > > My problem is that I always get a ClassCastException. The weird thing > is that the println method call above the cast clearly shows that the > cast should be possible. In no way whatsoever does it show that :) The 'value' supplied to the renderer is the implementation of the TreeNode interface used in the TreeModel you're rendering over. In this case, as you are presumably subclassing DefaultTreeModel and DefaultMutableTreeNode you're getting value set to an instance of DefaultMutableTreeNode which of course can't be cast to a Sequence. 
The confusion comes because the toString method of DefaultMutableTreeNode is something like 'return userObject.toString()' and so produces exactly the same result in your print statement as you'd get if it were the user object (in this case your Sequence). You should use : Sequence s = (Sequence)((DefaultMutableTreeNode)getUserObject()); instead, and next time read the javadoc for TreeCellRenderer and the like more carefully :) Tom From tmo at ebi.ac.uk Sun Jul 31 09:23:48 2005 From: tmo at ebi.ac.uk (Tom Oinn) Date: Sun Jul 31 09:12:01 2005 Subject: [Biojava-l] Cast to Sequence In-Reply-To: <42ECD087.5010900@ebi.ac.uk> References: <42ECD087.5010900@ebi.ac.uk> Message-ID: <42ECD0E4.9050409@ebi.ac.uk> Tom Oinn wrote: > You should use : Sequence s = > (Sequence)((DefaultMutableTreeNode)getUserObject()); instead, and next > time read the javadoc for TreeCellRenderer and the like more carefully :) Or even : Sequence s = (Sequence)((DefaultMutableTreeNode)value).getUserObject(); in the case of wanting code that actually compiles! Tom From mark.schreiber at novartis.com Sun Jul 31 21:41:05 2005 From: mark.schreiber at novartis.com (mark.schreiber@novartis.com) Date: Sun Jul 31 21:31:44 2005 Subject: [Biojava-l] Cast to Sequence Message-ID: You might also want to use the Biojava FeatureTree class. An example of it's use is at http://www.biojava.org/docs/bj_in_anger/treeView.htm - Mark Jacob Rohde Sent by: biojava-l-bounces@portal.open-bio.org 07/31/2005 08:36 PM Please respond to Jacob Rohde To: biojava-l@biojava.org cc: (bcc: Mark Schreiber/GP/Novartis) Subject: [Biojava-l] Cast to Sequence Hi, I'm having a problem with a JTree and rendering of Sequences. Since the JTree calls toString() on objects when displaying them, I had to make my own TreeCellRenderer because the Sequence toString method prints out debug info. 
My code looks like this: public class FeatureTreeCellRenderer extends DefaultTreeCellRenderer { public Component getTreeCellRendererComponent(JTree tree, Object value, boolean sel, boolean expanded, boolean leaf, int row, boolean hasFocus) { JLabel l = (JLabel) super.getTreeCellRendererComponent(tree, value, sel, expanded, leaf, row, hasFocus); if(leaf) { System.out.println(value); Sequence s = (Sequence) value; l.setText(s.getName()); } return l; } } My problem is that I always get a ClassCastException. The weird thing is that the println method call above the cast clearly shows that the cast should be possible. This is the output of the pint statement: org.biojava.bio.seq.impl.SimpleSequence@2ba11b name: DNA seq 0 And this is the exception: Exception in thread "AWT-EventQueue-0" java.lang.ClassCastException: javax.swing.tree.DefaultMutableTreeNode . etc. etc. I'm so confused. Any ideas? Thanks in advance, Jacob Rohde _______________________________________________ Biojava-l mailing list - Biojava-l@biojava.org http://biojava.org/mailman/listinfo/biojava-l
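
For reference, the fix Tom describes in this thread (unwrap the node's user object before casting) can be sketched in a form that runs without BioJava. Everything named here is illustrative: the Named interface is a hypothetical stand-in for org.biojava.bio.seq.Sequence, and labelFor and NamedCellRenderer are made-up names. The only Swing behaviour relied on is that a DefaultTreeModel hands the renderer its DefaultMutableTreeNode, whose getUserObject() returns whatever was stored in it.

```java
import java.awt.Component;
import javax.swing.JLabel;
import javax.swing.JTree;
import javax.swing.tree.DefaultMutableTreeNode;
import javax.swing.tree.DefaultTreeCellRenderer;

public class UserObjectRendererDemo {
    /** Hypothetical stand-in for a BioJava Sequence; only getName() matters here. */
    public interface Named {
        String getName();
    }

    /** Unwraps a tree node before casting; falls back to String.valueOf() otherwise. */
    public static String labelFor(Object value) {
        Object user = (value instanceof DefaultMutableTreeNode)
                ? ((DefaultMutableTreeNode) value).getUserObject()
                : value;
        return (user instanceof Named) ? ((Named) user).getName() : String.valueOf(user);
    }

    /** Renderer that labels leaves by name instead of by toString(). */
    public static class NamedCellRenderer extends DefaultTreeCellRenderer {
        @Override
        public Component getTreeCellRendererComponent(JTree tree, Object value,
                boolean sel, boolean expanded, boolean leaf, int row, boolean hasFocus) {
            JLabel l = (JLabel) super.getTreeCellRendererComponent(
                    tree, value, sel, expanded, leaf, row, hasFocus);
            if (leaf) {
                l.setText(labelFor(value)); // value is the node, not the user object
            }
            return l;
        }
    }

    public static void main(String[] args) {
        Named seq = () -> "DNA seq 0";
        System.out.println(labelFor(new DefaultMutableTreeNode(seq))); // prints DNA seq 0
    }
}
```

The instanceof checks avoid the ClassCastException even when a leaf holds something other than a sequence, which an unchecked cast would not.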