[Biojava-l] Large RichSequence collection

Amr AL-HOSSARY amr_alhossary at hotmail.com
Thu Aug 1 05:58:30 UTC 2013


If your problem is that all the sequences are parsed and loaded into memory
before you start working on them, note that I added a method, public
LinkedHashMap<String,S> process(int max), to class FastaReader in BioJava
3.0.6. It parses at most max sequences per call, and each subsequent call
reads the next sequences. You are welcome to use it. If you need a similar
method in BioJava 1, I can write one for you.
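
To illustrate the batching idea, here is a minimal, self-contained sketch in
plain Java. It is not the BioJava implementation: the class name
BatchFastaReader, the String-to-String map, and the choice to return null at
the end of the input are all assumptions made for this example only.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;

/**
 * Hypothetical helper (not part of BioJava): reads FASTA records in batches
 * so that only one batch has to be held in memory at a time.
 */
class BatchFastaReader implements Closeable {
    private final BufferedReader in;
    // Header of the next record (without '>'), if it has already been read.
    private String pendingHeader;

    BatchFastaReader(Reader source) {
        this.in = new BufferedReader(source);
    }

    /**
     * Returns up to max records (header -> sequence) in input order,
     * or null once the input is exhausted (a choice made for this sketch).
     * Records with identical headers overwrite each other in the map.
     */
    LinkedHashMap<String, String> process(int max) throws IOException {
        if (max < 1) {
            throw new IllegalArgumentException("max must be >= 1");
        }
        LinkedHashMap<String, String> batch = new LinkedHashMap<>();
        String line;
        // On the first call, skip ahead to the first header line.
        if (pendingHeader == null) {
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line.substring(1).trim();
                    break;
                }
            }
        }
        while (pendingHeader != null && batch.size() < max) {
            String header = pendingHeader;
            pendingHeader = null;
            StringBuilder seq = new StringBuilder();
            // Collect sequence lines until the next header or end of input.
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    pendingHeader = line.substring(1).trim();
                    break;
                }
                seq.append(line.trim());
            }
            batch.put(header, seq.toString());
        }
        return batch.isEmpty() ? null : batch;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

A caller would invoke process(max) in a loop until it returns null, handle
each batch, and drop the reference to it before asking for the next one, so
the heap never holds more than max sequences at once.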

Otherwise, you will need to modify your algorithm to deal with smaller
clusters, based on the task you are doing.

Amr

-----Original Message-----
From: biojava-l-bounces at lists.open-bio.org
[mailto:biojava-l-bounces at lists.open-bio.org] On Behalf Of Khalil El
Mazouari
Sent: Thursday, August 01, 2013 1:17 AM
To: Biojava-l at lists.open-bio.org
Subject: [Biojava-l] Large RichSequence collection

Hi,

I have to process a large dataset of DNA sequences (>= 120,000 seq). The
sequences are first annotated, clustered, ... and I end up with a huge
collection of SimpleRichSequence objects consuming a lot of RAM.

Any suggestions on how to deal effectively with a large collection of
RichSequence objects are welcome.

Thanks in advance.

khalil
