[Biojava-l] Biojava-l Digest, Vol 105, Issue 12

Wed Oct 19 18:36:28 UTC 2011

Hi Hannes,

I just ran an MSA test with 521 sequences and it works, so it must be a memory issue.

Try something like: java -Xmx1g -jar yourApp.jar args...

If you don't have enough RAM, try 500m as suggested by Andreas.
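
To confirm that the -Xmx setting actually reached the JVM, a small check along these lines can be run with the same flags. This is only a sketch: the class name HeapCheck and the helper toMiB are made up for illustration and are not part of BioJava.

```java
public class HeapCheck {
    /** Converts a byte count to whole mebibytes (hypothetical helper). */
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the upper bound the JVM will attempt to use for the heap;
        // it reflects -Xmx when that flag is given.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMiB(maxBytes) + " MiB");
    }
}
```

Running it as java -Xmx1g HeapCheck should report a value in the region of 1024 MiB; the exact figure can be somewhat lower depending on the JVM and garbage collector.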

Regards,

Khalil

On 19 Oct 2011, at 18:00, biojava-l-request at lists.open-bio.org wrote:

> Today's Topics:
> 
>   1. Re: Multiple Sequence Alignment - Limits? (Andreas Prlic)
>   2. Re: Multiple Sequence Alignment - Limits?
>      (Hannes Brandstätter-Müller)
>   3. Re: Multiple Sequence Alignment - Limits?
>      (Hannes Brandstätter-Müller)
>   4. Re: Multiple Sequence Alignment - Limits? (Spencer Bliven)
>   5. Status of org.biojava3.data.sequence.SequenceUtil ?
>      (jvb at Cs.Nott.AC.UK)
>   6. Re: Status of org.biojava3.data.sequence.SequenceUtil ?
>      (Peter Troshin)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Tue, 18 Oct 2011 14:01:05 -0700
> From: Andreas Prlic <andreas at sdsc.edu>
> Subject: Re: [Biojava-l] Multiple Sequence Alignment - Limits?
> To: Hannes Brandstätter-Müller <biojava at hannes.oib.com>
> Cc: biojava-l <biojava-l at lists.open-bio.org>
> Message-ID:
> 	<CALthepz+7+KO1jo5gfiYHtJ-27jojGaHje7D55-B0sHuaZdqYw at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hi Hannes,
> 
> did you try to increase memory settings for your JVM?  e.g. -Xmx500M
> 
> Andreas
> 
> On Tue, Oct 18, 2011 at 2:46 AM, Hannes Brandstätter-Müller
> <biojava at hannes.oib.com> wrote:
>> On Tue, Oct 18, 2011 at 11:32, Hannes Brandstätter-Müller
>> <biojava at hannes.oib.com> wrote:
>>> Hi again!
>>> 
>>> I am quite happy with the Multiple Sequence Alignment, but I noticed
>>> that there seems to be a limit of 132 Sequences that are present in
>>> the final alignment - is this some kind of hardcoded limit, or can I
>>> work around that somehow?
>>> 
>>> Hannes
>>> 
>> 
>> Sorry, I counted that wrong. I had 132 lines, that is 66 sequences in
>> fasta format. Is there a way to work around that limit?
>> 
>> Hannes
>> 
>> _______________________________________________
>> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-l
>> 
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 19 Oct 2011 06:36:19 +0200
> From: Hannes Brandstätter-Müller <biojava at hannes.oib.com>
> Subject: Re: [Biojava-l] Multiple Sequence Alignment - Limits?
> To: Andreas Prlic <andreas at sdsc.edu>
> Cc: biojava-l <biojava-l at lists.open-bio.org>
> Message-ID:
> 	<CAPXi2mkBsBJhzfHKtgquytXPV=hF0TZNYAM8cfCcen=QfmAj+A at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> Hi Andreas,
> 
> I will try that later today to see whether it makes any difference. I ran
> a larger alignment batch overnight and noticed that this limit seems to
> have been a coincidence. HOWEVER, the alignment always contains fewer
> sequences than the input. Is this caused by memory constraints, or how
> can I influence that?
> 
> Hannes
> 
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 19 Oct 2011 09:32:25 +0200
> From: Hannes Brandstätter-Müller <biojava at hannes.oib.com>
> Subject: Re: [Biojava-l] Multiple Sequence Alignment - Limits?
> To: Spencer Bliven <sbliven at ucsd.edu>
> Cc: biojava-l <biojava-l at lists.open-bio.org>
> Message-ID:
> 	<CAPXi2mnzkvLasRWixinxyh=w=V6bjePXRPhTYGF54ac5dFN57w at mail.gmail.com>
> Content-Type: text/plain; charset=windows-1252
> 
> I'm currently running another test, now with even more memory for Java
> (500M), and it looks fine so far. I'll re-check it later with the
> other files that gave me some problems, and will report back later
> today.
> 
> I had an "out of heap" exception when I tried it with the default
> memory settings, and with 256M it seems to have swallowed some
> sequences. I'll re-check and help you reproduce. It would be really
> bad if the code swallowed sequences without error messages when
> running out of memory, so I'll make sure I have proof :D
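
One way to gather that kind of proof is to compare the number of FASTA records in the input file with the number in the written alignment. The following is a minimal sketch using only the JDK, not any BioJava API. It assumes the alignment was written out in FASTA format; the class name FastaCount and the two command-line paths are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FastaCount {
    /** Counts FASTA records by counting header lines, i.e. lines starting with '>'. */
    static int countRecords(Reader source) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.startsWith(">")) {
                    count++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: input FASTA file, args[1]: alignment saved as FASTA (hypothetical paths)
        int inputCount = countRecords(Files.newBufferedReader(Paths.get(args[0])));
        int alignedCount = countRecords(Files.newBufferedReader(Paths.get(args[1])));
        System.out.println("input=" + inputCount + " aligned=" + alignedCount);
        if (inputCount != alignedCount) {
            System.err.println("Mismatch: " + (inputCount - alignedCount)
                    + " sequence(s) missing from the alignment");
        }
    }
}
```

Running this after each alignment with a different -Xmx value would show whether the record count drops only when memory is tight.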
> 
> Hannes
> 
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 19 Oct 2011 00:22:55 -0700
> From: Spencer Bliven <sbliven at ucsd.edu>
> Subject: Re: [Biojava-l] Multiple Sequence Alignment - Limits?
> To: Hannes Brandstätter-Müller <biojava at hannes.oib.com>
> Cc: biojava-l <biojava-l at lists.open-bio.org>
> Message-ID:
> 	<CA+P6arns8F9XeZj4UK0hEmvV_uOG5nJxoZxJ5fBQ=nDh-xuXmg at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> Hannes,
> 
> There should not be a limit on the number of sequences, nor should you be
> running into a memory problem. The FastaParser should be able to read
> thousands of sequences, since it is used for genome FASTA files as well as
> multiple alignments. My guess would be either a malformed FASTA file
> (perhaps a problem with line endings?), or else a problem with the code to
> generate the MultipleAlignment. Can you post some code snippets?
> 
> -Spencer
> 
> 
> 
> 
> ------------------------------
> 
> Message: 5
> Date: 19 Oct 2011 12:15:26 +0100
> From: jvb at Cs.Nott.AC.UK
> Subject: [Biojava-l] Status of org.biojava3.data.sequence.SequenceUtil
> 	?
> To: biojava-l <biojava-l at lists.open-bio.org>
> Message-ID: <201110191215.aa17789 at pat.Cs.Nott.AC.UK>
> Content-Type: text/plain; format=flowed; charset=ISO-8859-1
> 
> Hello,
> 
> I can't find a jar containing org.biojava3.data.sequence.SequenceUtil, even 
> though it appears in the JavaDocs: 
> http://www.biojava.org/docs/api/org/biojava3/data/sequence/SequenceUtil.html
> 
> What is its status? Can I get it, and should I rely on it if I can?
> 
> Thanks,
> 
> Jon
> 
> 
> 
> 
> ------------------------------
> 
> Message: 6
> Date: Wed, 19 Oct 2011 16:13:44 +0100
> From: Peter Troshin <p.v.troshin at dundee.ac.uk>
> Subject: Re: [Biojava-l] Status of
> 	org.biojava3.data.sequence.SequenceUtil ?
> To: jvb at cs.nott.ac.uk
> Cc: biojava-l at lists.open-bio.org
> Message-ID: <4E9EE928.4050506 at dundee.ac.uk>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> 
> Hi Jon,
> 
> This class is part of the protein disorder prediction JAR and a recent
> addition to BioJava. You are welcome to use it if it suits your needs.
> Bear in mind, though, that the FASTA file reader in this class reads the
> content of the whole FASTA file at once, so if you are working with
> large FASTA files you will want to use something else instead. I've got
> a stream-based FASTA reader if you need one and if there is not one in
> BioJava already.
> I would imagine the functionality of this class is not going to
> disappear overnight, but it may, and perhaps should, be merged with the
> other FASTA parsers in BioJava once somebody has time to do this.
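
For illustration, a stream-based reader of the kind described above could look roughly like this. It is not Peter's reader and not a BioJava class: StreamingFasta and forEachRecord are hypothetical names, and the sketch assumes Java 8 or later. It keeps only one record in memory at a time and hands each (header, sequence) pair to a callback.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.BiConsumer;

public class StreamingFasta {
    /** Reads FASTA records one by one and passes (header, sequence) to the handler. */
    static void forEachRecord(Reader source, BiConsumer<String, String> handler) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String header = null;
            StringBuilder sequence = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                if (line.startsWith(">")) {
                    // A new header closes the previous record, if there was one.
                    if (header != null) {
                        handler.accept(header, sequence.toString());
                    }
                    header = line.substring(1);
                    sequence.setLength(0);
                } else if (header != null) {
                    sequence.append(line); // sequence data may span several lines
                }
            }
            if (header != null) {
                handler.accept(header, sequence.toString()); // emit the last record
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to a FASTA file (hypothetical); prints header and sequence length
        forEachRecord(Files.newBufferedReader(Paths.get(args[0])),
                (header, seq) -> System.out.println(header + "\t" + seq.length()));
    }
}
```

Because each record is discarded once the callback returns, memory use depends on the longest single sequence rather than on the size of the whole file.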
> 
> Regards,
> Peter
> 
> 
> 
> 
> 
> ------------------------------
> 
> 
> End of Biojava-l Digest, Vol 105, Issue 12
> ******************************************