[Biojava-dev] Doing pattern matching on Proteins

Andreas Prlic andreas at sdsc.edu
Wed Nov 24 02:05:35 UTC 2010


Did you try to give more memory to your JVM...?
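
One quick way to check how much heap the JVM actually has is sketched below. The class name HeapCheck is only illustrative; Runtime.maxMemory() is the standard call, and the limit is normally raised with the -Xmx launcher option (for example -Xmx1g).

```java
// Minimal sketch: report the maximum heap this JVM will try to use.
// HeapCheck is a hypothetical helper name, not part of BioJava.
class HeapCheck {

    /** Maximum heap size in MiB, as reported by the running JVM. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run once with default settings and once with e.g. -Xmx1g to compare.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

If the reported value does not change after adding -Xmx, the option is probably being passed as a program argument rather than as a JVM option.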

A
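
For reference, the string-based approach suggested earlier in this thread could be sketched roughly as follows. It uses only java.util.regex, so it does not touch the BioJava Pattern and Matcher classes at all. The class and method names are made up for illustration, and the sequence string is assumed to come from something like seq.seqString() as quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch, assuming the protein is available as a one-letter-code string.
// StringMotifSearch and findMotif are hypothetical names, not BioJava API.
class StringMotifSearch {

    /** Returns the 0-based start offsets of all non-overlapping matches of the regex. */
    static List<Integer> findMotif(String sequence, String motifRegex) {
        Pattern pattern = Pattern.compile(motifRegex);
        Matcher matcher = pattern.matcher(sequence);
        List<Integer> starts = new ArrayList<>();
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Assumption: in BioJava 1.x this string would be obtained from the sequence object.
        String protein = "CANLSTFA";
        System.out.println(findMotif(protein, "FA")); // prints [6]
    }
}
```

Note that the fully qualified java.util.regex types are needed if the BioJava Pattern and Matcher classes are imported in the same file.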

On Tue, Nov 23, 2010 at 5:57 PM, Uday Kamath <kamathuday at gmail.com> wrote:
> Same out-of-memory problem.
> Uday
>
> On Tue, Nov 23, 2010 at 8:41 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>>
>> What about converting the sequence to a string and then using
>> standard regular expressions?
>>
>> Andreas
>>
>> On Tue, Nov 23, 2010 at 4:42 AM, Uday Kamath <kamathuday at gmail.com> wrote:
>> > Can anyone answer this query? I am stuck on this in my research work,
>> > so I would appreciate any response.
>> > Uday
>> >
>> > On Fri, Nov 19, 2010 at 1:48 PM, Uday Kamath <kamathuday at gmail.com>
>> > wrote:
>> >
>> >>
>> >>
>> >> Andreas
>> >> Thanks for your reply. I looked at it; here are the problems I faced:
>> >> 1. Matcher matcher = p.matcher(seq.seqString());
>> >> Pattern doesn't have a method to match on the sequence string, only on
>> >> the Sequence, at least in my version of BioJava.
>> >>
>> >> 2. Pattern p = Pattern.compile( MotifTools.createRegex(motif) );
>> >> doesn't work, so I need to create a pattern factory from the alphabet
>> >> and use it to compile the pattern. So I changed the MotifLister in the
>> >> example to have
>> >>     FiniteAlphabet alphabets = ProteinTools.getTAlphabet();
>> >>     Pattern p = PatternFactory.makeFactory(alphabets).compile(target);
>> >>
>> >> 3. When I use the same code with this modification, that is, calling
>> >> matcher() on sequences, it goes into recursion and throws an
>> >> out-of-memory exception.
>> >>
>> >> I have attached my minor modification and my input. I don't know what
>> >> I am doing wrong.
>> >> JVM args were
>> >> protein C:\Research\HIV-Protease\SampleProtein.fasta an 3
>> >>
>> >> Thanks for your reply. I would really appreciate any further insight
>> >> into the problem.
>> >> Uday Kamath
>> >>
>> >>
>> >> On Fri, Nov 19, 2010 at 12:49 PM, Andreas Prlic <andreas at sdsc.edu>
>> >> wrote:
>> >>
>> >>> Hi Uday,
>> >>>
>> >>> have you seen this cookbook page?
>> >>>
>> >>> http://www.biojava.org/wiki/BioJava:Cookbook:Sequence:Regex
>> >>>
>> >>>
>> >>> Andreas
>> >>>
>> >>> On Thu, Nov 18, 2010 at 7:49 PM, Uday Kamath <kamathuday at gmail.com>
>> >>> wrote:
>> >>> > Anyone? Any help? Is there a motif search example, or what am I
>> >>> > doing wrong below?
>> >>> > Thanks a ton!
>> >>> > Uday
>> >>> >
>> >>> > On Thu, Nov 18, 2010 at 9:54 AM, Uday Kamath <kamathuday at gmail.com> wrote:
>> >>> >
>> >>> >> Hello,
>> >>> >> A simple question:
>> >>> >>
>> >>> >> To search for a motif in a protein I used the following code. Is my
>> >>> >> way of creating the pattern factory right? The matcher is going into
>> >>> >> infinite recursion. Can someone suggest the right usage? Thanks a ton.
>> >>> >>
>> >>> >> // sample
>> >>> >> FiniteAlphabet alphabet = ProteinTools.getAlphabet();
>> >>> >> factory = PatternFactory.makeFactory(alphabet);
>> >>> >> SymbolList proteinSequence = ProteinTools.createProtein("CANLSTFA");
>> >>> >> // in the sequence, find the match
>> >>> >> SymbolList motif = ProteinTools.createProtein("FA");
>> >>> >> Pattern p = HivProteaseProblem.factory.compile(MotifTools.createRegex(motif));
>> >>> >> Matcher occurences = p.matcher(proteinSequence);
>> >>> >>
>> >>> > _______________________________________________
>> >>> > biojava-dev mailing list
>> >>> > biojava-dev at lists.open-bio.org
>> >>> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
>> >>> >
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>>
>> >>> -----------------------------------------------------------------------
>> >>> Dr. Andreas Prlic
>> >>> Senior Scientist, RCSB PDB Protein Data Bank
>> >>> University of California, San Diego
>> >>> (+1) 858.246.0526
>> >>>
>> >>> -----------------------------------------------------------------------
>> >>>
>> >>
>> >>
>> >>
>> >
>
>



-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------



