[Biojava-dev] Fwd: Doing pattern matching on Proteins

Uday Kamath kamathuday at gmail.com
Fri Nov 19 18:48:19 UTC 2010


Andreas,
Thanks for your reply. I looked at it; here are the problems I faced:
1. Matcher matcher = p.matcher(seq.seqString());
Pattern doesn't have a method to match against the sequence string, only
against the Sequence, at least in my version of BioJava.

2. Pattern p = Pattern.compile( MotifTools.createRegex(motif) );
doesn't work, so I need to create a pattern factory with an alphabet and
compile it to create a pattern. So I changed the MotifLister in the example to:
    FiniteAlphabet alphabets = ProteinTools.getTAlphabet();
    Pattern p = PatternFactory.makeFactory(alphabets).compile(target);

3. When I use the same code with this modification, that is, calling matcher()
on sequences, it goes into infinite recursion and throws an out-of-memory error.

I have attached my minor modification and my input. I don't know what I am
doing wrong.
The program arguments were:
protein C:\Research\HIV-Protease\SampleProtein.fasta an 3

Thanks again for your reply; I would really appreciate any further hints about
the problem.
Uday Kamath
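In the cookbook line quoted in point 1, `p.matcher(seq.seqString())` passes a plain `String`, which fits `java.util.regex.Pattern.matcher(CharSequence)` rather than BioJava's own `org.biojava.utils.regex.Pattern`, whose `matcher` takes a `SymbolList`; which `Pattern` class is imported may therefore be worth checking. As a minimal sketch of the string-based approach, assuming the protein is already available as one-letter residue codes (the kind of string `seqString()` returns), a `java.util.regex` search could look like this. The class and method names are hypothetical, not part of BioJava.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of BioJava.
public class MotifSearch {

    /**
     * Returns the 1-based start positions of every (possibly overlapping)
     * match of motifRegex in a protein given as one-letter residue codes.
     */
    public static List<Integer> findMotif(String protein, String motifRegex) {
        Pattern p = Pattern.compile(motifRegex, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(protein);
        List<Integer> starts = new ArrayList<>();
        int from = 0;
        // find(int) resets the matcher and searches from the given index,
        // so restarting one past each hit also reports overlapping matches.
        while (from <= protein.length() && m.find(from)) {
            starts.add(m.start() + 1);
            from = m.start() + 1;
        }
        return starts;
    }

    public static void main(String[] args) {
        // Sequence and motif taken from the sample in the original question.
        System.out.println(findMotif("CANLSTFA", "FA")); // prints [7]
    }
}
```

Restarting `find(int)` one position past each hit is a deliberate choice so that overlapping occurrences (for example `AA` in `AAAA`) are all reported; a plain `while (m.find())` loop would skip them.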


On Fri, Nov 19, 2010 at 12:49 PM, Andreas Prlic <andreas at sdsc.edu> wrote:

> Hi Uday,
>
> Have you seen this cookbook page?
>
> http://www.biojava.org/wiki/BioJava:Cookbook:Sequence:Regex
>
>
> Andreas
>
> On Thu, Nov 18, 2010 at 7:49 PM, Uday Kamath <kamathuday at gmail.com> wrote:
> > Anyone? Any help? Is there a motif search example, or can someone see
> > what I am doing wrong below?
> > Thanks a ton!
> > Uday
> >
> > On Thu, Nov 18, 2010 at 9:54 AM, Uday Kamath <kamathuday at gmail.com> wrote:
> >
> >> Hello
> >> A simple question,
> >>
> >> To search for a motif in a protein I used the following code. Is my
> >> method of creating the pattern factory right? The matcher goes into
> >> infinite recursion. Can someone suggest the right usage? Thanks a ton
> >>
> >> //sample
> >> FiniteAlphabet alphabet = ProteinTools.getAlphabet();
> >> PatternFactory factory = PatternFactory.makeFactory(alphabet);
> >> SymbolList proteinSequence = ProteinTools.createProtein("CANLSTFA");
> >> // the motif to find in the sequence
> >> SymbolList motif = ProteinTools.createProtein("FA");
> >> Pattern p = factory.compile(MotifTools.createRegex(motif));
> >> Matcher occurrences = p.matcher(proteinSequence);
> >>
> > _______________________________________________
> > biojava-dev mailing list
> > biojava-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/biojava-dev
> >
>
>
>
> --
> -----------------------------------------------------------------------
> Dr. Andreas Prlic
> Senior Scientist, RCSB PDB Protein Data Bank
> University of California, San Diego
> (+1) 858.246.0526
> -----------------------------------------------------------------------
>
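In the sample code from the first message, `MotifTools.createRegex(motif)` turns a motif `SymbolList` into a regular-expression string, roughly by expanding ambiguity symbols into classes of the residues they match. A hypothetical, library-free analogue for one-letter protein motifs is sketched below, assuming only the IUPAC ambiguity codes B (Asp or Asn), Z (Glu or Gln) and X (any residue). It illustrates the idea and is not BioJava's implementation; the exact output of `createRegex` may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; not part of BioJava.
public class ProteinMotifRegex {

    /** Turns a one-letter protein motif into a java.util.regex pattern string. */
    public static String toRegex(String motif) {
        StringBuilder sb = new StringBuilder();
        for (char c : motif.toUpperCase().toCharArray()) {
            switch (c) {
                case 'B': sb.append("[DN]"); break; // Asp or Asn
                case 'Z': sb.append("[EQ]"); break; // Glu or Gln
                case 'X': sb.append('.'); break;    // any residue (any character)
                default:
                    // Anything other than a letter is rejected rather than
                    // silently passed into the regex.
                    if (c < 'A' || c > 'Z') {
                        throw new IllegalArgumentException("Not a residue code: " + c);
                    }
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String regex = toRegex("XBL");
        System.out.println(regex); // prints .[DN]L
        // Matches "ANL" inside the sample sequence from the question.
        System.out.println(Pattern.compile(regex).matcher("CANLSTFA").find()); // prints true
    }
}
```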
-------------- next part --------------
A non-text attachment was scrubbed...
Name: MotifLister.java
Type: application/octet-stream
Size: 4530 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20101119/592cec35/attachment-0004.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SampleProtein.fasta
Type: application/octet-stream
Size: 503 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biojava-dev/attachments/20101119/592cec35/attachment-0005.obj>

