[Biojava-l] Biojava-l Digest, Vol 131, Issue 3

Andreas Prlic andreas at sdsc.edu
Fri Jan 17 18:24:03 UTC 2014


I don't know how big your sequence database is, but it should be possible
to load your sequences into memory, avoid most of the IO overhead, and
just focus on compute efficiency.

A search of one sequence against a DB can be nicely parallelized, and if
you have a bunch of CPUs or even a few computers, there are some nice
libraries that allow you to parallelize things massively (or just write
something multi-threaded if you have only one computer with multiple
CPUs).

Andreas
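
The two suggestions above (keep the database in memory, then spread the
per-sequence work over the available cores) can be sketched in plain Java
roughly as follows. This is only an illustration, not BioJava code: the
class and method names are made up, the scoring is a simple ungapped
mismatch count standing in for a real aligner, and it assumes the whole
FASTA file fits in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper class, not part of BioJava.
final class InMemorySearch {

    /** Reads all FASTA records into memory once: header (without '>') -> sequence. */
    static Map<String, String> readFasta(Reader source) {
        Map<String, StringBuilder> records = new LinkedHashMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            StringBuilder current = null;
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.startsWith(">")) {
                    current = new StringBuilder();
                    records.put(line.substring(1), current);
                } else if (current != null) {
                    current.append(line.toUpperCase());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        records.forEach((id, seq) -> result.put(id, seq.toString()));
        return result;
    }

    /** Fewest mismatches of the query against any same-length window of the target (no gaps). */
    static int fewestMismatches(String query, String target) {
        int best = Integer.MAX_VALUE;
        for (int start = 0; start + query.length() <= target.length(); start++) {
            int mismatches = 0;
            // Stop counting as soon as this window cannot beat the best one so far.
            for (int i = 0; i < query.length() && mismatches < best; i++) {
                if (query.charAt(i) != target.charAt(start + i)) {
                    mismatches++;
                }
            }
            best = Math.min(best, mismatches);
        }
        return best;
    }

    /** Scores every in-memory sequence in parallel and keeps ids within maxMismatches. */
    static Map<String, Integer> search(String query, Map<String, String> db, int maxMismatches) {
        Map<String, Integer> hits = new ConcurrentHashMap<>();
        db.entrySet().parallelStream().forEach(entry -> {
            int mismatches = fewestMismatches(query, entry.getValue());
            if (mismatches <= maxMismatches) {
                hits.put(entry.getKey(), mismatches);
            }
        });
        return hits;
    }
}
```

Each target sequence is scored independently, so a parallel stream (or
an ExecutorService with one task per batch of sequences) is enough on a
single multi-core machine. The scoring method could be swapped for a
proper local aligner without changing the surrounding structure.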






On Fri, Jan 17, 2014 at 10:15 AM, Peter S <peters337 at yahoo.co.uk> wrote:

> Thanks Andreas,
>
> I am switching from Python/Perl so my Java is not great, but with the
> implementation you mention, would I need to pass the sequences in and
> run them one by one? SSEARCH is also 'slow' (SW) but has a lot of
> optimization in place, so in the end it does not take that long to run.
> It's in C++ though.
>
> Peter
>
>
>   On Friday, 17 January 2014, 18:09, Andreas Prlic <andreas at sdsc.edu>
> wrote:
> We do have a Smith-Waterman implementation in BioJava. However, the
> algorithm is based on dynamic programming, which by definition is "slow"
> but gives you the optimal alignment...
>
> http://biojava.org/wiki/BioJava:CookBook3:PSA#Local_alignment
>
> Andreas
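
To make the cost of the dynamic programming concrete, here is a minimal
plain-Java sketch of the Smith-Waterman score computation. It is not the
BioJava implementation linked above: the class name is made up, it uses a
simple match/mismatch score with a linear gap penalty rather than a
substitution matrix with affine gaps, and it returns only the best local
score, not the alignment itself.

```java
// Hypothetical illustration class, not part of BioJava.
final class SmithWatermanSketch {

    /**
     * Best local alignment score of a against b.
     * match is added for equal characters; mismatch and gap are
     * expected to be negative and are added for substitutions and indels.
     */
    static int localScore(String a, String b, int match, int mismatch, int gap) {
        // Only two rows of the (a.length + 1) x (b.length + 1) matrix are kept.
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = 0;
            for (int j = 1; j <= b.length(); j++) {
                int diag = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = prev[j] + gap;
                int left = curr[j - 1] + gap;
                // Flooring at zero is what makes the alignment local.
                curr[j] = Math.max(0, Math.max(diag, Math.max(up, left)));
                best = Math.max(best, curr[j]);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return best;
    }
}
```

Every cell of the matrix is visited once, so the work per query/target
pair grows with the product of the two sequence lengths. That is where
the "slow" comes from when the target is an entire database.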
>
>
>
>
> On Fri, Jan 17, 2014 at 9:50 AM, Peter S <peters337 at yahoo.co.uk> wrote:
>
> Thanks, I will give it a try.
>
> Does it mean there is no fast implementation of SW in Java that I can use?
>
> Best,
> Peter
>
>
>
> On Friday, 17 January 2014, 17:45, Khalil El Mazouari <
> khalil.elmazouari at gmail.com> wrote:
>
> Hi Peter,
>
> Give it a try with Levenshtein distance. You can use StringUtils from
> Apache Commons Lang; it has a getLevenshteinDistance method.
>
> best,
>
> Khalil
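
For readers who have not met it, the Levenshtein distance suggested
above counts the minimum number of single-character substitutions,
insertions and deletions needed to turn one string into the other. The
following is a plain-Java sketch of that calculation so the idea can be
tried without extra dependencies. The class name is made up and this is
not the Apache Commons Lang source. Note that it compares two whole
strings, so searching a long transcript would still mean comparing the
query against each candidate window.

```java
// Hypothetical illustration class, not the Apache Commons Lang implementation.
final class EditDistanceSketch {

    /** Minimum number of substitutions, insertions and deletions turning s into t. */
    static int levenshtein(CharSequence s, CharSequence t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        // Distance from the empty prefix of s to each prefix of t.
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }
}
```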
>
>
>
> On 17 Jan 2014, at 18:37, Peter S <peters337 at yahoo.co.uk> wrote:
>
> Hi Khalil,
> >
> >
> >By short sequences I mean 12-18 nt long. I need to align them against
> >the entire transcriptome and detect matches with up to 3 mismatches. This
> >is the reason I need something quite fast but sensitive at the same time.
> >
> >
> >Many thanks,
> >Peter
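
The requirement described above (a 12-18 nt query, hits with up to 3
mismatches, and mismatch positions that are easy to parse) can be
sketched as a simple window scan. This assumes that only substitutions
matter, so gaps are not handled, and the class, method and field names
are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; handles mismatches only, not gaps.
final class MismatchScan {

    /** One ungapped hit: 0-based start in the target and 0-based mismatch offsets in the query. */
    static final class Hit {
        final int start;
        final List<Integer> mismatchOffsets;

        Hit(int start, List<Integer> mismatchOffsets) {
            this.start = start;
            this.mismatchOffsets = mismatchOffsets;
        }
    }

    /** Reports every window of the target that matches the query with at most maxMismatches. */
    static List<Hit> scan(String query, String target, int maxMismatches) {
        List<Hit> hits = new ArrayList<>();
        for (int start = 0; start + query.length() <= target.length(); start++) {
            List<Integer> offsets = new ArrayList<>();
            // Give up on this window once it exceeds the allowed number of mismatches.
            for (int i = 0; i < query.length() && offsets.size() <= maxMismatches; i++) {
                if (query.charAt(i) != target.charAt(start + i)) {
                    offsets.add(i);
                }
            }
            if (offsets.size() <= maxMismatches) {
                hits.add(new Hit(start, offsets));
            }
        }
        return hits;
    }
}
```

For example, scanning the target TTACGTACCTACGTTT with the query
ACGTACGTACGT and a limit of 3 reports a single hit starting at target
position 2 with one mismatch at query offset 6.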
> >
> >
> >
> >On Friday, 17 January 2014, 17:26, Khalil El Mazouari <
> khalil.elmazouari at gmail.com> wrote:
> >
> >Hi,
> >
> >what do you mean by short sequences? NT or AA?
> >
> >Best
> >
> >Khalil
> >
> >On 17 Jan 2014, at 18:00, biojava-l-request at lists.open-bio.org wrote:
> >
> >> Today's Topics:
> >>
> >>   1. Database search with Smith and Waterman (Peter S)
> >>
> >>
> >> ----------------------------------------------------------------------
> >>
> >> Message: 1
> >> Date: Fri, 17 Jan 2014 13:27:17 +0000 (GMT)
> >> From: Peter S <peters337 at yahoo.co.uk>
> >> Subject: [Biojava-l] Database search with Smith and Waterman
> >> To: "biojava-l at lists.open-bio.org" <biojava-l at lists.open-bio.org>
> >> Message-ID:
> >>     <1389965237.13315.YahooMailNeo at web172703.mail.ir2.yahoo.com>
> >> Content-Type: text/plain; charset=iso-8859-1
> >>
> >> Dear All,
> >>
> >> I'm looking for an implementation of the Smith-Waterman algorithm to
> >> use in a Java desktop application I want to develop.
> >>
> >> I did find some information on pairwise aligners, but what I would
> >> ideally want is something similar to the SSEARCH package that can
> >> perform alignments against very big databases saved locally in FASTA
> >> format. Speed is quite important, and ideally I would need an output
> >> that I can easily parse, identifying mismatch/gap positions etc.
> >>
> >> Any suggestions if there is any Java implementation that would fit the
> >> description? I will be working on short sequences so sensitivity is
> >> crucial.
> >>
> >> Thanks very much for your help,
> >> Peter
> >>
> >>
> >> ------------------------------
> >>
> >> _______________________________________________
> >> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
> >>
> >> End of Biojava-l Digest, Vol 131, Issue 3
> >> *****************************************
> >
> >
> >
> >
> >
> >-----
> >
> >Confidentiality Notice: This e-mail and any files transmitted with it are
> private and confidential and are solely for the use of the addressee. It
> may contain material which is legally privileged. If you are not the
> addressee or the person responsible for delivering to the addressee, please
> notify that you have received this e-mail in error and that any use of it
> is strictly prohibited. It would be helpful if you could notify the author
> by replying to it.
> >
> >
> >
> >
> >
>


-- 
-----------------------------------------------------------------------
Dr. Andreas Prlic
Senior Scientist, RCSB PDB Protein Data Bank
University of California, San Diego
(+1) 858.246.0526
-----------------------------------------------------------------------


