[Biojava-l] Similarity measures for generalized sequences
Oliver Schmitt
schmitt at med.uni-rostock.de
Mon May 7 10:21:15 UTC 2012
Hi,
I'm looking for general advice on comparing sequences (S). I do not
necessarily mean DNA sequences, but rather sequences such as
"Region A is connected with Region B" (A->B for short), together with
a distance or similarity measure that makes it possible to identify
similar sequences or paths. The regions are alphanumerically coded,
e.g. "Bed nucleus of the stria terminalis anterior division".
Given are 10^2 to 10^7 different paths; what I am looking for are all
their mutual similarities (e.g., a similarity matrix) and a multivariate
classification, such as a dendrogram based on a meaningful cluster
analysis.
Example
Given:
S1: A->B->C->G
S2: A->B->F->G
S3: A->C->B->G
S4: A->B->D->G
Searched:
Similarity matrix
     S1  S2  S3  S4
S1   ?   ?   ?   ?
S2   ?   ?   ?   ?
S3   ?   ?   ?   ?
S4   ?   ?   ?   ?
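
One possible way to fill such a matrix, given as a sketch only: treat each region name as a single token and use a normalized edit (Levenshtein) distance between the token lists. The choice of edit distance, the normalization by the longer path length, and the helper names parse_path, edit_distance and similarity are all assumptions made for illustration; they are not part of the original question, and other measures (e.g. based on shared edges) may fit the data better.

```python
def parse_path(path):
    """Split 'A->B->C' into a list of region names (one token per region)."""
    return [region.strip() for region in path.split("->")]


def edit_distance(a, b):
    """Levenshtein distance between two token lists (unit costs, assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1 minus edit distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


paths = {
    "S1": parse_path("A->B->C->G"),
    "S2": parse_path("A->B->F->G"),
    "S3": parse_path("A->C->B->G"),
    "S4": parse_path("A->B->D->G"),
}
names = sorted(paths)
matrix = [[similarity(paths[r], paths[c]) for c in names] for r in names]

# Print the matrix in the same layout as the table above.
print("     " + "  ".join(f"{n:>4}" for n in names))
for name, row in zip(names, matrix):
    print(f"{name:<5}" + "  ".join(f"{v:4.2f}" for v in row))
```

Because whole region names are compared as tokens, long names such as "Bed nucleus of the stria terminalis anterior division" count as one symbol. Note that a full pairwise matrix grows quadratically, so this direct approach is only realistic toward the lower end of the 10^2 to 10^7 range.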
Then I would like to generate a dendrogram based on the similarity measure:
S1--
    |--
S2--   |
       |----
S3--   |
    |--|
S4--
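
A tree of this kind can be obtained by agglomerative (hierarchical) clustering. The following is a minimal sketch, assuming a normalized edit distance between paths and average linkage between clusters; both choices, and the names edit_distance, path_distance and average_linkage, are illustrative assumptions rather than a recommended method. The tree it produces depends on the chosen distance and on how ties are broken, so it need not match the drawing above.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two token lists (unit costs, assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def path_distance(a, b):
    """Edit distance divided by the longer path length, so values lie in [0, 1]."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def average_linkage(paths):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance.

    Returns the final tree as nested tuples and the list of (subtree, height)
    merges in the order they happened. Ties go to the first pair found.
    """
    clusters = [(name, [name]) for name in sorted(paths)]  # (tree, member names)
    merges = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            members_i, members_j = clusters[i][1], clusters[j][1]
            total = sum(path_distance(paths[p], paths[q])
                        for p in members_i for q in members_j)
            height = total / (len(members_i) * len(members_j))
            if best is None or height < best[0]:
                best = (height, i, j)
        height, i, j = best
        tree = (clusters[i][0], clusters[j][0])
        merged = (tree, clusters[i][1] + clusters[j][1])
        merges.append((tree, height))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0], merges


paths = {
    "S1": "A->B->C->G".split("->"),
    "S2": "A->B->F->G".split("->"),
    "S3": "A->C->B->G".split("->"),
    "S4": "A->B->D->G".split("->"),
}
tree, merges = average_linkage(paths)
for subtree, height in merges:
    print(f"merge at distance {height:.2f}: {subtree}")
```

This brute-force version recomputes all linkages at every step and is meant only to show the idea on a handful of paths. For larger inputs, an existing implementation such as scipy.cluster.hierarchy (linkage and dendrogram) is probably a better starting point, and at the scale of 10^7 paths some form of sampling or pre-grouping would likely be needed first.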
Thanks a lot for any advice.
Regards,
Oliver