[Biojava-dev] Reply:Re: protein_sequence_alignment
Richard Holland
holland at ebi.ac.uk
Tue Dec 18 10:56:16 UTC 2007
Your existing sequence alignment builder, "aligner", has a method which
returns an Alignment object over two Sequence objects:
Alignment alignment = aligner.getAlignment(query, subject);
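The builder finds this alignment by dynamic programming. As a rough, library-free illustration only (plain Java, not BioJava's implementation), here is a sketch of cost-minimizing global alignment that uses the same match/replace/insert/delete costs as the NeedlemanWunsch constructor in your program. It ignores gapExtend and the substitution matrix, which the BioJava constructor also accepts, and the class and method names are hypothetical.

```java
// Hypothetical helper, NOT part of BioJava: Needleman-Wunsch-style global
// alignment that minimizes total cost, with linear gap costs only
// (no gapExtend, no substitution matrix). Gaps are written as '-'.
public class GlobalAlignSketch {

    /** Returns {alignedQuery, alignedSubject}. */
    public static String[] align(String query, String subject,
                                 int match, int replace, int insert, int delete) {
        int n = query.length(), m = subject.length();
        int[][] cost = new int[n + 1][m + 1];
        // First column: query residues aligned against gaps (deletions).
        for (int i = 1; i <= n; i++) cost[i][0] = cost[i - 1][0] + delete;
        // First row: subject residues aligned against gaps (insertions).
        for (int j = 1; j <= m; j++) cost[0][j] = cost[0][j - 1] + insert;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = cost[i - 1][j - 1]
                        + (query.charAt(i - 1) == subject.charAt(j - 1) ? match : replace);
                int del = cost[i - 1][j] + delete;
                int ins = cost[i][j - 1] + insert;
                cost[i][j] = Math.min(diag, Math.min(del, ins));
            }
        }
        // Trace back from the bottom-right cell to recover one optimal alignment
        // (ties prefer diagonal, then deletion, then insertion).
        StringBuilder q = new StringBuilder(), s = new StringBuilder();
        int i = n, j = m;
        while (i > 0 || j > 0) {
            if (i > 0 && j > 0 && cost[i][j] == cost[i - 1][j - 1]
                    + (query.charAt(i - 1) == subject.charAt(j - 1) ? match : replace)) {
                q.append(query.charAt(i - 1));
                s.append(subject.charAt(j - 1));
                i--;
                j--;
            } else if (i > 0 && cost[i][j] == cost[i - 1][j] + delete) {
                q.append(query.charAt(i - 1));
                s.append('-');
                i--;
            } else {
                q.append('-');
                s.append(subject.charAt(j - 1));
                j--;
            }
        }
        return new String[] { q.reverse().toString(), s.reverse().toString() };
    }

    public static void main(String[] args) {
        // Costs taken from the quoted program: match 0, replace 3, insert 2, delete 2.
        String[] result = align("ACK", "ACGK", 0, 3, 2, 2);
        System.out.println(result[0]); // prints AC-K
        System.out.println(result[1]); // prints ACGK
    }
}
```

With these costs a single inserted residue (cost 2) is cheaper than a replacement plus an extra gap, which is why "ACK" against "ACGK" comes out as AC-K over ACGK.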
You can then iterate over each position of this alignment and compute
the identity:
int matches = 0;
for (int i = 1; i <= alignment.length(); i++) {
    // BioJava alignment columns are numbered from 1
    Symbol querySym = alignment.symbolAt(query.getName(), i);
    Symbol subjectSym = alignment.symbolAt(subject.getName(), i);
    if (querySym != null && querySym.equals(subjectSym)) matches++;
}
double identity = (double) matches / (double) alignment.length();
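The code above gives identity as the fraction of alignment columns (gaps
included) in which both sequences carry the same residue, on a scale from
0.0 (no matches) to 1.0 (identical sequences).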
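For comparison, here is a library-free sketch of the same calculation, assuming the two aligned rows are available as equal-length strings with '-' as the gap character. The class and method names are hypothetical, not BioJava API. Dividing by the alignment length, gaps included, is one common definition of identity; dividing by the length of the shorter sequence is another, so it is worth stating which one you report.

```java
// Hypothetical helper, NOT part of BioJava: percent identity of two aligned rows.
public class IdentitySketch {

    /**
     * Fraction of alignment columns in which both rows carry the same residue.
     * Assumes equal-length rows using '-' for gaps; gap columns count toward
     * the length but never count as matches.
     */
    public static double identity(String alignedQuery, String alignedSubject) {
        if (alignedQuery.length() != alignedSubject.length()) {
            throw new IllegalArgumentException("aligned rows must have equal length");
        }
        int length = alignedQuery.length();
        if (length == 0) {
            return 0.0; // assumed convention for an empty alignment
        }
        int matches = 0;
        for (int i = 0; i < length; i++) {
            char q = alignedQuery.charAt(i);
            if (q != '-' && q == alignedSubject.charAt(i)) {
                matches++;
            }
        }
        // matches / length (not length / matches) keeps the result in [0.0, 1.0].
        return (double) matches / length;
    }

    public static void main(String[] args) {
        System.out.println(identity("ACK", "ACK"));   // prints 1.0
        System.out.println(identity("AC-K", "ACGK")); // prints 0.75
    }
}
```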
cheers,
Richard
simpleyrx wrote:
> Dear sir,
>
> Thank you for your letter. The program works now. But I still
> have a question: how do I calculate the identity of the alignment?
>
> Student
>
> On 2007-12-18, "Richard Holland" <holland at ebi.ac.uk> wrote:
>
> The exception you are getting is caused by the following line:
>
> FiniteAlphabet alphabet = (FiniteAlphabet)
> AlphabetManager.alphabetForName("Protein");
>
> You should replace the whole line with this call:
>
> FiniteAlphabet alphabet = ProteinTools.getAlphabet();
>
> If, however, your proteins contain the stop symbol (*), you will need
> this line instead:
>
> FiniteAlphabet alphabet = ProteinTools.getTAlphabet();
>
> With that change the alphabet lookup will succeed, and you can continue
> testing the rest of your code.
>
> cheers,
> Richard
>
> simpleyrx wrote:
>
>
>> Dear sir,
>
>
>> package edu.cau.strLab;
>> import java.io.File;
>> import org.biojava.bio.alignment.NeedlemanWunsch;
>> import org.biojava.bio.alignment.SequenceAlignment;
>> import org.biojava.bio.alignment.SmithWaterman;
>> import org.biojava.bio.alignment.SubstitutionMatrix;
>> import org.biojava.bio.seq.ProteinTools;
>> import org.biojava.bio.seq.Sequence;
>> import org.biojava.bio.symbol.AlphabetManager;
>> import org.biojava.bio.symbol.FiniteAlphabet;
>
>> public class ProteinAlignment {
>>     public static void main(String[] args) {
>>         try {
>>             // The alphabet of the sequences. For this example protein is chosen.
>>             FiniteAlphabet alphabet = (FiniteAlphabet) AlphabetManager.alphabetForName("Protein");
>>             // Read the substitution matrix file.
>>             // For protein sequences this example uses BLOSUM62.
>>             SubstitutionMatrix matrix = new SubstitutionMatrix(alphabet, new File("E:\\bioinformatics_package\\matrices\\BLOSUM62"));
>>             // Define the default costs for sequence manipulation for the global alignment.
>>             SequenceAlignment aligner = new NeedlemanWunsch(
>>                     0,      // match
>>                     3,      // replace
>>                     2,      // insert
>>                     2,      // delete
>>                     1,      // gapExtend
>>                     matrix  // SubstitutionMatrix
>>             );
>>             // Sequence query = DNATools.createDNASequence("AC", "query");
>>             // Sequence target = DNATools.createDNASequence("ACkG", "target");
>>             Sequence query = ProteinTools.createProteinSequence("ACK", "query");
>>             Sequence subject = ProteinTools.createProteinSequence("ACK", "subject");
>>
>>             // Perform an alignment and save the results.
>>             aligner.pairwiseAlignment(query, subject);
>>
>>             // Print the alignment to the screen.
>>             System.out.println("Global alignment with Needleman-Wunsch:\n" + aligner.getAlignmentString());
>>
>>             // Perform a local alignment of the sequences with Smith-Waterman.
>>             // First, define the costs (penalties) for every single operation.
>>             // aligner = new SmithWaterman(
>>             //         -1,     // match
>>             //         3,      // replace
>>             //         2,      // insert
>>             //         2,      // delete
>>             //         1,      // gapExtend
>>             //         matrix  // SubstitutionMatrix
>>             // );
>>             // // Perform the local alignment.
>>             // aligner.pairwiseAlignment(query, subject);
>>             //
>>             // System.out.println("\nLocal alignment with Smith-Waterman:\n" + aligner.getAlignmentString());
>>         } catch (Exception exc) {
>>             exc.printStackTrace();
>>         }
>>     }
>> }
>
>
>> The result is below:
>
>
>
>> java.util.NoSuchElementException: No alphabet for name Protein could be found
>> at org.biojava.bio.symbol.AlphabetManager.alphabetForName(AlphabetManager.java:248)
>> at edu.cau.strLab.ProteinAlignment.main(ProteinAlignment.java:20)
>
>
>
>> Could somebody tell me why this happens, and how to use NeedlemanWunsch to align protein sequences?
>
>
>
>
>> ------------------------------------------------------------------------
>
>> _______________________________________________
>> biojava-dev mailing list
>> biojava-dev at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>
> --
> Richard Holland (BioMart)
> EMBL EBI, Wellcome Trust Genome Campus,
> Hinxton, Cambridgeshire CB10 1SD, UK
> Tel. +44 (0)1223 494416
>
> http://www.biomart.org/
> http://www.biojava.org/