[Biojava-dev] Reply:Re: protein_sequence_alignment

Richard Holland holland at ebi.ac.uk
Tue Dec 18 10:56:16 UTC 2007


Your existing sequence alignment builder, "aligner", has a method which
returns an Alignment object over two Sequence objects:

Alignment alignment = aligner.getAlignment(query, subject);

You can then iterate over each position of this alignment and compute
the identity:

int matches = 0;
// Alignment columns are numbered from 1.
for (int i = 1; i <= alignment.length(); i++) {
  Symbol querySym = alignment.symbolAt(query.getName(), i);
  Symbol subjectSym = alignment.symbolAt(subject.getName(), i);
  if (querySym!=null && querySym.equals(subjectSym)) matches++;
}
// Identity is the fraction of columns that match: matches / length.
double identity = (double)matches / (double)alignment.length();

The code above will give you identity on a scale of 0.0 (no match) to
1.0 (exact match).
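
To make the arithmetic concrete, here is a minimal, library-free sketch of the same calculation. It does not use the BioJava API: the class name IdentityDemo, the method identity, and the use of plain strings with '-' as the gap character are all assumptions made for illustration only.

```java
public class IdentityDemo {

    /**
     * Fraction of alignment columns in which both rows hold the same residue.
     * Assumes both rows are already aligned (equal length) and use '-' for gaps.
     */
    static double identity(String alignedQuery, String alignedSubject) {
        if (alignedQuery.length() != alignedSubject.length()) {
            throw new IllegalArgumentException("Aligned rows must have equal length");
        }
        int length = alignedQuery.length();
        if (length == 0) {
            return 0.0; // avoid dividing by zero on an empty alignment
        }
        int matches = 0;
        for (int i = 0; i < length; i++) {
            char q = alignedQuery.charAt(i);
            char s = alignedSubject.charAt(i);
            // A column counts as a match only if neither side is a gap.
            if (q != '-' && q == s) {
                matches++;
            }
        }
        return (double) matches / (double) length;
    }

    public static void main(String[] args) {
        // 3 identical columns out of 4 -> prints 0.75
        System.out.println(identity("AC-K", "ACGK"));
    }
}
```

Dividing by the full alignment length (gap columns included) is only one convention; some tools divide by the length of the shorter sequence instead, so it is worth stating which one you report.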

cheers,
Richard

simpleyrx wrote:
> Dear sir,
>  
>         Thank you for your letter. The program works now, but I still
> have a question: how do I calculate the identity of the alignment?
>  
>                                          Student
>  
>  
>  
> 
> On 2007-12-18, "Richard Holland" <holland at ebi.ac.uk> wrote:
> 
> The exception you are getting is caused by the following line:
> 
>  FiniteAlphabet alphabet = (FiniteAlphabet)
> AlphabetManager.alphabetForName("Protein");
> 
> You should replace the whole line with this call:
> 
>  FiniteAlphabet alphabet = ProteinTools.getAlphabet();
> 
> If, however, your proteins contain the stop symbol (*), then you will need
> this line instead:
> 
>  FiniteAlphabet alphabet = ProteinTools.getTAlphabet();
> 
> Then the line will work and you will be able to continue testing the
> remainder of your code.
> 
> cheers,
> Richard
> 
> simpleyrx wrote:
> 
> 
>> Dear sir,
> 
> 
>> package edu.cau.strLab;
>> import java.io.File;
>> import org.biojava.bio.alignment.NeedlemanWunsch;
>> import org.biojava.bio.alignment.SequenceAlignment;
>> import org.biojava.bio.alignment.SmithWaterman;
>> import org.biojava.bio.alignment.SubstitutionMatrix;
>> import org.biojava.bio.seq.ProteinTools;
>> import org.biojava.bio.seq.Sequence;
>> import org.biojava.bio.symbol.AlphabetManager;
>> import org.biojava.bio.symbol.FiniteAlphabet;
> 
>> public class ProteinAlignment{
>>  public static void main(String[] args) {
>>   // TODO Auto-generated method stub
>>   try {
>>         // The alphabet of the sequences. For this example DNA is chosen.
>>         FiniteAlphabet alphabet = (FiniteAlphabet) AlphabetManager.alphabetForName("Protein");
>>         // Read the substitution matrix file. 
>>         // For this example the matrix NUC.4.4 is good.
>>         SubstitutionMatrix matrix = new SubstitutionMatrix(alphabet, new File("E:\\bioinformatics_package\\matrices\\BLOSUM62"));
>>         // Define the default costs for sequence manipulation for the global alignment.
>>         SequenceAlignment aligner = new NeedlemanWunsch( 
>>           0,  // match
>>           3, // replace
>>           2,      // insert
>>           2, // delete
>>           1,      // gapExtend
>>           matrix  // SubstitutionMatrix
>>         );
>> //        Sequence query  = DNATools.createDNASequence("AC", "query");
>> //        Sequence target = DNATools.createDNASequence("ACkG", "target");
>>       Sequence query =  ProteinTools.createProteinSequence("ACK","query");
>>      Sequence subject = ProteinTools.createProteinSequence("ACK", "subject");
> 
>>         // Perform an alignment and save the results.
>> //        aligner.pairwiseAlignment(
>> //          query, // first sequence
>> //          target // second one
>> //        );
>> //        aligner.pairwiseAlignment(query, subject);
> 
>>         // Print the alignment to the screen
> 
>>         System.out.println("Global alignment with Needleman-Wunsch:\n" + aligner.getAlignmentString());   
> 
>>         // Perform a local alignment from the sequences with Smith-Waterman. 
>>         // Firstly, define the expenses (penalties) for every single operation.
>> //        aligner = new SmithWaterman(
>> //          -1,     // match
>> //          3,      // replace 
>> //          2,      // insert
>> //          2,      // delete
>> //          1,      // gapExtend
>> //          matrix  // SubstitutionMatrix
>> //        );
>> //        // Perform the local alignment.
>> //        aligner.pairwiseAlignment(query, target);  
>> //     
>> //        System.out.println("\nlocal alignment with SmithWaterman:\n" + aligner.getAlignmentString());
>>       } catch (Exception exc) {
>>         exc.printStackTrace();
>>       }
>>  }
>> }
> 
> 
>> the result is below:
> 
> 
> 
>> java.util.NoSuchElementException: No alphabet for name Protein could be found
>>  at org.biojava.bio.symbol.AlphabetManager.alphabetForName(AlphabetManager.java:248)
>>  at edu.cau.strLab.ProteinAlignment.main(ProteinAlignment.java:20)
> 
> 
> 
>> Could somebody tell me why, and how to use NeedlemanWunsch to align protein sequences?
> 
> 
> 
> 
> 
> --
> Richard Holland (BioMart)
> EMBL EBI, Wellcome Trust Genome Campus,
> Hinxton, Cambridgeshire CB10 1SD, UK
> Tel. +44 (0)1223 494416
> 
> http://www.biomart.org/
> http://www.biojava.org/

--
Richard Holland (BioMart)
EMBL EBI, Wellcome Trust Genome Campus,
Hinxton, Cambridgeshire CB10 1SD, UK
Tel. +44 (0)1223 494416

http://www.biomart.org/
http://www.biojava.org/