[Biojava-l] Alignment objects

Nathan S. Haigh n.haigh at sheffield.ac.uk
Wed Aug 9 15:13:49 UTC 2006


I think I'm having a few problems with alignments. I've generated a
protein alignment in the following way:

        String alnString1 =
            ">seq1\n" +
            "----FGHIKLMNPQRST\n" +
            ">seq2\n" +
            "ACDEFGHIKLMNPQRST\n";
        BufferedReader br1 = new BufferedReader(new StringReader(alnString1));
        FastaAlignmentFormat faf1 = new FastaAlignmentFormat();
        alignment = faf1.read( br1 );
       
If I loop over the positions in the alignment to add those containing gaps
to a Location object, I have to do the following. It seems hacky, since
I have to check for symbol names containing "[]" in order to identify
gaps. I'm sure there must be a better way to do this. A better way might
be to calculate the frequency of each symbol (including gaps) at each
position in the alignment. That way I could return a list of these
frequencies for each position, which other methods could use to identify
positions with certain characteristics (such as those containing gaps).
Any ideas?

        for (int col = 1; col <= alignment.length(); col++) {
            for (Iterator labels = alignment.getLabels().iterator(); labels.hasNext(); ) {
                Object label = labels.next();
                Symbol sym = alignment.symbolAt(label, col);

                if (sym.getName().contains("[]")) {
                    Location newLocation = LocationTools.makeLocation(col, col);
                    gapped = this.appendLocation(gapped, newLocation);
                }
            }
        }
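
The per-column frequency idea described above can be sketched without
BioJava. This is only an illustration under stated assumptions: the
alignment rows are plain strings of equal length, '-' marks a gap, and
the names ColumnFrequencies, columnCounts and gappedColumns are
hypothetical, not part of the BioJava API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; plain strings stand in for a BioJava Alignment.
class ColumnFrequencies {

    // For each column, count how often each character (gaps included) occurs.
    // Assumes every row has the same length as the first row.
    public static List<Map<Character, Integer>> columnCounts(List<String> rows) {
        List<Map<Character, Integer>> counts = new ArrayList<>();
        if (rows.isEmpty()) {
            return counts;
        }
        int length = rows.get(0).length();
        for (int col = 0; col < length; col++) {
            Map<Character, Integer> colCounts = new LinkedHashMap<>();
            for (String row : rows) {
                colCounts.merge(row.charAt(col), 1, Integer::sum);
            }
            counts.add(colCounts);
        }
        return counts;
    }

    // 1-based positions of the columns in which the gap character occurs.
    public static List<Integer> gappedColumns(List<Map<Character, Integer>> counts,
                                              char gap) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < counts.size(); i++) {
            if (counts.get(i).containsKey(gap)) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "----FGHIKLMNPQRST",
            "ACDEFGHIKLMNPQRST");
        List<Map<Character, Integer>> counts = columnCounts(rows);
        System.out.println(gappedColumns(counts, '-')); // prints [1, 2, 3, 4]
        System.out.println(counts.get(4));              // prints {F=2}
    }
}
```

Because the counts are computed once per column, other checks (fully
conserved columns, columns above some gap fraction, and so on) could
reuse the same list instead of re-scanning the alignment.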

Cheers
Nath


