[Biojava-l] Alignment objects

mark.schreiber at novartis.com mark.schreiber at novartis.com
Thu Aug 10 07:56:42 UTC 2006


Hi -

There is a difference between the gap returned by 
AlphabetManager.getGapSymbol() and the gap returned by an 
alphabet.getGapSymbol(). There are some very complex reasons for this, which 
could make up a large part of a thesis (literally; take a look at Matthew 
Pocock's thesis some time). Simply put, dynamic programming and HMMs 
wouldn't work without it.

It becomes especially obvious when you have an alignment. The alphabet of 
an alignment of 3 DNA sequences is DNAxDNAxDNA. Thus a gap from that 
alphabet is really gap x gap x gap.
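
To make the "gap x gap x gap" idea concrete, here is a minimal toy model in 
plain Java. It is only a sketch: the class GapSketch and its Sym type are 
hypothetical and are not BioJava classes. It assumes a symbol is either atomic 
or a tuple of component symbols, which is the picture described above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model only: GapSketch and Sym are illustrative names, not BioJava API.
class GapSketch {
    // A symbol is either atomic (no parts) or a tuple of component symbols.
    static final class Sym {
        final String name;
        final List<Sym> parts;

        Sym(String name, List<Sym> parts) {
            this.name = name;
            this.parts = parts;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // The single atomic gap, playing the role of a gap in one sequence.
    static final Sym ATOMIC_GAP = new Sym("-", Collections.<Sym>emptyList());

    // The gap of a cross-product alphabet with n components: (gap x ... x gap).
    static Sym crossProductGap(int n) {
        Sym[] parts = new Sym[n];
        Arrays.fill(parts, ATOMIC_GAP);
        StringBuilder name = new StringBuilder("(");
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                name.append(" x ");
            }
            name.append(ATOMIC_GAP.name);
        }
        name.append(")");
        return new Sym(name.toString(), Arrays.asList(parts));
    }

    public static void main(String[] args) {
        Sym columnGap = crossProductGap(3);
        System.out.println(columnGap + " has " + columnGap.parts.size() + " components");
        // The column gap is a different symbol from the atomic gap,
        // so an equality test against the atomic gap fails.
        System.out.println("equals atomic gap? " + columnGap.equals(ATOMIC_GAP));
    }
}
```

In this model the column gap is built from three atomic gaps but is not equal 
to any one of them, which is why a test against the wrong gap never matches.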

Depending on what you are trying to do, you would want to test for either 

s == align.getAlphabet().getGapSymbol()

(when s is a whole column of the alignment) or 

s == DNATools.getDNA().getGapSymbol()

(when s is a symbol from a single DNA sequence).
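
The two tests answer different questions, which a small plain-Java sketch can 
show without any BioJava calls. Everything here is an assumption made for 
illustration: the class GapTests, its method names, and the use of '-' as the 
gap character on raw strings.

```java
// Illustrative only: works on raw strings, not on BioJava Alignment objects.
class GapTests {
    static final char GAP = '-';

    // Column-level test: true only if every row has a gap at this column.
    // This corresponds to comparing against the alignment alphabet's gap.
    static boolean isAllGapColumn(String[] rows, int col) {
        for (String row : rows) {
            if (row.charAt(col) != GAP) {
                return false;
            }
        }
        return true;
    }

    // Row-level test: true if one particular sequence has a gap at this column.
    // This corresponds to comparing against a single alphabet's gap.
    static boolean isGapInRow(String[] rows, int row, int col) {
        return rows[row].charAt(col) == GAP;
    }

    public static void main(String[] args) {
        String[] rows = {"A-G-", "AC--", "A-T-"};
        for (int col = 0; col < rows[0].length(); col++) {
            System.out.println("column " + col
                    + ": all gaps = " + isAllGapColumn(rows, col)
                    + ", gap in first row = " + isGapInRow(rows, 0, col));
        }
    }
}
```

A column can contain a gap in one sequence without being a gap column as a 
whole, so the choice of test depends on which of the two you mean.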

- Mark





"Nathan S. Haigh" <n.haigh at sheffield.ac.uk>
Sent by: biojava-l-bounces at lists.open-bio.org
08/10/2006 04:31 PM

 
        To:     Richard Holland <holland at ebi.ac.uk>, biojava-l at lists.open-bio.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        Re: [Biojava-l] Alignment objects


Richard Holland wrote:
> You could change this:
>
> sym.getName().contains("[]")
>
> to this:
>
> AlphabetManager.getGapSymbol().equals(sym)
>
> Frequency calculations can be done quite quickly using
> DistributionTools:
>
>     // true says to include gaps in the statistics
>     Distribution[] dists = DistributionTools.distOverAlignment(algn, true);
>     // The dists array will have the same number of entries as there
>     // are columns in the alignment.
>     for (int i = 0; i < dists.length; i++) {
>         // i = 0 = first column in alignment
>         Distribution dist = dists[i];
>         // Find out the weight for A in this column.
>         double AWeight = dist.getWeight(DNATools.a());
>         // Find out the weight for gaps in this column.
>         double GapWeight = dist.getWeight(DNATools.getDNA().getGapSymbol());
>     }
>
> cheers,
> Richard
This is definitely getting close to what I need. However, I think I'm
having trouble with alphabets, which is stopping me from using something
like:
AlphabetManager.getGapSymbol().equals(sym)

I am currently creating an alignment like this:
    String alnString1 =
            ">seq1\n" +
            "----FGHIKLMNPQRST\n" +
            ">seq2\n" +
            "ACDEFGHIKLMNPQRST\n";
    BufferedReader br1 = new BufferedReader(new StringReader(alnString1));
    FastaAlignmentFormat faf1 = new FastaAlignmentFormat();
    aln1 = faf1.read( br1 );

And I never get true returned from:
AlphabetManager.getGapSymbol().equals(sym)

I assume this is because the mechanisms that are in place for setting
the alphabet of the alignment are not correctly setting the gap symbol.
The program I am writing should be capable of determining the alphabet
of any alignment that is loaded, so it makes sense to change:
AlphabetManager.getGapSymbol().equals(sym)
to:
alignment.getAlphabet().getGapSymbol().equals(sym)

but this doesn't work either. Eventually I'd like my application to be
able to load alignments from several different formats, some of which may
use more than one symbol as the gap, while others have a "default" gap
character. Are there mechanisms in place to attempt to correctly set the
gapSymbol for an alignment? For example, FASTA format alignments should
probably set the gap symbol to the hyphen "-".
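
As a rough illustration of handling formats with more than one gap character, 
the following plain-Java sketch counts the fraction of gaps per column for a 
configurable set of gap characters. It is not a BioJava mechanism: the class 
GapColumns, the method gapFractions, and the choice of '-' and '.' as gap 
characters are all assumptions made for the example, and it works on the raw 
alignment strings rather than on an Alignment object.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: operates on raw strings, not on BioJava symbols.
class GapColumns {
    // Returns, for each column, the fraction of rows holding a gap character.
    // Assumes all rows have the same length.
    static double[] gapFractions(String[] rows, Set<Character> gapChars) {
        int width = rows[0].length();
        double[] fractions = new double[width];
        for (int col = 0; col < width; col++) {
            int gaps = 0;
            for (String row : rows) {
                if (gapChars.contains(row.charAt(col))) {
                    gaps++;
                }
            }
            fractions[col] = (double) gaps / rows.length;
        }
        return fractions;
    }

    public static void main(String[] args) {
        // The two sequences from the FASTA example above.
        String[] rows = {
            "----FGHIKLMNPQRST",
            "ACDEFGHIKLMNPQRST"
        };
        // Assumed gap characters; adjust per input format.
        Set<Character> gapChars = new HashSet<>(Arrays.asList('-', '.'));
        double[] fractions = gapFractions(rows, gapChars);
        for (int col = 0; col < fractions.length; col++) {
            System.out.println("column " + col + ": " + fractions[col]);
        }
    }
}
```

Keeping the gap characters in a set per format means the counting code does 
not need to change when a format with a different gap convention is added.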

Once again, being new to this, I am probably missing something that is
obvious to you guys.
Thanks for all your time and effort in helping me out.
Nathan
