[Biojava-l] how to calculate consensus from a fasta file

Eric BELLARD eric_bellard at yahoo.com
Wed Jan 14 04:03:05 EST 2004


Thanks for your response.

My problem is simpler than you thought.

I simply have to calculate the ambiguity symbol for
each column.

My solution is:
- create a list with a set of symbols for each column
- add each symbol of each sequence to the set for its column
- calculate the ambiguity symbol for each set in this
list

It works pretty well but if the sequences become too
long I imagine I'll use too much memory.
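
A minimal sketch of the steps above in plain Java, not using any BioJava
API. It assumes DNA input and replaces each per-column set with a 4-bit
mask (A=1, C=2, G=4, T=8), so memory is one byte per column regardless of
how many sequences are read. The class name ColumnConsensus, the method
names, and the choice of '-' for a column where no base was seen are all
hypothetical choices made for this illustration.

```java
import java.util.List;

public class ColumnConsensus {
    // IUPAC nucleotide codes indexed by bit mask: A=1, C=2, G=4, T=8.
    // Index 0 (no base seen in the column) is rendered as '-' (an assumption).
    private static final String IUPAC = "-ACMGRSVTWYHKDBN";

    // Map a symbol to its bit mask. Ambiguity codes in the input are
    // expanded to the bases they stand for; gaps and unknown characters
    // contribute nothing.
    static int mask(char symbol) {
        char c = Character.toUpperCase(symbol);
        if (c == 'U') {
            c = 'T'; // treat RNA uracil like thymine
        }
        return Math.max(IUPAC.indexOf(c), 0);
    }

    // Compute one ambiguity symbol per column. Sequences may differ in
    // length; missing positions are simply skipped.
    static String consensus(List<String> sequences) {
        int length = 0;
        for (String s : sequences) {
            length = Math.max(length, s.length());
        }
        byte[] masks = new byte[length]; // one byte per column
        for (String s : sequences) {
            for (int i = 0; i < s.length(); i++) {
                masks[i] |= mask(s.charAt(i));
            }
        }
        StringBuilder sb = new StringBuilder(length);
        for (byte m : masks) {
            sb.append(IUPAC.charAt(m));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Columns: A/A/A, C/C/A, G/G/G, T/A/T
        System.out.println(consensus(List.of("ACGT", "ACGA", "AAGT"))); // prints AMGW
    }
}
```

Because each sequence is folded into the masks as it is read, the
sequences themselves do not all need to be kept in memory at once.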

I'll try to find another solution using the alignment
object in the framework. At the moment I don't know
the framework well enough to find a solution of this kind
with it. I'll try...

Anyway thanks for your help.

Eric

--- mark.schreiber at group.novartis.com wrote:
> Hi Eric -
> 
> I'm not sure if this will solve your problem but you
> could make an 
> Alignment object from the sequences and then use the
> methods of 
> DistributionTools to get a Distribution object for
> each position in the 
> Alignment. These distributions will tell you the
> frequency of each base at 
> each position in the Alignment which you could use
> to make a consensus. 
> You can also use DistributionTools to calculate
> information or entropy at 
> each position.
> 
> Alternatively you could generate a Markov model that
> represents the 
> alignment and probabilistically represents the
> consensus.
> 
> Hope this helps
> 
> Mark
> 
> 
> 
> Mark Schreiber
> Principal Scientist (Bioinformatics)
> 
> Novartis Institute for Tropical Diseases (NITD)
> 1 Science Park Road
> #04-14 The Capricorn
> Singapore 117528
> 
> phone +65 6722 2973
> fax  +65 6722 2910
> 
> 
> 
> 
> 
> Eric BELLARD <eric_bellard at yahoo.com>
> Sent by: biojava-l-bounces at portal.open-bio.org
> 01/13/2004 09:35 PM
> Please respond to eric
> 
>  
>         To:     biojava-l at biojava.org
>         cc: 
>         Subject:        [Biojava-l] how to calculate
> consensus from a fasta file
> 
> 
> Hi,
> 
> I'd like to first thank you all for your great job
> on
> this project.
> 
> I'm using biojava in a project to store some
> sequencing results.
> 
> In my application the user uploads sequences from a
> fasta file, and I'd like to build an alignment from
> it.
> 
> With your project, I can easily parse the fasta file
> and get all the sequences. 
> 
> Let's consider the sequences as lines.
> I'd like to calculate the column consensus using the
> degenerate DNA alphabet.
> 
> Does biojava implement a way to do this?
> 
> Thanks in advance.
> 
> Eric
> 
> 
> 

