[Biojava-l] Creating an alignment object

Richard Holland richard.holland at ebi.ac.uk
Fri May 12 08:34:41 UTC 2006


Sorry for the delay in replying; I had to leave work a bit early
yesterday.

> Nope, I don't need to generate an alignment, I already have an alignment in
> a file created by third party software (clustalw). 

There is nothing that I know of in BioJava that reads ClustalW files
directly into Alignment objects. (If someone knows otherwise, please
correct me.) There are certainly methods in BioJava that read ClustalW
alignments into a set of String objects, one per member sequence (see
SequenceAlignmentSAXParser), but I don't know of anything more detailed
than that.
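To show what "one String per member sequence" means for a ClustalW file, here is a minimal hand-rolled sketch in plain Java. It is not SequenceAlignmentSAXParser and does not use BioJava; ClustalReader is a hypothetical class. It assumes the usual .aln layout: a "CLUSTAL" header line, blocks of "name residues [count]" lines, and conservation lines that start with whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of BioJava.
public class ClustalReader {

    // Parses ClustalW ".aln" text into name -> gapped sequence string,
    // concatenating each sequence's pieces across blocks in file order.
    public static Map<String, String> parse(String alnText) {
        Map<String, StringBuilder> rows = new LinkedHashMap<>();
        for (String line : alnText.split("\\R")) {
            // Skip blank lines and the "CLUSTAL ..." header.
            if (line.isEmpty() || line.startsWith("CLUSTAL")) continue;
            // Conservation lines ("*", ":", ".") begin with whitespace.
            if (Character.isWhitespace(line.charAt(0))) continue;
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) continue;
            // fields[0] is the name, fields[1] the residues; an optional
            // trailing residue count is ignored.
            rows.computeIfAbsent(fields[0], k -> new StringBuilder()).append(fields[1]);
        }
        Map<String, String> result = new LinkedHashMap<>();
        rows.forEach((name, sb) -> result.put(name, sb.toString()));
        return result;
    }

    public static void main(String[] args) {
        String aln = String.join("\n",
            "CLUSTAL W (1.83) multiple sequence alignment",
            "",
            "seq1      ACGT-ACGT 8",
            "seq2      ACGTTACG- 8",
            "          ****.*** ",
            "",
            "seq1      TTGA 12",
            "seq2      TTGA 12");
        parse(aln).forEach((name, seq) -> System.out.println(name + " " + seq));
    }
}
```

The gaps stay in the strings as dashes; turning them into real alignment gaps is a separate step.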

The third-party package Strap, which I mentioned yesterday, reads and
writes many of the major alignment formats, and has wrappers for running
ClustalW and other aligners programmatically and reading the results
back in, so it is definitely worth a look. You can use many of its
functions, including reading and writing various alignment formats,
without running the GUI.

> 
> In fact, the app I'd
> eventually like to have written in Java would include some sort of wrapper
> for clustalw in order to construct the alignments from a set of unaligned
> sequences, but algorithms implemented in Biojava would also be a welcome
> addition to the app.

If you want to wrap ClustalW, the simplest way would be to create
Sequence objects in BioJava, write them out in FASTA format using the
BioJava sequence I/O tools, and then run ClustalW from Java using
Runtime.exec() (or ProcessBuilder). However, you would still have the
problem of reading the output back in.
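A minimal sketch of the wrapping step in plain Java, assuming a ClustalW 1.x binary named clustalw on the PATH that accepts -INFILE= and -OUTFILE= options; check the executable name and flags against your installed version. ClustalRunner is a hypothetical class, and its hand-rolled toFasta stands in for BioJava's sequence I/O so that the example compiles on its own.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of BioJava.
public class ClustalRunner {

    // Plain-Java FASTA writer (a stand-in for BioJava's sequence I/O).
    public static String toFasta(Map<String, String> seqs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : seqs.entrySet()) {
            sb.append('>').append(e.getKey()).append('\n')
              .append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Assumed ClustalW 1.x command line; the executable name and flag
    // spelling may differ on your system.
    public static List<String> clustalCommand(Path in, Path out) {
        return Arrays.asList("clustalw", "-INFILE=" + in, "-OUTFILE=" + out);
    }

    // Writes the FASTA input, runs ClustalW, waits, returns its exit code.
    public static int run(Map<String, String> seqs, Path in, Path out)
            throws IOException, InterruptedException {
        Files.write(in, toFasta(seqs).getBytes(StandardCharsets.UTF_8));
        Process p = new ProcessBuilder(clustalCommand(in, out))
                .inheritIO()   // pass ClustalW's console output through
                .start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        Map<String, String> seqs = new LinkedHashMap<>();
        seqs.put("seq1", "ACGTACGTTTGA");
        seqs.put("seq2", "ACGTTACGTTGA");
        // Only prints what would be written and run; call run(...) to execute.
        System.out.print(toFasta(seqs));
        System.out.println(clustalCommand(Paths.get("in.fasta"), Paths.get("out.aln")));
    }
}
```

inheritIO() sends ClustalW's own progress messages to the parent's console, which also avoids the child process stalling on a full, unread output pipe. Parsing the resulting .aln file is still left to you.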

The classes in org.biojava.bio.alignment that I mentioned yesterday
implement several useful alignment algorithms, which you can use as an
alternative to ClustalW.
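For a feel of what a pairwise alignment algorithm computes, here is a plain-Java sketch of the Needleman-Wunsch global alignment score with a linear gap penalty. It is not the BioJava implementation or its API; GlobalAlign is a hypothetical class, and the match/mismatch/gap values are illustrative assumptions.

```java
// Hypothetical helper class, not part of BioJava.
public class GlobalAlign {

    // Needleman-Wunsch global alignment score (score only, no traceback).
    public static int score(String a, String b, int match, int mismatch, int gap) {
        int n = a.length(), m = b.length();
        int[][] f = new int[n + 1][m + 1];
        // Aligning a prefix against nothing costs one gap per residue.
        for (int i = 1; i <= n; i++) f[i][0] = i * gap;
        for (int j = 1; j <= m; j++) f[0][j] = j * gap;
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = f[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? match : mismatch);
                int up = f[i - 1][j] + gap;    // a's residue against a gap
                int left = f[i][j - 1] + gap;  // b's residue against a gap
                f[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return f[n][m];
    }

    public static void main(String[] args) {
        System.out.println(score("GATTACA", "GCATGCU", 1, -1, -1)); // prints 0
    }
}
```

A full aligner would also trace back through the table to recover the gapped strings; this sketch stops at the optimal score.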

> But first things first.
> If I didn't have any sequences or an alignment in any files. What is the
> easiest way to get an alignment object in Java to have a play around with?

Make an instance of FlexibleAlignment from org.biojava.bio.alignment
and use its methods to add sequences to it. It doesn't do any aligning
itself; it is just a container that holds sequences and the information
about how they align. You use its methods to add and remove sequences,
to add and remove gaps and deletions, and to get things like consensus
sequences.
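FlexibleAlignment needs BioJava on the classpath, so the sketch here does not use it. As a plain-Java illustration of one thing such a container can give you, this computes a simple majority-rule consensus over equal-length aligned rows. ToyConsensus is a hypothetical class; it is not how BioJava builds consensus sequences, and the tie-breaking rule is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of BioJava.
public class ToyConsensus {

    // Majority-rule consensus of equal-length aligned rows. Gap characters
    // count like any other symbol; ties go to the symbol that sorts first,
    // because TreeMap iterates its keys in sorted order.
    public static String consensus(List<String> rows) {
        int width = rows.get(0).length();
        StringBuilder out = new StringBuilder();
        for (int col = 0; col < width; col++) {
            Map<Character, Integer> counts = new TreeMap<>();
            for (String row : rows) {
                if (row.length() != width)
                    throw new IllegalArgumentException("rows must be equal length");
                counts.merge(row.charAt(col), 1, Integer::sum);
            }
            char best = 0;
            int bestCount = -1;
            for (Map.Entry<Character, Integer> e : counts.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            out.append(best);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("ACGT-A", "ACGTTA", "AGGTTC");
        System.out.println(consensus(rows)); // prints ACGTTA
    }
}
```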

Technically, I suppose you could use FlexibleAlignment together with
SequenceAlignmentSAXParser: read the alignment members as strings,
construct sequences from them, and add them to the alignment object.
I haven't tried this myself, though. It would probably need some extra
processing to convert the dashes (gaps) in the input strings into proper
gaps in the alignment.
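The "extra processing" could start by splitting each gapped string into its residues and the positions of its gap runs, which you would then feed to whatever gap-adding calls the alignment object offers. A plain-Java sketch of that split; GappedRow is a hypothetical class, and the 1-based inclusive column convention is an assumption you may need to adapt to the BioJava Location conventions you end up using.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of BioJava.
public class GappedRow {

    public final String ungapped;   // residues with the dashes removed
    public final List<int[]> gaps;  // gap runs as {start, end}, 1-based, inclusive

    private GappedRow(String ungapped, List<int[]> gaps) {
        this.ungapped = ungapped;
        this.gaps = gaps;
    }

    public static GappedRow parse(String gapped) {
        StringBuilder residues = new StringBuilder();
        List<int[]> gaps = new ArrayList<>();
        int runStart = -1; // 1-based start of the current gap run, -1 if none
        for (int i = 0; i < gapped.length(); i++) {
            char c = gapped.charAt(i);
            if (c == '-') {
                if (runStart < 0) runStart = i + 1;
            } else {
                // A residue closes any open run; the last dash was column i.
                if (runStart > 0) {
                    gaps.add(new int[] {runStart, i});
                    runStart = -1;
                }
                residues.append(c);
            }
        }
        // Close a run that reaches the end of the string.
        if (runStart > 0) gaps.add(new int[] {runStart, gapped.length()});
        return new GappedRow(residues.toString(), gaps);
    }

    public static void main(String[] args) {
        GappedRow row = parse("AC--GT-A");
        System.out.println(row.ungapped);  // prints ACGTA
        for (int[] g : row.gaps) System.out.println(g[0] + "-" + g[1]);  // 3-4, then 7-7
    }
}
```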

> Is there a way to just "magically" create a default alignment of say 5
> sequences with 20 positions?

You would have to create the 5 sequences yourself and add them to a
FlexibleAlignment as described above.
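If you just want some test data to play with, a few lines of plain Java will produce 5 random DNA strings of 20 positions each. RandomSeqs is a hypothetical class; the fixed seed is only there so the output repeats between runs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper class, not part of BioJava.
public class RandomSeqs {

    // Makes `count` random DNA strings, each `length` positions long.
    public static List<String> make(int count, int length, long seed) {
        final String alphabet = "ACGT";
        Random rng = new Random(seed);
        List<String> seqs = new ArrayList<>();
        for (int s = 0; s < count; s++) {
            StringBuilder sb = new StringBuilder(length);
            for (int i = 0; i < length; i++) {
                sb.append(alphabet.charAt(rng.nextInt(alphabet.length())));
            }
            seqs.add(sb.toString());
        }
        return seqs;
    }

    public static void main(String[] args) {
        List<String> seqs = make(5, 20, 42L);
        for (int i = 0; i < seqs.size(); i++) {
            System.out.println("seq" + (i + 1) + " " + seqs.get(i));
        }
    }
}
```

Each string would then need to become a BioJava Sequence (for DNA, DNATools.createDNASequence(String, String) in BioJava 1.x should do it; check the API docs for your version) before being added to the FlexibleAlignment.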

cheers,
Richard

-- 
Richard Holland (BioMart Team)
EMBL-EBI
Wellcome Trust Genome Campus
Hinxton
Cambridge CB10 1SD
UNITED KINGDOM
Tel: +44-(0)1223-494416