[EMBOSS] [BiO BB] Building an alignment from BLAST hsp

Carlos Quijano cquijano at iib.uam.es
Tue Mar 28 09:49:01 UTC 2006


Hi all,

I didn't read this before, sorry for the lapse. And sorry, Ryan, if
what I tell you is not exactly what you needed.

What you are looking for is just _MVIEW_, an old but nice application.
Use scholar.google.com / PubMed to find more information about it; I
remember that there are web servers running it as a CGI somewhere. It
is possible that in the last few years somebody has published a better
tool or a new MVIEW version... Look for it.

MVIEW is a parser for your BLAST output.
MVIEW works for your problem because you want to align only one sequence
(as a template) against an entire database (I suppose with some cutoff
on the E-value or P-value, at least the default E-value, that is, ten),
or against a set of sequences, or against just one other sequence (a
two-sequence alignment).
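
To make the E-value cutoff concrete, here is a minimal sketch in Python.
It is not MVIEW and not how MVIEW works internally; it only assumes that
you have tabular BLAST output with the standard 12 columns (BLAST+
`-outfmt 6`, or the equivalent from legacy `blastall -m 8`). The function
name `filter_hsps` and the example lines are made up for illustration.

```python
import csv
import io

# Assumed column order: the 12 default fields of tabular BLAST output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hsps(handle, max_evalue=10.0):
    """Yield one dict per HSP line whose E-value is at most max_evalue."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hsp = dict(zip(FIELDS, row))
        if float(hsp["evalue"]) <= max_evalue:
            yield hsp


# Invented HSP lines, for illustration only (not real BLAST results).
EXAMPLE = (
    "query1\tsubjA\t98.0\t100\t2\t0\t1\t100\t5\t104\t1e-50\t180\n"
    "query1\tsubjB\t40.0\t30\t18\t0\t10\t39\t1\t30\t4.2\t22.1\n"
    "query1\tsubjC\t35.0\t20\t13\t0\t50\t69\t7\t26\t25\t18.0\n"
)

kept = list(filter_hsps(io.StringIO(EXAMPLE), max_evalue=10.0))
strict = list(filter_hsps(io.StringIO(EXAMPLE), max_evalue=1e-10))
print([h["sseqid"] for h in kept])
print([h["sseqid"] for h in strict])
```

With the default of ten almost everything passes, so a stricter cutoff is
usually what you want before stacking the hits under your sequence.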

Now some considerations about aligning HSPs from BLAST the way you
intend and the way MVIEW does it... They are important and take only a
minute to read:
Remember, what you get is what you asked for, but not the real thing
(this is something very typical in bioinformatics - and in all science -
hahaha).
You don't get a real multiple alignment; you get an artifact: the HSPs
that BLAST found for each gene in the database, stacked under a template
gene (your sequence).
All right then. You do not by any means have an alignment, not even an
alignment of the genes built from HSPs, because there can be HSPs that
align sequences in the database with each other but that are hidden when
the sequences are stacked under your sequence: your sequence lacks those
HSPs, so they are _ignored_.
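
A small Python sketch of what "stacking under a template" means. This is
my own simplified illustration, not MVIEW's code: the function
`stack_hsp` and the toy sequences are hypothetical. Every column of the
result belongs to a residue of the query, so anything in a database
sequence that has no counterpart in the query simply has no column to go
into. The dropped insertions below are a small-scale version of the
hidden HSPs described above.

```python
def stack_hsp(query_len, qstart, q_aln, s_aln):
    """Place one HSP under the query, one column per query residue.

    qstart is the 1-based query position where the HSP begins; q_aln and
    s_aln are the gapped query and subject strings of that HSP.
    Returns the padded row and the number of subject residues dropped
    because they face a gap in the query.
    """
    row = ["-"] * query_len
    pos = qstart - 1
    dropped = 0
    for q_char, s_char in zip(q_aln, s_aln):
        if q_char == "-":
            # Insertion relative to the query: no query column exists.
            if s_char != "-":
                dropped += 1
            continue
        row[pos] = s_char
        pos += 1
    return "".join(row), dropped


# Toy data, invented for illustration.
query = "MKTAYIAKQR"
row_a, lost_a = stack_hsp(len(query), 2, "KTAYI", "KSAYI")
row_b, lost_b = stack_hsp(len(query), 5, "YI--AKQ", "YIGGAKQ")

print(query)
print(row_a, "dropped:", lost_a)
print(row_b, "dropped:", lost_b)
```

The two extra residues of the second hit vanish from the picture, and
nothing in the output tells you how the two hits relate to each other.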
Why is this so important?
What I actually mean is that if you use these "sequences stacked under a
template" as a multiple alignment, you will be lying about the
underlying topology (underlying, that is, not lying ;-) of the gene
network that arises from your database plus your sequence when they are
correctly aligned, that is, all against all... etc, etc, etc.
Well, that is the mathematically exhaustive, optimal way... normally we
use heuristics again, and again, and again... But "all against all" is
the key concept involved in the multiple alignment problem. It is very
important to be aware of these things.
needle (Needleman-Wunsch, pairwise) is the optimal way <-> BLAST is the
heuristic
Clustal is also a very, very heuristic solution to the massive problem
of multiple alignment. Personally I prefer MUSCLE, which uses a better
mathematical model and is (right now) the quickest aligner in most
cases.
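
For newbies, a minimal Python sketch of the "optimal way" and of "all
against all". It assumes a toy scoring scheme (match +1, mismatch -1,
linear gap -2) that I chose for illustration; the real needle uses a
substitution matrix and affine gap penalties, so the numbers will not
match needle's. The names `nw_score` and `all_against_all` are
hypothetical.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gaps)."""
    # prev[j] holds the best score for a[:i-1] against b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur.append(max(prev[j - 1] + sub,   # align a[i-1] with b[j-1]
                           prev[j] + gap,       # gap in b
                           cur[j - 1] + gap))   # gap in a
        prev = cur
    return prev[-1]


def all_against_all(seqs):
    """Score every unordered pair of sequences: n*(n-1)/2 comparisons."""
    return {(x, y): nw_score(seqs[x], seqs[y])
            for x, y in combinations(sorted(seqs), 2)}


# Toy sequences, invented for illustration.
seqs = {"s1": "GATTACA", "s2": "GATACA", "s3": "GCTTACA"}
scores = all_against_all(seqs)
for pair, score in sorted(scores.items()):
    print(pair, score)
```

The dynamic programming visits every cell of the table, which is why it
is optimal and slow; BLAST only extends around seed matches, and stacking
hits under one query compares n pairs instead of n*(n-1)/2.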

I am sure that most of you know this already.
I hope it is useful for newbies and others, so forgive me for the
long, tedious discourse...


CQ

On Tue, 28-03-2006 at 09:25 +0200, Catherine Letondal wrote:

> On Mar 27, 2006, at 8:03 PM, Ryan Golhar wrote:
> 
> > Hi Peter,
> >
> >> You are quite right that EMBOSS may align the sequences completely
> >> differently - unless the HSPs are very significant and cover most
> >> of the sequence this will be true of any attempt to simply realign.
> >> There has to be some way to pass on the HSPs as fixed positions,
> >> as in the BioPerl solution.
> >
> > I looked at a bioperl method, but can't seem to find something that
> > will accomplish this.
> >
> >> However, it could make a nice EMBOSS application - the only question
> >> would be how you would like to specify the HSPs. Perhaps we could read
> >> BLAST output (in some specified format), or perhaps some other way to
> >> give the input alignments.
> >
> > Yes, I agree.  I suppose the best way would be to specify the two
> > sequences and the blast output.  The application could then construct
> > an alignment based on a particular HSP (probably the first one, or
> > whatever the user specifies).
> >
> 
> Have you tried this:
> http://bioweb.pasteur.fr/seqanal/interfaces/seqsblast.html
> 
> It is based on bioperl. check "Get HSP" option (you can even extend it).
> 
> Best,
> 
> --
> Catherine Letondal -- Institut Pasteur -- Computing Center
> 
> _______________________________________________
> EMBOSS mailing list
> EMBOSS at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/emboss

Carlos Quijano
http://www2.iib.uam.es/cquijano
Evolution and Development laboratory
Regulation of Gene Expression Department
Institute for Biomedical Research
http://www.iib.uam.es


