[Bioperl-l] Dumping a MSA from BLAST results
Chris Fields
cjfields at uiuc.edu
Tue Feb 19 17:18:34 UTC 2008
One could use an alternative blastall output format (-m1 through -m6);
these give various query-anchored alignments. None of these are parsed by
BioPerl as far as I know; it might be worth getting something up and
running if there is enough interest.
chris
PS. Here's example output from
'blastall -p blastp -i test2.faa -d CP000560.faa -m6',
which is query-anchored, flat, with blunt ends:
BLASTP 2.2.16 [Mar-25-2007]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|1373160|gb|AAB57770.1| PyrR
         (173 letters)

Database: CP000560.faa
           3693 sequences; 1,147,568 total letters

Searching..................................................done

                                                         Score    E
Sequences producing significant alignments:             (bits)   Value

gb|ABS73893.1| PyrR [Bacillus amyloliquefaciens FZB42]     322   1e-89
gb|ABS75590.1| ComFC [Bacillus amyloliquefaciens FZB42]     37   6e-04
gb|ABS72500.1| Prs [Bacillus amyloliquefaciens FZB42]       28   0.22
gb|ABS72703.1| YcdA [Bacillus amyloliquefaciens FZB42]      27   0.49
gb|ABS74832.1| Apt [Bacillus amyloliquefaciens FZB42]       26   1.1
gb|ABS75734.1| Upp [Bacillus amyloliquefaciens FZB42]       26   1.4
gb|ABS74081.1| NrdE [Bacillus amyloliquefaciens FZB42]      25   1.9
gb|ABS76054.1| RocD [Bacillus amyloliquefaciens FZB42]      24   4.1
gb|ABS74744.1| Gpr [Bacillus amyloliquefaciens FZB42]       24   5.4
gb|ABS74336.1| UvrX [Bacillus amyloliquefaciens FZB42]      23   7.1
gb|ABS74825.1| YrvM [Bacillus amyloliquefaciens FZB42]      23   9.2
gb|ABS72555.1| RpoB [Bacillus amyloliquefaciens FZB42]      23   9.2

1_0          1  MNQKAVILDEQAIRRALTRIAHEMIERNKGMNNCILVGIKTRGIYLAKR---LAER---- 53
ABS73893     1  MNQKAVILDEQAIRRALTRIAHEMIERNKGMNDCILVGIKTRGIYLAKR---LAER---- 53
ABS75590   116  -------------------------------NTHTLIPIPLSGERLAERGFNQSEL---- 140
ABS72500   164  ----------------------------KDLKDIVIVSPDHGGVTRARK---LADR---- 188
ABS72703    55  ----------------------------------------------ALK---VTVT---- 61
ABS74832        ------------------------------------------------------------
ABS75734        ------------------------------------------------------------
ABS74081        ------------------------------------------------------------
ABS74081   502  -----------------------------------------RSAELAKE---KGET---- 513
ABS76054   305  ------VLEEEGLAERSLQLGRYFKEELEKIDNPIIKDVRGRGLFIGVE---LTEAARPY 355
ABS74744    43  -------------------------ERDKG-------GIKVRTVDITKE---GAEL---- 63
ABS74336        ------------------------------------------------------------
ABS74825        ------------------------------------------------------------
ABS72555        ------------------------------------------------------------

1_0         54  IEQIEGNPVTVGEIDITLYRDDLSKKTSNDEPLVKGADIP-V--DITD---QKVILVDDV 107
ABS73893    54  IEQIEGNPVTVGEIDITLYRDDLTKKTSNEEPLVKGADIP-A--DITD---QKVIVVDDV 107
ABS75590   141  LASLLGMPVISPLIRLNNEKQSKKSKTDRLSAEKKFSAAE-N--SATG---MNVILIDDI 194
ABS72500   189  LKA----PIAI---------IDKRRPRPNE---VEVMNIV-G--NVEG---KTAILIDDI 226
ABS72703    62  VKNTGKDPLTVKSSDFSLYQDD--AKTAK-----------------TD---KEDLMQSGT 99
ABS74832   112  ---------------------------------------------------QRVLITDDL 120
ABS75734   100  ---------------VGLYRDPETLK-----PVEYYVKLP-S--DVEE---REFIVVDPM 133
ABS74081   386  -----------------LQASQVSAYTDYDEEDEIGLDIS-C--NLGS---LNILNVMKH 422
ABS74081   514  FEHYEGSTYATGEYFNKYIEKEFSPAYEKIAALFEGMHIP-TIEDWKE---LKAFVAENG 569
ABS76054   356  CEKLKGEGLLCKETHDTVIR---------------------------------------- 375
ABS74744    64  SGKKQGRYVTIEAQGVREHDSDMQEKVT-------------------------------- 91
ABS74336   109  ----------------------------------KTIDLP-T--NITMDIYRYCLILFDK 131
ABS74825   199  -------------------REDVRKEVGNDEAKIRKAQMP-------------------- 219
ABS72555  1090  -----GAAYTLQEI-LTVKSDDVVGRVKTYEAIVKGDNVPEP--GVPE---SFKVLIKEL 1138

1_0        108  LYTGRTVRAGMDALVDVGRPSSIQLAVLVDRGHRELPIRADYIGKNIPTSKSEKVMVQLD 167
ABS73893   108  LYTGRTVRAAMDALVDVGRPSSIQLAVLVDRGHRELPIRADYIGKNIPTSKAEKVMVQLS 167
ABS75590   195  YTTGATLHQAAEVLLTAGKASSVSSFTLI------------------------------- 223
ABS72500   227  IDTAGTITLAANALVENG------------------------------------------ 244
ABS72703   100  LHAGKTVTGNLYFTADEGK----------------------------------------- 118
ABS74832   121  LATGGTIEATIKLVEELG------------------------------------------ 138
ABS75734   134  LATGGSAVEAINSL---------------------------------------------- 147
ABS74081   423  KSIERTVKLATDSLTHVSETTDIRNAPAVRRANKAM------------------------ 458
ABS74081   570  MY---------------------------------------------------------- 571
ABS76054   374  ------------------------------------------------------------ 375
ABS74744    90  ------------------------------------------------------------ 91
ABS74336   132  FYTGKTVRS--------------------------------------------------- 140
ABS74825   218  ------------------------------------------------------------ 219
ABS72555  1139  ------QSLGMDVKILSGDEEEIEMRDLED------------------------------ 1162

1_0        168  EVDQND 173
ABS73893   168  EVDQTD 173
ABS75590   222  ------ 223
ABS72500   243  ------ 244
ABS72703   117  ------ 118
ABS74832   137  ------ 138
ABS75734   146  ------ 147
ABS74081   457  ------ 458
ABS74081   570  ------ 571
ABS76054   374  ------ 375
ABS74744    90  ------ 91
ABS74336   139  ------ 140
ABS74825   218  ------ 219
ABS72555  1161  ------ 1162

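Since nothing in BioPerl parses these formats, here is a minimal sketch (plain Python, not BioPerl) of how the blocks of a flat query-anchored report like the one above might be stitched back into one gapped row per sequence. It assumes the input holds only alignment rows of the form "name [start] segment [end]", that the query row ('1_0' here) opens every block, and that a subject with several HSPs (ABS74081 above) repeats in the same order in each block. The function name and the regular expression are illustrative and not part of any library; the demo rows are taken from the last two blocks of the output above.

```python
import re

# name, optional start coordinate, aligned segment, optional end coordinate
ROW = re.compile(r"^(\S+)\s+(?:(\d+)\s+)?([A-Za-z*-]+)(?:\s+(\d+))?\s*$")


def parse_query_anchored(lines, query_id):
    """Join the per-block segments of a flat query-anchored alignment.

    Returns a list of (name, gapped_sequence) in order of first appearance.
    Rows are keyed by (name, n-th occurrence within the block) so that a
    subject with more than one HSP keeps one row per HSP.
    """
    rows = {}   # (name, occurrence) -> list of segments, insertion-ordered
    seen = {}   # name -> how many times it has appeared in the current block
    for line in lines:
        m = ROW.match(line.strip())
        if not m:
            continue  # skip blank lines and anything that is not a row
        name, _start, segment, _end = m.groups()
        if name == query_id:
            seen = {}  # the query row marks the start of a new block
        n = seen.get(name, 0)
        seen[name] = n + 1
        rows.setdefault((name, n), []).append(segment)
    return [(name, "".join(parts)) for (name, _n), parts in rows.items()]


# Rows copied from the last two blocks of the report (gap runs built with "-" * n).
demo = [
    "1_0 108 LYTGRTVRAGMDALVDVGRPSSIQLAVLVDRGHRELPIRADYIGKNIPTSKSEKVMVQLD 167",
    "ABS73893 108 LYTGRTVRAAMDALVDVGRPSSIQLAVLVDRGHRELPIRADYIGKNIPTSKAEKVMVQLS 167",
    "ABS74081 423 KSIERTVKLATDSLTHVSETTDIRNAPAVRRANKAM" + "-" * 24 + " 458",
    "ABS74081 570 MY" + "-" * 58 + " 571",
    "",
    "1_0 168 EVDQND 173",
    "ABS73893 168 EVDQTD 173",
    "ABS74081 457 ------ 458",
    "ABS74081 570 ------ 571",
]

rows = parse_query_anchored(demo, "1_0")
for name, seq in rows:
    print(f">{name}\n{seq}")  # gapped FASTA, one record per row
```

Because every row is padded to the query's coordinate system, the joined rows all have the same length and can be written out directly as a gapped FASTA alignment. The start and end coordinates are parsed but discarded here; a real parser would probably want to keep them.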
On Feb 19, 2008, at 10:17 AM, Jason Stajich wrote:
> All the individual pairwise alignments won't necessarily be an
> alignment of the same region and the gap insertions can be different
> in each instance of the query sequence that is participating in the
> pairwise alns so it won't fit into an MSA.
>
> It makes more sense to extract the aligned part of the hit sequences
> identified and a subsequence of the query which is the min and max
> region aligned. Run this through a MSA program.
>
> -jason
> On Feb 19, 2008, at 6:42 AM, Johan Nilsson wrote:
>
>> Hello,
>>
>> I have a question regarding the conversion from a Blast search
>> result (PSI-blast using blastpgp, to be more exact) to a multiple
>> sequence alignment file. I'm running the
>> Bio::Tools::Run::StandAloneBlast and I retrieve the HSPs from the
>> resulting Bio::Search::Hit::HitI objects. I have no problems
>> obtaining each HSP alignment using $hit->get_aln. However, rather
>> than dumping many local alignments, I would like to write a single
>> result file where the HSPs are interleaved.
>>
>> I guess this shouldn't be too hard, but nevertheless I haven't
>> found out how to do this in a simple way. Any suggestions would be
>> highly appreciated!
>>
>> Best Regards
>> /Johan Nilsson
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
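Jason's suggestion quoted above (take the aligned part of each hit, plus the query subsequence spanning the minimum and maximum aligned positions, and hand everything to an MSA program) could be sketched roughly as follows. This is plain Python rather than BioPerl; the HSP record, the function names, and the toy sequences are all hypothetical, and coordinates are assumed to be 1-based and inclusive as in BLAST reports.

```python
from collections import namedtuple

# Hypothetical HSP record: 1-based, inclusive query and hit coordinates.
HSP = namedtuple("HSP", "hit_id q_start q_end h_start h_end")


def query_span(hsps):
    """Smallest query start and largest query end over all HSPs."""
    return min(h.q_start for h in hsps), max(h.q_end for h in hsps)


def to_fasta(query_id, query_seq, hit_seqs, hsps):
    """Unaligned FASTA: the covered query region plus each hit's aligned part."""
    q_lo, q_hi = query_span(hsps)
    records = [(f"{query_id}/{q_lo}-{q_hi}", query_seq[q_lo - 1:q_hi])]
    for h in hsps:
        sub = hit_seqs[h.hit_id][h.h_start - 1:h.h_end]
        records.append((f"{h.hit_id}/{h.h_start}-{h.h_end}", sub))
    return "".join(f">{name}\n{seq}\n" for name, seq in records)


# Toy data, invented purely for illustration.
query_seq = "ACDEFGHIKLMNPQRSTVWY"
hit_seqs = {"hitA": "GHIKLMNPQR", "hitB": "AACDEFGHAA"}
hsps = [HSP("hitA", 6, 15, 1, 10), HSP("hitB", 2, 7, 3, 8)]

fasta = to_fasta("query", query_seq, hit_seqs, hsps)
print(fasta)
```

The resulting FASTA is unaligned on purpose: the gaps from the individual pairwise alignments are dropped, and a multiple alignment program is left to place them consistently across all sequences.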
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign