[Bioperl-l] Alignment from blast report

Mark A. Jensen maj at fortinbras.us
Thu Mar 4 14:50:54 UTC 2010


Paolo -- 
Ok, there's now (r16900) an *experimental* method in
Bio::Search::Tiling::MapTiling called get_tiled_alns().
POD is below. Try it out and let me know--
cheers,
MAJ


=head1 TILED ALIGNMENTS

The experimental method L<ALIGNMENTS/get_tiled_alns> will use a tiling
to concatenate tiled hsps into a series of L<Bio::SimpleAlign>
objects:

 @alns = $tiling->get_tiled_alns($type, $context);

Each alignment contains two sequences with ids 'query' and 'subject',
and consists of a concatenation of tiling HSPs which overlap or are
directly adjacent. The alignment are returned in C<$type> sequence
order. When HSPs overlap, the alignment sequence is taken from the HSP
which comes first in the coverage map array.

The sequences in each alignment contain features (even though they are
L<Bio::LocatableSeq> objects) which map the original query/subject
coordinates to the new alignment sequence coordinates. You can
determine the original BLAST fragments this way:

 $aln = ($tiling->get_tiled_alns)[0];
 $qseq = $aln->get_seq_by_id('query');
 $hseq = $aln->get_seq_by_id('subject');
 foreach my $feat ($qseq->get_SeqFeatures) {
    $org_start = ($feat->get_tag_values('query_start'))[0];
    $org_end = ($feat->get_tag_values('query_end'))[0];
    # original fragment as represented in the tiled alignment:
    $org_fragment = $feat->seq;
 }
 foreach my $feat ($hseq->get_SeqFeatures) {
    $org_start = ($feat->get_tag_values('subject_start'))[0];
    $org_end = ($feat->get_tag_values('subject_end'))[0];
    # original fragment as represented in the tiled alignment:
    $org_fragment = $feat->seq;
 }


----- Original Message ----- 
From: "Paolo Pavan" <paolo.pavan at gmail.com>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Chris Fields" <cjfields at illinois.edu>; <bioperl-l at lists.open-bio.org>
Sent: Tuesday, March 02, 2010 10:17 AM
Subject: Re: [Bioperl-l] Alignment from blast report


>I think you got the sense, thank you. Of course hsps from different
> hits will be reflected in different elements aligned. I've attached
> the example pasted (unix text) because is more readable, hoping will
> not be held by the mailing server :-)
>
> Thank you,
> Paolo
>
> 2010/3/2 Mark A. Jensen <maj at fortinbras.us>:
>> This might a good method to have for Bio::Search::Tiling--
>> you want to stitch together all the hsps and have the
>> concatenated alignment returned as a Bio::SimpleAlign,
>> correct? Tiling would create the right set of hsps from
>> which to generate the composite alignment. I can
>> try to get something working, but it may take a while-
>> MAJ
>> ----- Original Message ----- From: "Paolo Pavan" <paolo.pavan at gmail.com>
>> To: "Chris Fields" <cjfields at illinois.edu>
>> Cc: <bioperl-l at lists.open-bio.org>
>> Sent: Tuesday, March 02, 2010 9:37 AM
>> Subject: Re: [Bioperl-l] Alignment from blast report
>>
>>
>> Hi Chris,
>> Thank you for your reply. So I have to understand that since the
>> get_aln method returns the HSP alignment, there is no way to retrieve
>> the whole alignment as in the example pasted, isn't it?
>> Basically I'm trying to use megablast as kind of multiple local
>> alignment engine and actually I'm not pretty sure this is a good idea
>> but in my particular case could be suitable. I mean that the example
>> below reports only the portions of the sequences that align loosing
>> the portions that does not, I'm not sure I gave the idea. What do you
>> think about? Can you give me your opinion?
>> If there isn't any module written yet, I can try to write a parser, it
>> could be of any interest?
>>
>> Thank you,
>> Paolo
>>
>> 2010/3/2 Chris Fields <cjfields at illinois.edu>:
>>>
>>> Paolo,
>>>
>>> You can get a Bio::SimpleAlign from the HSP object. The first code example
>>> in this section in the HOWTO demonstrates this:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>>
>>> chris
>>>
>>> On Mar 1, 2010, at 5:07 PM, Paolo Pavan wrote:
>>>
>>>> Dear all,
>>>> Sorry for pushing up my post but, please does anyone have an hint for me?
>>>> Maybe have I to send attached the report to the mailing list? I don't
>>>> know attachment policies of the list, if it is allowed and is needed I
>>>> can do that.
>>>>
>>>> Thank you,
>>>> Paolo
>>>>
>>>> 2010/2/26 Paolo Pavan <paolo.pavan at gmail.com>:
>>>>>
>>>>> Sorry,
>>>>> Maybe I forgot to add this is the megablast -m 5 output.
>>>>>
>>>>> Thank you again,
>>>>> Paolo
>>>>>
>>>>> 2010/2/26 Paolo Pavan <paolo.pavan at gmail.com>:
>>>>>>
>>>>>> Hi all,
>>>>>> I have just a brief question: I've got some megablast reports such the
>>>>>> one I've pasted below.
>>>>>> I'm aware of the existence of the Bio::Search::IO::megablast and the
>>>>>> Bio::Search::HSP::BlastHSP::get_aln but, is there a way to get the
>>>>>> entire alignment represented as a Bio::SimpleAlign object or
>>>>>> Bio::Align::AlignI implementing one?
>>>>>>
>>>>>> Thank you all,
>>>>>> Paolo
>>>>>>
>>>>>>
>>>>>> MEGABLAST 2.2.16 [Mar-25-2007]
>>>>>>
>>>>>>
>>>>>> Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb Miller
>>>>>> (2000),
>>>>>> "A greedy algorithm for aligning DNA sequences",
>>>>>> J Comput Biol 2000; 7(1-2):203-14.
>>>>>>
>>>>>> Database: 00038-00053.fasta
>>>>>> 2 sequences; 2001 total letters
>>>>>>
>>>>>> Searching..................................................done
>>>>>>
>>>>>> Query= 00038-00053
>>>>>> (802 letters)
>>>>>>
>>>>>>
>>>>>>
>>>>>> Score E
>>>>>> Sequences producing significant alignments: (bits) Value
>>>>>>
>>>>>> ______00038
>>>>>> 226 1e-62
>>>>>> ______00053
>>>>>> 115 3e-29
>>>>>>
>>>>>> 1_0 472
>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 531
>>>>>> ______00038 883
>>>>>> ccgacaataattcttgttggaatcttcggcagttttttgtacaggagccagtagttcaaa 942
>>>>>> ______00053
>>>>>> ------------------------------------------------------------
>>>>>>
>>>>>> 1_0 532
>>>>>> aagaaagcgatcaataaaa-taaaaatcacaaaaaaattaccaaaaacatatttataaat 590
>>>>>> ______00038 943
>>>>>> aagaaagcgatcaataaaaataaaaatcacaaaaaaattaccaaaaacatatttataaa- 1001
>>>>>> ______00053
>>>>>> ------------------------------------------------------------
>>>>>>
>>>>>> 1_0 591
>>>>>> attggcaaaaaaattgccaacaattcccaaacggaaaattcccaaaacaaagagagcgtc 650
>>>>>> ______00038 1000
>>>>>> ------------------------------------------------------------ 1001
>>>>>> ______00053
>>>>>> ------------------------------------------------------------
>>>>>>
>>>>>> 1_0 651
>>>>>> gataaccaatatcaaaatagtttttgaatttattttttgtgtttttttagtttttcttct 710
>>>>>> ______00038 1000
>>>>>> ------------------------------------------------------------ 1001
>>>>>> ______00053
>>>>>> ------------------------------------------------------------
>>>>>>
>>>>>> 1_0 711
>>>>>> acgtcgtgttgccatttatccagcattaagtctataaaaaaaaacggtcagataaaaatg 770
>>>>>> ______00038 1000
>>>>>> ------------------------------------------------------------ 1001
>>>>>> ______00053 1
>>>>>> -------------------------ttaagtctataaaaaaaa-cggtcagataaaaatg 34
>>>>>>
>>>>>> 1_0 771 ccttaagtatttactttaacttgtcttgatca 802
>>>>>> ______00038 1000 -------------------------------- 1001
>>>>>> ______00053 35 ccttaagtatt-actttaacttgtcttgatca 65
>>>>>> Database: 00038-00053.fasta
>>>>>> Posted date: Feb 25, 2010 4:47 PM
>>>>>> Number of letters in database: 2001
>>>>>> Number of sequences in database: 2
>>>>>>
>>>>>> Lambda K H
>>>>>> 1.37 0.711 1.31
>>>>>>
>>>>>> Gapped
>>>>>> Lambda K H
>>>>>> 1.37 0.711 1.31
>>>>>>
>>>>>>
>>>>>> Matrix: blastn matrix:1 -3
>>>>>> Gap Penalties: Existence: 0, Extension: 0
>>>>>> Number of Sequences: 2
>>>>>> Number of Hits to DB: 17
>>>>>> Number of extensions: 3
>>>>>> Number of successful extensions: 3
>>>>>> Number of sequences better than 10.0: 2
>>>>>> Number of HSP's gapped: 2
>>>>>> Number of HSP's successfully gapped: 2
>>>>>> Length of query: 802
>>>>>> Length of database: 2001
>>>>>> Length adjustment: 10
>>>>>> Effective length of query: 792
>>>>>> Effective length of database: 1981
>>>>>> Effective search space: 1568952
>>>>>> Effective search space used: 1568952
>>>>>> X1: 9 (17.8 bits)
>>>>>> X2: 20 (39.6 bits)
>>>>>> X3: 51 (101.1 bits)
>>>>>> S1: 9 (18.3 bits)
>>>>>> S2: 9 (18.3 bits)
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>


--------------------------------------------------------------------------------


> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l 




More information about the Bioperl-l mailing list