[Bioperl-l] cigar string in GenericHSP

Jason Stajich jason at cgt.mc.duke.edu
Tue Mar 11 21:48:21 EST 2003


okay -

This is also possible via:

$hsp->get_aln->cigar_line();

But I am fine adding it to the HSP if it is faster.

-jason
On Wed, 12 Mar 2003, Juguang Xiao wrote:

> Hi all,
>
> I added one method to Bio::Search::HSP::GenericHSP, named cigar_string.
> The cigar string issue arises when we try to annotate a genome and store
> the results in an ensembl database (version 9 and above). I attach the
> concept of the cigar string at the end of this email.
>
> Now a very simple script can get the cigar string from an hsp, and it
> works for all flavors of blast.
>
> use Bio::SearchIO;
>
> my $factory = Bio::SearchIO->new( -format => 'blast',
>                                   -file   => 't/data/blast.report' );
> # the HSP is expected to be a Bio::Search::HSP::GenericHSP
> my $hsp = $factory->next_result->next_hit->next_hsp;
> my $cigar_string = $hsp->cigar_string;
>
> Besides this, I also wrote a static method, generate_cigar_string, that
> builds the cigar string from 2 equal-length sequences; you can call it
> directly if you already have the aligned sequences.
>
> my $qstr = 'tacgcta--tacgcta--cactg-c';
> my $hstr = 'tac---tacgt----ctacgca---cc';
> my $cigar_string = Bio::Search::HSP::GenericHSP::generate_cigar_string
> ($qstr, $hstr);
>
> t/cigarstring.t tests both methods.
>
> Suggestions or questions? Thanks
>
> Juguang
>
> ----------
> Copied from ensembl doc.
>
> Sequence alignment hits were previously stored within the core database
> as
> ungapped alignments. This imposed 2 major constraints on alignments:
>
> a) alignments for a single hit record would require multiple rows in the
> database, and
> b) it was not possible to accurately retrieve the exact original
> alignment.
>
> Therefore, in the new branch sequence alignments are now stored as
> gapped alignments in the cigar line format (where CIGAR stands for
> Concise Idiosyncratic Gapped Alignment Report).
>
> In the cigar line format alignments are stored as follows:
>
> M: Match
> D: Deletion
> I: Insertion
>
> An example of an alignment for a hypothetical protein match is shown
> below:
>
>
> Query:   42 PGPAGLP----GSVGLQGPRGLRGPLP-GPLGPPL...
>              PG    P    G     GP   R      PLGP
> Sbjct: 1672 PGTP*TPLVPLGPWVPLGPSSPR--LPSGPLGPTD...
>
>
> This alignment is stored in the protein_align_feature table as the
> following cigar line:
>
> 7M4D12M2I2MD7M
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>
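
For reference, the encoding described in the quoted ensembl text can be sketched in a few lines of Perl. This is only a minimal sketch, not the code in GenericHSP or in cigar_line(): the name aligned_to_cigar is hypothetical, and the convention it uses (a gap in the query counts as D, a gap in the hit counts as I, and a run length of 1 is left out) is an assumption inferred from the 7M4D12M2I2MD7M example above. The trailing "..." of the example alignment is dropped so that both strings have the same length.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: run-length encode two aligned,
# equal-length strings with the M/D/I codes from the quoted ensembl text.
# Assumed convention (inferred from the 7M4D12M2I2MD7M example):
#   gap in the query -> D, gap in the hit -> I, run length 1 is omitted.
sub aligned_to_cigar {
    my ($query, $hit) = @_;
    die "aligned strings must have equal length\n"
        unless length($query) == length($hit);

    my ($cigar, $prev, $count) = ('', '', 0);
    for my $i (0 .. length($query) - 1) {
        my $q = substr($query, $i, 1);
        my $h = substr($hit,   $i, 1);
        die "gap in both strings at column $i\n" if $q eq '-' && $h eq '-';

        # classify this alignment column
        my $op = $q eq '-' ? 'D' : $h eq '-' ? 'I' : 'M';

        if ($op eq $prev) {
            $count++;                       # extend the current run
        } else {
            # flush the finished run, then start a new one
            $cigar .= ($count > 1 ? $count : '') . $prev if $count;
            ($prev, $count) = ($op, 1);
        }
    }
    # flush the last run
    $cigar .= ($count > 1 ? $count : '') . $prev if $count;
    return $cigar;
}

my $query = 'PGPAGLP----GSVGLQGPRGLRGPLP-GPLGPPL';
my $hit   = 'PGTP*TPLVPLGPWVPLGPSSPR--LPSGPLGPTD';
print aligned_to_cigar($query, $hit), "\n";   # prints 7M4D12M2I2MD7M
```

The equal-length check matters: the function compares the two strings column by column, so input that is not a proper pairwise alignment is rejected instead of being silently truncated.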

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

