[DAS2] Example alignments
Lincoln Stein
lstein at cshl.edu
Mon Jun 5 14:31:50 UTC 2006
Hi Andrew,
I'm truly sorry about how long it has taken me to get these examples to you. I
hope that the example alignments in the enclosure make sense to you.
Unfortunately I found that I had to add a new "target" attribute to <LOC> in
order to make the cigar string semantics unambiguous. Otherwise you wouldn't
be able to tell how to interpret the gaps.
Lincoln
--
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
-------------- next part --------------
CASE #1. A SIMPLE PAIRWISE ALIGNMENT.
A simple alignment is one in which the alignment is represented as a
single feature with no subfeatures. This is the preferred
representation to be used when the entire alignment shares the same
set of properties.
This is an alignment between Chr4 (the reference) and EST23 (the
target). Both aligned sequences are in the forward (+) direction. We
represent this as a single alignment:
Chr4   100 CAAGACCTAAA-CTGGAATTCCAATCGCAACTCCTGGACC-TATCTATA 147
           |||||||X||| ||||| |||||||       ||||X||| ||||||||
EST23    1 CAAGACCAAAATCTGGA-TTCCAAT-------CCTGCACCCTATCTATA 41
This has a CIGAR gap string of M11 I1 M5 D1 M7 D7 M8 I1 M8:
    M11   match 11 bp
    I1    insert 1 gap into the reference sequence
    M5    match 5 bp
    D1    insert 1 gap into the target sequence
    M7    match 7 bp
    D7    insert 7 gaps into the target sequence
    M8    match 8 bp
    I1    insert 1 gap into the reference sequence
    M8    match 8 bp
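The operations listed above can be tallied mechanically. The following is a minimal Python sketch, not part of the DAS2 proposal itself; the function names parse_gap and consumed_lengths are made up for illustration. It assumes the gap string uses only the M, I and D operations described here, separated by spaces.

```python
import re

def parse_gap(gap):
    """Split a gap string such as "M11 I1 M5" into (op, length) pairs."""
    ops = []
    for token in gap.split():
        m = re.fullmatch(r"([MID])(\d+)", token)
        if m is None:
            raise ValueError(f"unrecognised gap token: {token!r}")
        ops.append((m.group(1), int(m.group(2))))
    return ops

def consumed_lengths(gap):
    """Return (reference_bases, target_bases) covered by the gap string."""
    ref = tgt = 0
    for op, n in parse_gap(gap):
        if op in "MD":   # M and D advance along the reference
            ref += n
        if op in "MI":   # M and I advance along the target
            tgt += n
    return ref, tgt

print(consumed_lengths("M11 I1 M5 D1 M7 D7 M8 I1 M8"))  # prints (47, 41)
```

The 41 target bases agree with EST23 1:41. Read as 1-based inclusive coordinates, 100:147 would span 48 reference positions, one more than the 47 the gap string accounts for, so a check of this kind may be worth running when validating coordinates.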
Content-Type: application/x-das-features+xml
<?xml version="1.0" encoding="UTF-8"?>
<FEATURES
    xmlns="http://biodas.org/documents/das2"
    xml:base="http://www.biodas.org/das2/sequence/fly/Jun2006/">
  <FEATURE uri="./Alignment1" type="./expressed_sequence_match">
    <LOC
        segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4"
        range="100:147:1" />
    <LOC
        segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/EST23"
        target="1"
        range="1:41:1"
        gap="M11 I1 M5 D1 M7 D7 M8 I1 M8" />
    <PROP key="est2genomescore" value="180" />
  </FEATURE>
</FEATURES>
NOTE: I've had to introduce a new <LOC> attribute named "target" in
order to distinguish the reference sequence from the target
sequence. This is necessary for the CIGAR string concepts to work.
Perhaps it would be better to have a "role" attribute whose value is
either "ref" or "target"?
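To illustrate how a client might use the proposed "target" attribute, here is a rough Python sketch using the standard xml.etree.ElementTree module. The helper names split_locations and parse_range are hypothetical, the embedded document is a trimmed, well-formed copy of the Case #1 feature, and the rule "a <LOC> with target="1" is the target, anything else is the reference" is my reading of the proposal rather than settled DAS2 behaviour.

```python
import xml.etree.ElementTree as ET

DAS2_NS = {"das": "http://biodas.org/documents/das2"}

# Trimmed, well-formed copy of the Case #1 feature.
DOC = """<FEATURES xmlns="http://biodas.org/documents/das2"
          xml:base="http://www.biodas.org/das2/sequence/fly/Jun2006/">
  <FEATURE uri="./Alignment1" type="./expressed_sequence_match">
    <LOC segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4"
         range="100:147:1" />
    <LOC segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/EST23"
         target="1" range="1:41:1"
         gap="M11 I1 M5 D1 M7 D7 M8 I1 M8" />
  </FEATURE>
</FEATURES>
"""

def split_locations(feature):
    """Return (reference_loc, target_loc) for a pairwise alignment feature."""
    ref = target = None
    for loc in feature.findall("das:LOC", DAS2_NS):
        if loc.get("target") == "1":   # assumed meaning: this LOC is the target
            target = loc
        else:
            ref = loc
    return ref, target

def parse_range(text):
    """Split a "start:end:strand" range string into three integers."""
    start, end, strand = text.split(":")
    return int(start), int(end), int(strand)

root = ET.fromstring(DOC)
feature = root.find("das:FEATURE", DAS2_NS)
ref, target = split_locations(feature)
print("reference:", parse_range(ref.get("range")))
print("target:   ", parse_range(target.get("range")), target.get("gap"))
```

With a "role" attribute instead, only the test inside split_locations would change.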
<!----------------------------------------------------------------------->
CASE #2. A COMPLEX PAIRWISE ALIGNMENT.
The complex pairwise alignment is used when the alignment is the
composite of two different alignments, each of which has its own set
of properties. An example of this is BLAST, in which each "BLAST hit"
is composed of multiple aligned segments called "HSPs".
We extend the previous example by adding another aligned segment to
the alignment.
BLAST hit: align Chr4:100:300 with EST23:1:58
HSP 1:
Chr4   100 CAAGACCTAAA-CTGGAATTCCAATCGCAACTCCTGGACC-TATCTATA 147
           |||||||X||| ||||| |||||||       ||||X||| ||||||||
EST23    1 CAAGACCAAAATCTGGA-TTCCAAT-------CCTGCACCCTATCTATA 41
BLAST score = 80
CIGAR gap string M11 I1 M5 D1 M7 D7 M8 I1 M8
HSP 2:
Chr4   211 TCAAACTGATAATGGGGT 228
           ||||||||||| ||||||
EST23   42 TCAAACTGATA-TGGGGT 58
BLAST score = 85
CIGAR gap string M11 D1 M6
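As a check on the gap strings, the gapped display for each HSP can be rebuilt from the two ungapped sequences plus the gap string. This is only a sketch under the M/I/D semantics used in this message; the function names gapped and match_line are invented for the example.

```python
def gapped(ref_seq, tgt_seq, gap):
    """Rebuild the gapped (reference, target) strings from a gap string."""
    ref_out, tgt_out = [], []
    i = j = 0                      # positions in the ungapped sequences
    for token in gap.split():
        op, n = token[0], int(token[1:])
        if op == "M":              # aligned columns: consume both sequences
            ref_out.append(ref_seq[i:i + n])
            tgt_out.append(tgt_seq[j:j + n])
            i += n
            j += n
        elif op == "I":            # gap in the reference, bases in the target
            ref_out.append("-" * n)
            tgt_out.append(tgt_seq[j:j + n])
            j += n
        elif op == "D":            # bases in the reference, gap in the target
            ref_out.append(ref_seq[i:i + n])
            tgt_out.append("-" * n)
            i += n
        else:
            raise ValueError(f"unrecognised gap operation: {token!r}")
    return "".join(ref_out), "".join(tgt_out)

def match_line(ref_row, tgt_row):
    """'|' for identical bases, 'X' for mismatches, ' ' opposite a gap."""
    marks = []
    for a, b in zip(ref_row, tgt_row):
        if a == "-" or b == "-":
            marks.append(" ")
        else:
            marks.append("|" if a == b else "X")
    return "".join(marks)

# HSP 2: ungapped sequences taken from the display above.
ref_row, tgt_row = gapped("TCAAACTGATAATGGGGT", "TCAAACTGATATGGGGT", "M11 D1 M6")
print(ref_row)
print(match_line(ref_row, tgt_row))
print(tgt_row)
```

Running the same function over HSP 1 with its gap string reproduces the HSP 1 display as well.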
We represent this as an "expressed_sequence_match" feature relating
Chr4 100:300 to EST23 1:58. The feature contains two subparts, one
corresponding to HSP 1 and the other to HSP 2.
<?xml version="1.0" encoding="UTF-8"?>
<FEATURES
    xmlns="http://biodas.org/documents/das2"
    xml:base="http://www.biodas.org/das2/sequence/fly/Jun2006/">

  <!-- A feature for the entire BLAST hit -->
  <FEATURE uri="./Alignment2" type="./expressed_sequence_match">
    <LOC
        segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4"
        range="100:300:1" />
    <LOC
        segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/EST23"
        target="1"
        range="1:58:1" />
    <PART uri="./Alignment2.1" />
    <PART uri="./Alignment2.2" />
  </FEATURE>

  <!-- HSP 1 -->
  <FEATURE uri="./Alignment2.1" type="./match_part">
    <LOC
        segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4"
        range="100:147:1" />
    <LOC
        segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/EST23"
        target="1"
        range="1:41:1"
        gap="M11 I1 M5 D1 M7 D7 M8 I1 M8" />
    <PARENT uri="./Alignment2" />
    <PROP key="blastscore" value="80" />
  </FEATURE>

  <!-- HSP 2 -->
  <FEATURE uri="./Alignment2.2" type="./match_part">
    <LOC
        segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/4"
        range="211:228:1" />
    <LOC
        segment="http://www.flybase.org/genome/D_melanogaster/R4.3/dna/EST23"
        target="1"
        range="42:58:1"
        gap="M11 D1 M6" />
    <PARENT uri="./Alignment2" />
    <PROP key="blastscore" value="85" />
  </FEATURE>
</FEATURES>
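A client reading this document needs to follow the <PART> links from the hit to its HSPs. Here is a small Python sketch of that lookup using xml.etree.ElementTree. The function name hsp_scores is hypothetical, the embedded document is a trimmed copy with the <LOC> elements left out for brevity, and URIs are compared as literal strings rather than resolved against xml:base, which a real client would probably need to do.

```python
import xml.etree.ElementTree as ET

NS = {"das": "http://biodas.org/documents/das2"}

# Trimmed copy of the Case #2 document (LOC elements omitted).
DOC = """<FEATURES xmlns="http://biodas.org/documents/das2">
  <FEATURE uri="./Alignment2" type="./expressed_sequence_match">
    <PART uri="./Alignment2.1" />
    <PART uri="./Alignment2.2" />
  </FEATURE>
  <FEATURE uri="./Alignment2.1" type="./match_part">
    <PARENT uri="./Alignment2" />
    <PROP key="blastscore" value="80" />
  </FEATURE>
  <FEATURE uri="./Alignment2.2" type="./match_part">
    <PARENT uri="./Alignment2" />
    <PROP key="blastscore" value="85" />
  </FEATURE>
</FEATURES>
"""

def hsp_scores(root, hit_uri):
    """Map each PART uri of the given hit to its blastscore property."""
    by_uri = {f.get("uri"): f for f in root.findall("das:FEATURE", NS)}
    hit = by_uri[hit_uri]
    scores = {}
    for part in hit.findall("das:PART", NS):
        child = by_uri[part.get("uri")]
        # Confirm that the child points back at the hit via PARENT.
        parent = child.find("das:PARENT", NS)
        if parent is None or parent.get("uri") != hit_uri:
            raise ValueError(f"{part.get('uri')} does not link back to {hit_uri}")
        for prop in child.findall("das:PROP", NS):
            if prop.get("key") == "blastscore":
                scores[part.get("uri")] = int(prop.get("value"))
    return scores

root = ET.fromstring(DOC)
print(hsp_scores(root, "./Alignment2"))
```

Keeping the per-HSP scores on the match_part features, rather than on the hit, is what lets each aligned segment carry its own properties.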