[EMBOSS] Is vectorstrip gapless by design or is it a bug ?
Charles Plessy
charles-listes-emboss at plessy.org
Wed Feb 21 08:04:38 UTC 2007
Dear list,
I am using vectorstrip to find PCR primers in cloned PCR products. Strangely,
in some cases it misses a primer, because it overestimates the number of
mismatches.
In the following example, vectorstrip identifies the first primer with six
mismatches, although it has only two. Six mismatches over the 21-base linker
is about 28.6%, so if I run vectorstrip with a -mismatch value lower than 29,
I miss the primer.
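
To make the arithmetic explicit, here is a small Python sketch. It assumes
that -mismatch is a whole-number percentage and that a hit is accepted when
the mismatch percentage does not exceed it; how vectorstrip rounds internally
is an assumption here, and min_percent_threshold is a hypothetical helper,
not part of EMBOSS.

```python
def min_percent_threshold(mismatches: int, length: int) -> int:
    """Smallest whole-number percentage p such that
    100 * mismatches / length <= p (integer ceiling division)."""
    return -(-100 * mismatches // length)

linker_len = len("AATGAGGTAACGGTTCCCAGC")  # LINKERA, 21 bases

# Six mismatches, as reported by vectorstrip
print(min_percent_threshold(6, linker_len))  # 29

# Two differences, as in the gapped alignment
print(min_percent_threshold(2, linker_len))  # 10
```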
The following is a mixture of shell commands and extracts of outputs. The
sequence consists of two reads assembled by using trimseq on .ab1 files, and
then merger on the resulting fasta files.
export SEQ="ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccCcTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcccGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCgGTTcccAGCaGNttttttttttttttttttttttttttttttttttttttttttttttttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaGaTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGttTTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACAgCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCGTTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnnnnttntttnntnnnnaaaaa"
export LINKERA="AATGAGGTAACGGTTCCCAGC"
export LINKERB="GCTGGGAACCGTTACCTCATT"
vectorstrip asis:$SEQ \
    -linkera=$LINKERA \
    -linkerb=$LINKERB \
    -outfile stdout \
    -outseq /dev/null \
    -novectorfile \
    -nobesthits \
    -mismatch 30
Sequence: asis Vector: no_name
5' sequence matches:
From 138 to 158 with 6 mismatches
3' sequence matches:
From 351 to 371 with 0 mismatches
Sequences output to file:
from 159 to 350
CaGNtttttttttttttttttttttttttttttttttttttttttttttt
ttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaG
aTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGtt
TTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACA
sequence trimmed from 5' end:
ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccC
cTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcc
cGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCg
GTTcccAG
sequence trimmed from 3' end:
gCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCG
TTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnn
nnttntttnntnnnnaaaaa
needle asis:$SEQ[138:158] asis:$LINKERA stdout -auto
asis 138 aaTGAggTAACCgGTTcccAG- 158
         |||||||||| ||||||||||
asis   1 AATGAGGTAA-CGGTTCCCAGC  21
Interestingly, in the following gapless alignment, the number of mismatches is
6. But I did not find anything saying that gaps are disallowed in
vectorstrip?
aaTGAggTAACCgGTTcccAG
||||||||||| | | ||
AATGAGGTAACGGTTCCCAGC
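
The count of 6 can be reproduced with a position-by-position comparison. The
Python sketch below is only an illustration of a gapless comparison; it
assumes the comparison ignores case, and it is not taken from the vectorstrip
source. gapless_mismatches is a hypothetical helper name.

```python
def gapless_mismatches(a: str, b: str) -> int:
    """Count positions where two equal-length sequences differ,
    ignoring case and allowing no gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))

region = "aaTGAggTAACCgGTTcccAG"  # positions 138 to 158 of SEQ
linker = "AATGAGGTAACGGTTCCCAGC"  # LINKERA

print(gapless_mismatches(region, linker))  # 6
```

The single extra C at position 12 of the region shifts most of the remaining
bases out of register, which is why two gaps in the needle alignment turn into
six mismatches when no gaps are allowed.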
I am using emboss through fink (emboss package 4.0.0-2).
Have a nice day,
--
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan