[EMBOSS] Is vectorstrip gapless by design or is it a bug ?

Charles Plessy charles-listes-emboss at plessy.org
Wed Feb 21 08:04:38 UTC 2007


Dear list,

I am using vectorstrip to find PCR primers in cloned PCR products. Strangely,
in some cases it misses a primer, because it overestimates the number of
mismatches.

In the following example, vectorstrip identifies the first primer with six
mismatches, although it has only two. This means that if I run vectorstrip with
a -mismatch value lower than 29, I miss the primer.

The following is a mixture of shell commands and extracts of outputs. The
sequence consists of two reads assembled by using trimseq on .ab1 files, and
then merger on the resulting fasta files.


export SEQ="ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccCcTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcccGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCgGTTcccAGCaGNttttttttttttttttttttttttttttttttttttttttttttttttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaGaTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGttTTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACAgCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCGTTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnnnnttntttnntnnnnaaaaa"

export LINKERA="AATGAGGTAACGGTTCCCAGC"

export LINKERB="GCTGGGAACCGTTACCTCATT"

vectorstrip 	asis:$SEQ \
		-linkera=$LINKERA \
		-linkerb=$LINKERB \
		-outfile stdout \
		-outseq /dev/null \
		-novectorfile \
		-nobesthits \
		-mismatch 30


Sequence: asis   Vector: no_name
5' sequence matches:
        From 138 to 158 with 6 mismatches
3' sequence matches:
        From 351 to 371 with 0 mismatches
Sequences output to file:
        from 159 to 350
                CaGNtttttttttttttttttttttttttttttttttttttttttttttt
                ttttttttttttAaaaaGaaTTGtttattTACTGAACCNgggCAtAtTaG
                aTACACAACCCATTTTaaaTTTAcATcttttAAtTCaaTtTTGAAgTGtt
                TTTAcAcAcCCNCNCAAaAaaaaaaaaaTTTGGCATGcAACA
        sequence trimmed from 5' end:
                ttttcccccccccnntttttttnnnnncccccnnnnnnnnnaaaaAAccC
                cTcNCTaTagggCGAGTTggGccCtTCTAGTNtGCATGCtTCGAGcGGcc
                cGccAGTgTTGATGGaTaTCTTGCaGaaTTcGcccTTaaTGAggTAACCg
                GTTcccAG
        sequence trimmed from 3' end:
                gCTgGGAACCGTtACCtCATTAAgggCGAAtTCcAGcAcAcTGGCgGCCG
                TTACtAAGGGATCCGAGCTcGGNACCAAGnnnngnnnnnnnnnnnnnnnn
                nnttntttnntnnnnaaaaa

needle asis:$SEQ[138:158] asis:$LINKERA stdout -auto

asis             138 aaTGAggTAACCgGTTcccAG-    158
                     |||||||||| |||||||||| 
asis               1 AATGAGGTAA-CGGTTCCCAGC     21
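
To make the comparison concrete, here is a small sketch that tallies the
columns of the needle alignment above. The function name alignment_stats is
only illustrative and is not part of EMBOSS. It treats upper and lower case as
equal and does no special handling of ambiguity codes such as N.

```python
def alignment_stats(top: str, bottom: str) -> dict:
    """Count matches, mismatches and gap columns in a pairwise alignment.

    Both rows must already be aligned (same length, '-' marks a gap).
    Comparison is case-insensitive.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    stats = {"matches": 0, "mismatches": 0, "gaps": 0}
    for a, b in zip(top.upper(), bottom.upper()):
        if a == "-" or b == "-":
            stats["gaps"] += 1
        elif a == b:
            stats["matches"] += 1
        else:
            stats["mismatches"] += 1
    return stats


# Rows copied from the needle output above.
read_row = "aaTGAggTAACCgGTTcccAG-"
linker_row = "AATGAGGTAA-CGGTTCCCAGC"
print(alignment_stats(read_row, linker_row))
```

With gaps allowed, the two differences between the read and the linker are
both gap columns: one extra C in the read, and the final C of the linker left
unpaired.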


Interestingly, in the following gapless alignment, the number of mismatches
is 6. However, I did not find anything saying that gaps are disallowed in
vectorstrip.

aaTGAggTAACCgGTTcccAG
||||||||||| | | ||  
AATGAGGTAACGGTTCCCAGC
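
The count of six can be reproduced with a position-by-position comparison.
This is only a sketch of a gapless comparison, not vectorstrip's actual code.
The function name gapless_mismatches is illustrative. The percentage step
assumes that -mismatch is a percentage of the linker length, which I have not
verified but which is consistent with the threshold of 29 mentioned above.

```python
def gapless_mismatches(window: str, linker: str) -> int:
    """Count differing positions between two equal-length sequences.

    No gaps are introduced. Comparison is case-insensitive, and ambiguity
    codes such as N get no special treatment.
    """
    if len(window) != len(linker):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(window.upper(), linker.upper()))


window = "aaTGAggTAACCgGTTcccAG"   # positions 138 to 158 of the read
linker = "AATGAGGTAACGGTTCCCAGC"   # LINKERA

count = gapless_mismatches(window, linker)
# Assumption: -mismatch is a percentage of the linker length.
percent = 100 * count / len(linker)
print(count, round(percent, 1))    # 6 28.6
```

Six mismatches out of 21 positions is about 28.6 %, so under that assumption
the hit is only reported once -mismatch is 29 or more.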


I am using emboss through fink (emboss package 4.0.0-2).

Have a nice day,

-- 
Charles Plessy
http://charles.plessy.org
Wako, Saitama, Japan


