remarks about sequence formats
Guy Bottu
gbottu at ben.vub.ac.be
Fri Feb 13 11:46:56 UTC 2004
Dear developers,
I have just installed EMBOSS 2.8.0 at the BEN site. While testing it I
noticed the following things about file formats:
- the documentation says that "phylip" is the non-interleaved format and
"phylip3" the interleaved format, but the programs write exactly the reverse
("phylip3::myseqs" writes the sequences one after the other and
"phylip::myseqs" writes them interleaved)
- PAUP format is always written with "datatype=DNA", even when the
sequences are proteins
- an old pain in the ass: Staden "experiment" format is still not
handled. I include an "experiment" file as an attachment. As far as I
know, the differences from EMBL format are:
  - more field types
  - AC not mandatory
  - only 1 sequence per file
  - data fields allowed after the sequence
Note the AV lines, which hold the "confidence values" of the bases. EMBOSS
library routines to read and write these could be useful. There is
software that can make good use of "confidence values", e.g. Primer3,
but at present eprimer3 cannot handle them.
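
To make the differences listed above concrete, here is a minimal Python sketch of a reader for one "experiment" record. It is only an illustration, not EMBOSS code: the function name parse_experiment and the layout of the returned dictionary are hypothetical, and it assumes what the attached file shows, namely two-letter line tags, AV lines holding whitespace-separated integers, the sequence between an SQ line and a "//" line, and further tagged lines allowed after the "//".

```python
def parse_experiment(text):
    """Parse one Staden experiment record (illustrative sketch only).

    Returns a dict with:
      fields     -- tag -> list of values, one per line carrying that tag
      sequence   -- the bases between the SQ line and "//", blanks removed
      confidence -- the integers collected from all AV lines
    """
    record = {"fields": {}, "sequence": "", "confidence": []}
    in_sequence = False
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        if in_sequence:
            if line.startswith("//"):
                in_sequence = False  # tagged lines may follow the sequence
            else:
                record["sequence"] += "".join(line.split())
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "SQ":
            in_sequence = True
        elif tag == "AV":
            record["confidence"].extend(int(v) for v in value.split())
        else:
            # AC is not required, so no tag is treated as mandatory here
            record["fields"].setdefault(tag, []).append(value)
    return record


# A tiny made-up record in the same layout as the attachment
sample = """ID   demo
AV   18 20 18 18
AV   19 13
SQ
     GTGGGC
//
QL   2
"""
rec = parse_experiment(sample)
```

If there is one confidence value per base, as seems to be the intent of the AV lines, a caller could check len(rec["confidence"]) == len(rec["sequence"]) before passing the values on to a program such as Primer3.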
Regards,
Guy Bottu
-------------- next part --------------
ID 000256_11cR
EN 000256_11cR
LN 000256_11cR..ztr
LT ZTR
AQ 50.180000
AV 18 20 18 18 19 13 13 6 6 6 6 6 9 7 8 11 11 21 24 32 39 46 40
AV 33 34 29 29 29 26 20 20 26 22 26 42 46 51 51 45 40 32 32
AV 32 33 34 32 24 17 13 10 10 19 19 28 28 28 29 35 35 40 39
AV 51 51 51 46 46 42 42 42 46 46 51 51 56 56 51 51 51 51 46
AV 40 35 35 35 39 39 39 56 51 51 51 51 39 39 39 39 39 39 45
AV 40 45 45 45 51 51 51 51 51 51 51 45 45 45 45 51 51 56 56
AV 51 51 51 51 51 51 56 56 56 56 56 56 56 56 56 56 56 51 51
AV 51 51 45 45 51 51 51 40 51 51 45 45 45 40 40 43 43 43 43
AV 45 51 45 56 51 51 51 51 51 45 45 45 45 45 45 51 51 51 51
AV 45 45 51 51 51 51 56 56 56 56 56 56 51 51 51 51 51 56 56
AV 56 51 51 51 51 51 51 56 56 56 56 56 56 56 56 46 43 43 43
AV 43 43 43 51 43 43 43 43 43 43 56 56 56 56 56 56 56 56 56
AV 56 56 51 45 45 45 45 45 51 51 51 51 51 56 56 51 56 56 51
AV 51 51 51 51 51 51 56 56 56 56 51 51 51 51 51 51 56 51 51
AV 51 51 51 51 56 56 56 56 56 51 51 51 51 51 51 51 51 56 56
AV 56 56 56 56 56 45 45 45 43 43 43 46 46 46 51 56 56 56 51
AV 51 51 56 56 45 45 45 45 45 45 56 56 56 56 56 56 56 56 56
AV 51 51 51 51 51 51 43 43 43 43 43 43 43 45 45 45 45 45 46
AV 43 43 43 43 43 43 51 56 43 43 43 43 43 43 51 51 46 46 46
AV 46 51 46 51 51 51 51 51 51 56 56 56 51 45 45 45 45 45 45
AV 51 51 51 56 51 51 51 51 51 51 51 56 56 56 56 56 56 51 51
AV 45 45 45 45 45 45 51 51 56 56 56 56 56 56 56 56 56 56 56
AV 56 51 51 51 45 45 45 45 45 45 43 43 43 43 43 43 43 43 43
AV 46 46 51 41 40 45 45 45 45 51 51 45 51 51 45 45 45 45 45
AV 45 45 45 45 45 45 45 45 45 51 56 56 56 56 56 56 56 56 56
AV 56 51 51 51 45 45 45 40 40 37 37 37 40 56 51 56 56 56 51
AV 51 51 51 51 51 56 45 45 45 45 40 40 45 45 40 40 45 45 40
AV 40 45 45 51 51 51 51 46 46 40 51 40 37 37 37 40 45 45 45
AV 45 56 45 40 35 35 35 32 32 35 46 42 51 51 51 46 46 37 35
AV 35 35 35 39 40 40 40 35 35 35 35 35 35 42 42 46 46 46 40
AV 40 40 40 40 38 42 42 27 19 11 11 11 28 27 40 40 40 44 32
AV 32 29 29 15 20 18 24 14 19 9 10 10 19 27 37 25 25 25 24
AV 29 29 29 20 12 12 13 12 15 12 11 8 8 8 8 7 7 7 4 0 0 0
AV 0 0 0 0 0
SQ
GTGGGCAGAA AAGTTGACAT TCCTCTTCTG CATTTCCTGG ATTGAAAACA GAGCAAATGA
CTGGCGCTTT GAAACCTTGA ATGTATTCTG CAAATACTGA GCATCAAGTT CACTTTCTTC
CATTTCTATG CTTGTTTCCC GACTGTGGTT AACTTCATGT CCCAATGGAT ACTTAAAGCC
TTCTGTGTCA TTTCTATTAT CTTTGGAACA ACCATGAATT AGTCCCTTGG GGTTTTCAAA
TGCTGCACAC TGACTCACAC ATTTATTTGG TTCTGTTTTT GCCTTCCCTA GAGTGCTAAC
TTCCAGTAAC GAGATACTTT CCTGAGTGCC ATAATCAGTA CCAGGTACCA GTGAAATACT
GCTACTCTCT ACAGATCTTT CAGTTTGCAA AACCCTTTCT CCACTTAACA TGAGATCTTT
GGGGTCTTCA GCATTATTAG ACACTTTAAC TGTTTCTAGT TTCTCTTCTT TTTCTTCTCT
TGGAAGGCTA GGATTGACAA ATTCTTTAAG TTCACTGGTA TTTGAACACT TAGTAAAAGA
ACCAGGTGCA TTTGTTAACT TCAGCTCTGG GAAAGTATCA CTGTCATGTC TTTTACTTGT
CTGTTCATTT GGCACTGGCC GTCGCGCTTC ANNNNNNNN
//
TN 000256_11c
PR 2
QL 17
QR 617
WT /home/gbottu/demo/exercise_snip/000906_11cR.scf
WL -1
WR -1
TG MUTA - 50..50
TG C->T Sensitivity= 7.86, Alignment=0.14, Width=0.86, Amplitude=643
TG MUTA - 351..351
TG T->C Sensitivity= 8.13, Alignment=0.07, Width=1.00, Amplitude=1105
TG MUTA - 580..580
TG C->T Sensitivity= 6.19, Alignment=0.29, Width=1.21, Amplitude=1338
TG HETE - 585..585
TG AG Ratio=0.87, Alignment=0.00, Amplitude1=0.18, Amplitude2=0.15