remarks about sequence formats

Guy Bottu gbottu at ben.vub.ac.be
Fri Feb 13 11:46:56 UTC 2004


	Dear developers,

I have just installed EMBOSS 2.8.0 at the BEN site. While testing it, I
noticed the following things about file formats:

- the documentation says that "phylip" is the non-interleaved format and
  "phylip3" the interleaved format, but the programs write exactly the
  reverse ("phylip3::myseqs" has the sequences one after the other and
  "phylip::myseqs" has them interleaved)

- PAUP format is always written with "datatype=DNA", even when the
  sequences are proteins

- an old pain in the ass: Staden "experiment" format is still not
  handled. I include an "experiment" file as an attachment. As far as I
  know, the differences from EMBL format are:
    - more field types
    - AC not mandatory
    - only 1 sequence per file
    - data fields allowed after the sequence
  Note the AV lines with the "confidence values" of the bases. EMBOSS
  library routines to read and write these could be useful. There is
  software that can make use of "confidence values", e.g. Primer3, but
  at present eprimer3 cannot handle them.
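
To illustrate the differences listed above, here is a minimal Python
sketch of a reader for one "experiment" record. It is based only on the
layout of the attached file and is not an EMBOSS routine: the name
parse_experiment is hypothetical, and the sketch assumes two-letter line
codes, a single SQ block terminated by "//", optional fields after the
sequence, and one integer AV value per base.

```python
from typing import Dict, List, Tuple


def parse_experiment(text: str) -> Tuple[Dict[str, List[str]], str, List[int]]:
    """Parse one experiment-format record (hypothetical helper).

    Returns (fields, sequence, confidences). `fields` maps each two-letter
    line code to the contents of its lines, in file order.
    """
    fields: Dict[str, List[str]] = {}
    seq_parts: List[str] = []
    in_sequence = False

    for line in text.splitlines():
        if in_sequence:
            if line.startswith("//"):
                # End of sequence; data fields may still follow.
                in_sequence = False
            else:
                # Drop the blanks that group the bases in blocks of ten.
                seq_parts.append("".join(line.split()))
            continue
        if not line.strip():
            continue
        code, value = line[:2], line[2:].strip()
        if code == "SQ":
            in_sequence = True
            continue
        # Continuation lines repeat the code, so they simply accumulate.
        fields.setdefault(code, []).append(value)

    # Assumption: AV lines carry one confidence value per base.
    confidences = [int(v) for av in fields.get("AV", []) for v in av.split()]
    return fields, "".join(seq_parts), confidences


if __name__ == "__main__":
    sample = "ID   demo\nAV   10 20 30\nAV        40 50\nSQ   \n     ACG TN\n//\nQL   1\n"
    fields, seq, conf = parse_experiment(sample)
    print(seq, conf, fields["QL"])
```

A reader along these lines makes no use of AC and accepts fields such as
QL or TG after the "//" line, which are the points where the format
departs from EMBL. Comparing len(seq) with len(conf) is a cheap sanity
check on the AV lines.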

	Regards,
	Guy Bottu
-------------- next part --------------
ID   000256_11cR
EN   000256_11cR
LN   000256_11cR..ztr
LT   ZTR
AQ   50.180000
AV   18 20 18 18 19 13 13 6 6 6 6 6 9 7 8 11 11 21 24 32 39 46 40 
AV        33 34 29 29 29 26 20 20 26 22 26 42 46 51 51 45 40 32 32 
AV        32 33 34 32 24 17 13 10 10 19 19 28 28 28 29 35 35 40 39 
AV        51 51 51 46 46 42 42 42 46 46 51 51 56 56 51 51 51 51 46 
AV        40 35 35 35 39 39 39 56 51 51 51 51 39 39 39 39 39 39 45 
AV        40 45 45 45 51 51 51 51 51 51 51 45 45 45 45 51 51 56 56 
AV        51 51 51 51 51 51 56 56 56 56 56 56 56 56 56 56 56 51 51 
AV        51 51 45 45 51 51 51 40 51 51 45 45 45 40 40 43 43 43 43 
AV        45 51 45 56 51 51 51 51 51 45 45 45 45 45 45 51 51 51 51 
AV        45 45 51 51 51 51 56 56 56 56 56 56 51 51 51 51 51 56 56 
AV        56 51 51 51 51 51 51 56 56 56 56 56 56 56 56 46 43 43 43 
AV        43 43 43 51 43 43 43 43 43 43 56 56 56 56 56 56 56 56 56 
AV        56 56 51 45 45 45 45 45 51 51 51 51 51 56 56 51 56 56 51 
AV        51 51 51 51 51 51 56 56 56 56 51 51 51 51 51 51 56 51 51 
AV        51 51 51 51 56 56 56 56 56 51 51 51 51 51 51 51 51 56 56 
AV        56 56 56 56 56 45 45 45 43 43 43 46 46 46 51 56 56 56 51 
AV        51 51 56 56 45 45 45 45 45 45 56 56 56 56 56 56 56 56 56 
AV        51 51 51 51 51 51 43 43 43 43 43 43 43 45 45 45 45 45 46 
AV        43 43 43 43 43 43 51 56 43 43 43 43 43 43 51 51 46 46 46 
AV        46 51 46 51 51 51 51 51 51 56 56 56 51 45 45 45 45 45 45 
AV        51 51 51 56 51 51 51 51 51 51 51 56 56 56 56 56 56 51 51 
AV        45 45 45 45 45 45 51 51 56 56 56 56 56 56 56 56 56 56 56 
AV        56 51 51 51 45 45 45 45 45 45 43 43 43 43 43 43 43 43 43 
AV        46 46 51 41 40 45 45 45 45 51 51 45 51 51 45 45 45 45 45 
AV        45 45 45 45 45 45 45 45 45 51 56 56 56 56 56 56 56 56 56 
AV        56 51 51 51 45 45 45 40 40 37 37 37 40 56 51 56 56 56 51 
AV        51 51 51 51 51 56 45 45 45 45 40 40 45 45 40 40 45 45 40 
AV        40 45 45 51 51 51 51 46 46 40 51 40 37 37 37 40 45 45 45 
AV        45 56 45 40 35 35 35 32 32 35 46 42 51 51 51 46 46 37 35 
AV        35 35 35 39 40 40 40 35 35 35 35 35 35 42 42 46 46 46 40 
AV        40 40 40 40 38 42 42 27 19 11 11 11 28 27 40 40 40 44 32 
AV        32 29 29 15 20 18 24 14 19 9 10 10 19 27 37 25 25 25 24 
AV        29 29 29 20 12 12 13 12 15 12 11 8 8 8 8 7 7 7 4 0 0 0 
AV        0 0 0 0 0 
SQ   
     GTGGGCAGAA AAGTTGACAT TCCTCTTCTG CATTTCCTGG ATTGAAAACA GAGCAAATGA
     CTGGCGCTTT GAAACCTTGA ATGTATTCTG CAAATACTGA GCATCAAGTT CACTTTCTTC
     CATTTCTATG CTTGTTTCCC GACTGTGGTT AACTTCATGT CCCAATGGAT ACTTAAAGCC
     TTCTGTGTCA TTTCTATTAT CTTTGGAACA ACCATGAATT AGTCCCTTGG GGTTTTCAAA
     TGCTGCACAC TGACTCACAC ATTTATTTGG TTCTGTTTTT GCCTTCCCTA GAGTGCTAAC
     TTCCAGTAAC GAGATACTTT CCTGAGTGCC ATAATCAGTA CCAGGTACCA GTGAAATACT
     GCTACTCTCT ACAGATCTTT CAGTTTGCAA AACCCTTTCT CCACTTAACA TGAGATCTTT
     GGGGTCTTCA GCATTATTAG ACACTTTAAC TGTTTCTAGT TTCTCTTCTT TTTCTTCTCT
     TGGAAGGCTA GGATTGACAA ATTCTTTAAG TTCACTGGTA TTTGAACACT TAGTAAAAGA
     ACCAGGTGCA TTTGTTAACT TCAGCTCTGG GAAAGTATCA CTGTCATGTC TTTTACTTGT
     CTGTTCATTT GGCACTGGCC GTCGCGCTTC ANNNNNNNN
//
TN   000256_11c
PR   2
QL   17
QR   617
WT   /home/gbottu/demo/exercise_snip/000906_11cR.scf
WL   -1
WR   -1
TG   MUTA - 50..50
TG        C->T Sensitivity= 7.86, Alignment=0.14, Width=0.86, Amplitude=643
TG   MUTA - 351..351
TG        T->C Sensitivity= 8.13, Alignment=0.07, Width=1.00, Amplitude=1105
TG   MUTA - 580..580
TG        C->T Sensitivity= 6.19, Alignment=0.29, Width=1.21, Amplitude=1338
TG   HETE - 585..585
TG        AG Ratio=0.87, Alignment=0.00, Amplitude1=0.18, Amplitude2=0.15

