[Bioperl-l] Deprecating
Stefan Kirov
skirov at utk.edu
Sat Oct 16 11:54:05 EDT 2004
Here is the meme file with strand data this time.
Enjoy :-),
Stefan
Brian Osborne wrote:
>Stefan,
>
>I took a look at AlignIO/meme.pm and fixed some obvious problems but I think
>it's possible that the file used in the tests, t/data/meme.dat, may not have
>all the information I need to debug the module (and I don't have meme
>myself). Question: is "strand" ever stipulated in the meme output file?
>There's no "+" or "-" indicated in meme.dat. This file is the result of an
>analysis of DNA, so I'm a bit surprised not to see some indication of strand
>(and the code, before I modified it, was using a regexp that attempted to
>capture strand information). Can you enlighten me here? Here's one of the
>relevant sections:
>
>
>Sequence name  Start  P-value              Site
>-------------  -----  ---------            -------------------------
>6603            1311  2.59e-15  GGCGCATTGA CAGAAAAATTGAATTCCCACCCCCC AATGAGGAGG
>83796           1284  2.59e-15  GGAGGATTGA CAGAAAAATTGAATTCCCACCCCCC AACGAGGAGG
>20218            938  6.34e-12  TTTTTGGTAA CCTTAAAATAAAATCCCCACCACCA CTTTTAAAAA
>10657           1685  8.70e-12  GGCCCGCGCG CAGACAAAGACATTCCACAGCTCCC GCCCCCTCCA
>
>Perhaps when one runs meme with DNA one can tell it to just do 1 strand,
>rather than both? This would explain the absence of strand information, but
>it means the regexp changes depending on this mode. Again, I don't have meme
>or use meme, so you're going to have to tell me, or send me output files
>with and without strand stipulation.
>
>Brian O.
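
[Editorial sketch, not part of the original message.] One way to avoid switching regexps between the two modes discussed above is to make the strand column optional in a single pattern. The Python below is only an illustration under stated assumptions: it is not the code in Bio::AlignIO::meme, the names SITE_LINE and parse_site_line are hypothetical, and it assumes every site line carries a left flank, the site, and a right flank, as in the rows shown in this thread.

```python
import re

# Hypothetical pattern (an assumption, not the BioPerl regexp): the strand
# column is optional, so the same expression matches MEME site lines written
# with or without strand information.
SITE_LINE = re.compile(
    r"""^\s*
        (?P<name>\S+)\s+                          # sequence name
        (?:(?P<strand>[+-])\s+)?                  # optional strand column
        (?P<start>\d+)\s+                         # start position
        (?P<pvalue>\d+(?:\.\d+)?[eE][+-]?\d+)\s+  # p-value, e.g. 6.50e-13
        (?P<left>[A-Za-z]+)\s+                    # left flank (assumed present)
        (?P<site>[A-Za-z]+)\s+                    # the motif site itself
        (?P<right>[A-Za-z]+)                      # right flank (assumed present)
        \s*$""",
    re.VERBOSE,
)


def parse_site_line(line):
    """Return a dict for a site line, or None if the line is not a site line."""
    match = SITE_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["start"] = int(fields["start"])
    fields["pvalue"] = float(fields["pvalue"])
    return fields  # fields["strand"] is "+", "-", or None when absent


if __name__ == "__main__":
    examples = [
        "20761 + 1879 6.50e-13 TCTGATTAAG CCAGTCATGCATGGATTTGC ATTTTGGTTG",
        "6603 1311 2.59e-15 GGCGCATTGA CAGAAAAATTGAATTCCCACCCCCC AATGAGGAGG",
        "Sequence name Strand Start P-value Site",
    ]
    for text in examples:
        print(parse_site_line(text))
```

Header and separator lines do not match because the field after the name (and optional strand) must be an integer start position, so the function returns None for them. Whether MEME ever omits a flank, for example for a site at the very start of a sequence, is not established in this thread, and the pattern would need loosening in that case.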
>
>-----Original Message-----
>From: Stefan Kirov [mailto:skirov at utk.edu]
>Sent: Friday, October 15, 2004 6:33 PM
>To: Brian Osborne
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] Deprecating
>
>I wonder if anyone uses *Bio::AlignIO::meme*. The last time I tried it (a
>year ago) it did not work for me, and I have not seen anyone commit changes
>since.
>Stefan
>
>Brian Osborne wrote:
>
>
>
>>Bioperl-l,
>>
>>Any objections to my adding Bio::Tools::RestrictionEnzyme to the DEPRECATED
>>file? It was replaced by Bio::Restriction a while ago.
>>
>>Along the same lines, is there anything else that should be deprecated?
>>Bio::Tools::GFF?
>>
>>Brian O.
>>
>>
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>>
>
>--
>Stefan Kirov, Ph.D.
>University of Tennessee/Oak Ridge National Laboratory
>5700 bldg, PO BOX 2008 MS6164
>Oak Ridge TN 37831-6164
>USA
>tel +865 576 5120
>fax +865-576-5332
>e-mail: skirov at utk.edu
>sao at ornl.gov
>
>"And the wars go on with brainwashed pride
>For the love of God and our human rights
>And all these things are swept aside"
>
>
--
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
5700 bldg, PO BOX 2008 MS6164
Oak Ridge TN 37831-6164
USA
tel +865 576 5120
fax +865-576-5332
e-mail: skirov at utk.edu
sao at ornl.gov
"And the wars go on with brainwashed pride
For the love of God and our human rights
And all these things are swept aside"
-------------- next part --------------
********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 3.0 (Release date: 2002/04/02 00:11:59)
For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.sdsc.edu.
This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs. MAST is available
for interactive use and downloading at http://meme.sdsc.edu.
********************************************************************************
********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:
Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************
********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= test.fasta
ALPHABET= ACGT
Sequence name Weight Length Sequence name Weight Length
------------- ------ ------ ------------- ------ ------
68723 1.0000 2000 16939 1.0000 2001
20754 1.0000 2001 6707 1.0000 2000
20755 1.0000 2000 6700 1.0000 2002
20760 1.0000 2000 20761 1.0000 2000
20762 1.0000 2000
********************************************************************************
********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.
command: meme test.fasta -dna -nostatus -nmotifs 2 -minsites 8 -maxw 20 -revcomp
model: mod= zoops nmotifs= 2 evt= inf
object function= E-value of product of p-values
width: minw= 8 maxw= 20 minic= 0.00
width: wg= 11 ws= 1 endgaps= yes
nsites: minsites= 8 maxsites= 9 wnsites= 0.8
theta: prob= 1 spmap= uni spfuzz= 0.5
em: prior= dirichlet b= 0.01 maxiter= 50
distance= 1e-05
data: n= 18004 N= 9
strands: + -
sample: seed= 0 seqfrac= 1
Letter frequencies in dataset:
A 0.295 C 0.205 G 0.205 T 0.295
Background letter frequencies (from dataset with add-one prior applied):
A 0.295 C 0.205 G 0.205 T 0.295
********************************************************************************
********************************************************************************
MOTIF 1 width = 20 sites = 8 llr = 147 E-value = 1.3e-002
********************************************************************************
--------------------------------------------------------------------------------
Motif 1 Description
--------------------------------------------------------------------------------
Simplified A ::a1::931:6:348:1::1
pos.-specific C aa:::8:11841:331:139
probability G :::9::::63::84::3:8:
matrix T ::::a3161::9:::969::
bits 2.3 **
2.1 **
1.8 *** *
1.6 ***** *
Information 1.4 ****** * ** * ***
content 1.1 ******* * ** ** ***
(26.5 bits) 0.9 ******* **** ** ***
0.7 ******* ***** ******
0.5 ********************
0.2 ********************
0.0 --------------------
Multilevel CCAGTCATGCATGAATTTGC
consensus T A GC AGC G C
sequence C
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name Strand Start P-value Site
------------- ------ ----- --------- --------------------
20761 + 1879 6.50e-13 TCTGATTAAG CCAGTCATGCATGGATTTGC ATTTTGGTTG
20760 + 1875 6.50e-13 CCCAGTCACG CCAGTCATGCATGGATTTGC ATTTTGATTG
6700 + 1100 2.27e-10 CCTGCTCATG CCAGTCATGGATAAATTTGC ATCTGGCTTA
20755 + 1478 5.08e-10 CCCTGTCAGG CCAGTTATGGATGAATGTGC ACTTAANNNN
6707 + 1431 6.11e-09 TCACACAGAT CCAGTCAATCCTGCCTGTCC ATCTCAATGA
20762 + 1878 1.89e-08 CCTGGTTAGG CCAGTTAAACACAGATTTGC ATTTTGGTTA
16939 - 914 2.01e-08 ACTTTTCCTT CCAATCATGCCTGCCCTTGA ACCCTATTGG
20754 + 1175 6.73e-08 GCTCACCTTG CCAGTCTCCCCTGAATACCC TACATGCCCT
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
20761 6.5e-13 1878_[+1]_102
20760 6.5e-13 1874_[+1]_106
6700 2.3e-10 1099_[+1]_883
20755 5.1e-10 1477_[+1]_503
6707 6.1e-09 1430_[+1]_550
20762 1.9e-08 1877_[+1]_103
16939 2e-08 913_[-1]_1068
20754 6.7e-08 1174_[+1]_807
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL MOTIF 1 width=20 seqs=8
20761 ( 1879) CCAGTCATGCATGGATTTGC 1
20760 ( 1875) CCAGTCATGCATGGATTTGC 1
6700 ( 1100) CCAGTCATGGATAAATTTGC 1
20755 ( 1478) CCAGTTATGGATGAATGTGC 1
6707 ( 1431) CCAGTCAATCCTGCCTGTCC 1
20762 ( 1878) CCAGTTAAACACAGATTTGC 1
16939 ( 914) CCAATCATGCCTGCCCTTGA 1
20754 ( 1175) CCAGTCTCCCCTGAATACCC 1
//
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 20 n= 17833 bayes= 11.1216 E= 1.3e-002
-965 229 -965 -965
-965 229 -965 -965
176 -965 -965 -965
-124 -965 210 -965
-965 -965 -965 176
-965 187 -965 -24
157 -965 -965 -124
-24 -71 -965 108
-124 -71 161 -124
-965 187 29 -965
108 87 -965 -965
-965 -71 -965 157
-24 -965 187 -965
34 29 87 -965
134 29 -965 -965
-965 -71 -965 157
-124 -965 29 108
-965 -71 -965 157
-965 29 187 -965
-124 210 -965 -965
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 20 n= 17833 E= 1.3e-002
0.000369 0.999007 0.000255 0.000369
0.000369 0.999007 0.000255 0.000369
0.999120 0.000255 0.000255 0.000369
0.125213 0.000255 0.874163 0.000369
0.000369 0.000255 0.000255 0.999120
0.000369 0.749319 0.000255 0.250057
0.874276 0.000255 0.000255 0.125213
0.250057 0.125099 0.000255 0.624589
0.125213 0.125099 0.624475 0.125213
0.000369 0.749319 0.249943 0.000369
0.624589 0.374787 0.000255 0.000369
0.000369 0.125099 0.000255 0.874276
0.250057 0.000255 0.749319 0.000369
0.374901 0.249943 0.374787 0.000369
0.749432 0.249943 0.000255 0.000369
0.000369 0.125099 0.000255 0.874276
0.125213 0.000255 0.249943 0.624589
0.000369 0.125099 0.000255 0.874276
0.000369 0.249943 0.749319 0.000369
0.125213 0.874163 0.000255 0.000369
--------------------------------------------------------------------------------
Time 75.70 secs.
********************************************************************************
********************************************************************************
MOTIF 2 width = 15 sites = 8 llr = 117 E-value = 1.2e+003
********************************************************************************
--------------------------------------------------------------------------------
Motif 2 Description
--------------------------------------------------------------------------------
Simplified A :1a39:::::18:::
pos.-specific C ::::1:::8a:::::
probability G 96:3::1:::::4::
matrix T 13:5:a9a3:936aa
bits 2.3 *
2.1 *
1.8 * * * * **
1.6 * * * * * **
Information 1.4 * * ****** **
content 1.1 * * ******* **
(21.0 bits) 0.9 * * ***********
0.7 *** ***********
0.5 ***************
0.2 ***************
0.0 ---------------
Multilevel GGATATTTCCTATTT
consensus T A T TG
sequence G
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name Strand Start P-value Site
------------- ------ ----- --------- ---------------
20762 + 1845 2.62e-09 TCCAGGAACA GGATATTTCCTATTT TTGAGAGTCC
6700 + 1068 2.62e-09 TTTCAGAACA GGATATTTCCTATTT TGAGTATCCT
20755 + 1445 2.84e-08 GCCAAGGGTG GGATATTTTCTATTT TGTAGAGTCC
20754 - 664 5.62e-08 TTTCTTAGAA GGAAATTTCCTTGTT CTCTTTCTAT
20761 + 670 1.06e-07 GAAGAAAAAG GAAGATTTCCTAGTT AACAATTCAA
68723 - 1925 5.26e-07 TTGCTTTCTT TGAGATGTCCTAGTT CACTCCTAAA
20760 - 651 5.56e-07 TTTAAACTTG GTAAATTTTCTTTTT CTTCACATTT
16939 - 1616 6.78e-07 TAGTTCAGTT GTATCTTTCCAATTT TGATGTTTGG
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME POSITION P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
20762 2.6e-09 1844_[+2]_141
6700 2.6e-09 1067_[+2]_920
20755 2.8e-08 1444_[+2]_541
20754 5.6e-08 663_[-2]_1323
20761 1.1e-07 669_[+2]_1316
68723 5.3e-07 1924_[-2]_61
20760 5.6e-07 650_[-2]_1335
16939 6.8e-07 1615_[-2]_371
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL MOTIF 2 width=15 seqs=8
20762 ( 1845) GGATATTTCCTATTT 1
6700 ( 1068) GGATATTTCCTATTT 1
20755 ( 1445) GGATATTTTCTATTT 1
20754 ( 664) GGAAATTTCCTTGTT 1
20761 ( 670) GAAGATTTCCTAGTT 1
68723 ( 1925) TGAGATGTCCTAGTT 1
20760 ( 651) GTAAATTTTCTTTTT 1
16939 ( 1616) GTATCTTTCCAATTT 1
//
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 15 n= 17878 bayes= 11.1253 E= 1.2e+003
-965 -965 210 -124
-124 -965 161 -24
176 -965 -965 -965
-24 -965 29 76
157 -71 -965 -965
-965 -965 -965 176
-965 -965 -71 157
-965 -965 -965 176
-965 187 -965 -24
-965 229 -965 -965
-124 -965 -965 157
134 -965 -965 -24
-965 -965 87 108
-965 -965 -965 176
-965 -965 -965 176
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 15 n= 17878 E= 1.2e+003
0.000369 0.000255 0.874163 0.125213
0.125213 0.000255 0.624475 0.250057
0.999120 0.000255 0.000255 0.000369
0.250057 0.000255 0.249943 0.499745
0.874276 0.125099 0.000255 0.000369
0.000369 0.000255 0.000255 0.999120
0.000369 0.000255 0.125099 0.874276
0.000369 0.000255 0.000255 0.999120
0.000369 0.749319 0.000255 0.250057
0.000369 0.999007 0.000255 0.000369
0.125213 0.000255 0.000255 0.874276
0.749432 0.000255 0.000255 0.250057
0.000369 0.000255 0.374787 0.624589
0.000369 0.000255 0.000255 0.999120
0.000369 0.000255 0.000255 0.999120
--------------------------------------------------------------------------------
Time 152.20 secs.
********************************************************************************
********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************
--------------------------------------------------------------------------------
Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME COMBINED P-VALUE MOTIF DIAGRAM
------------- ---------------- -------------
68723 2.83e-04 473_[-1(8.43e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_35_[-1(3.15e-06)]_[+1(1.38e-05)]_9_[-1(7.89e-05)]_63_[-1(2.97e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_26_[+2(4.37e-06)]_46_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_6_[-1(7.97e-06)]_24_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_1_[-2(5.92e-05)]_279_[-1(8.43e-06)]_[+1(1.38e-05)]_[+2(5.60e-05)]_157_[-2(5.26e-07)]_61
16939 3.50e-06 913_[-1(2.01e-08)]_682_[-2(6.78e-07)]_205_[+1(5.80e-05)]_146
20754 1.05e-06 39_[-2(1.78e-05)]_4_[+1(4.78e-06)]_564_[-2(7.42e-05)]_6_[-2(5.62e-08)]_146_[+2(3.98e-05)]_335_[+1(6.73e-08)]_93_[+1(5.54e-05)]_694
6707 2.26e-05 173_[+1(4.46e-05)]_655_[-2(5.60e-05)]_3_[+1(3.06e-06)]_12_[+1(4.78e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.93e-05)]_181_[-1(3.26e-05)]_251_[+1(6.11e-09)]_329_[+1(6.34e-05)]_201
20755 5.26e-09 160_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_3_[+1(6.41e-06)]_219_[-1(4.80e-05)]_962_[+2(2.84e-08)]_18_[+1(5.08e-10)]_6_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_14_[+1(7.97e-06)]_83
6700 2.48e-10 48_[+1(7.97e-06)]_267_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_129_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_3_[+1(4.41e-06)]_280_[+2(2.62e-09)]_17_[+1(2.27e-10)]_101_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+2(7.29e-05)]_[+1(6.41e-06)]_604_[-1(8.43e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_3
20760 1.53e-10 259_[-1(6.20e-05)]_339_[-2(3.17e-05)]_17_[-2(5.56e-07)]_436_[-2(4.06e-05)]_7_[+1(2.95e-07)]_6_[-1(1.12e-05)]_705_[+1(6.50e-13)]_106
20761 3.10e-11 397_[-2(1.10e-06)]_257_[+2(1.06e-07)]_382_[-2(5.60e-05)]_[+2(7.29e-05)]_[-1(3.24e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_722_[+1(6.50e-13)]_102
20762 1.72e-08 134_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_92_[+2(9.22e-05)]_517_[+1(7.97e-06)]_7_[-1(4.78e-06)]_439_[+2(2.62e-09)]_18_[+1(1.89e-08)]_103
--------------------------------------------------------------------------------
********************************************************************************
********************************************************************************
Stopped because nmotifs = 2 reached.
********************************************************************************
CPU: crick
********************************************************************************