[Bioperl-l] Deprecating

Stefan Kirov skirov at utk.edu
Sat Oct 16 11:54:05 EDT 2004


Here is a MEME output file that includes strand data this time.
Enjoy :-),
Stefan

Brian Osborne wrote:

>Stefan,
>
>I took a look at AlignIO/meme.pm and fixed some obvious problems but I think
>it's possible that the file used in the tests, t/data/meme.dat, may not have
>all the information I need to debug the module (and I don't have meme
>myself). Question: is "strand" ever stipulated in the meme output file?
>There's no "+" or "-" indicated in meme.dat. This file is the result of an
>analysis of DNA, so I'm a bit surprised not to see some indication of strand
>(and the code, before I modified it, was using a regexp that attempted to
>capture strand information). Can you enlighten me here? Here's one of the
>relevant sections:
>
>
>Sequence name             Start   P-value                      Site
>-------------             ----- ---------            -------------------------
>6603                       1311  2.59e-15 GGCGCATTGA CAGAAAAATTGAATTCCCACCCCCC AATGAGGAGG
>83796                      1284  2.59e-15 GGAGGATTGA CAGAAAAATTGAATTCCCACCCCCC AACGAGGAGG
>20218                       938  6.34e-12 TTTTTGGTAA CCTTAAAATAAAATCCCCACCACCA CTTTTAAAAA
>10657                      1685  8.70e-12 GGCCCGCGCG CAGACAAAGACATTCCACAGCTCCC GCCCCCTCCA
>
>Perhaps when one runs meme with DNA one can tell it to just do 1 strand,
>rather than both? This would explain the absence of strand information, but
>it means the regexp changes depending on this mode. Again, I don't have meme
>or use meme, so you're going to have to tell me, or send me output files
>with and without strand stipulation.
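For what it is worth, one pattern can probably cover both layouts if the strand column is made optional. In the attached output, MEME was run with -revcomp, the summary reports "strands: + -", and the sites table has a Strand column. The table quoted above has none. Below is a minimal sketch in Python rather than Perl, only to illustrate the idea. The names SITE_ROW and parse_site_row are made up for this example and are not part of Bio::AlignIO::meme. It also assumes both flanking sequences are present on every row, which may not hold for sites at the very start or end of a sequence.

```python
import re

# Hypothetical pattern for one row of a "sites sorted by position p-value"
# table. The strand column is optional, so rows with and without it match.
SITE_ROW = re.compile(
    r"""^\s*
    (?P<name>\S+)\s+                           # sequence name
    (?:(?P<strand>[+-])\s+)?                   # strand, only in two-strand output
    (?P<start>\d+)\s+                          # start position of the site
    (?P<pvalue>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\s+
    (?P<left>[A-Za-z]+)\s+                     # left flank (assumed present)
    (?P<site>[A-Za-z]+)\s+                     # the motif occurrence itself
    (?P<right>[A-Za-z]+)                       # right flank (assumed present)
    \s*$""",
    re.VERBOSE,
)


def parse_site_row(line):
    """Return a dict for a site row, or None for headers and rulers."""
    m = SITE_ROW.match(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "strand": m.group("strand"),  # None when the column is absent
        "start": int(m.group("start")),
        "pvalue": float(m.group("pvalue")),
        "site": m.group("site"),
    }
```

A caller could treat a strand of None as "not reported" instead of guessing "+", so the single-strand and two-strand cases stay distinguishable downstream.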
>
>Brian O.
>
>-----Original Message-----
>From: Stefan Kirov [mailto:skirov at utk.edu]
>Sent: Friday, October 15, 2004 6:33 PM
>To: Brian Osborne
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] Deprecating
>
>I wonder if anyone uses *Bio::AlignIO::meme*. The last time I tried it (a year
>ago) it did not work for me, and I have not seen anyone commit changes since.
>Stefan
>
>Brian Osborne wrote:
>
>>Bioperl-l,
>>
>>Any objections to my adding Bio::Tools::RestrictionEnzyme to the DEPRECATED
>>file? It was replaced by Bio::Restriction a while ago.
>>
>>Along the same lines, is there anything else that should be deprecated?
>>Bio::Tools::GFF?
>>
>>Brian O.
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at portal.open-bio.org
>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>
>--
>Stefan Kirov, Ph.D.
>University of Tennessee/Oak Ridge National Laboratory
>5700 bldg, PO BOX 2008 MS6164
>Oak Ridge TN 37831-6164
>USA
>tel +865 576 5120
>fax +865-576-5332
>e-mail: skirov at utk.edu
>sao at ornl.gov
>
>"And the wars go on with brainwashed pride
>For the love of God and our human rights
>And all these things are swept aside"
>
>

-------------- next part --------------
********************************************************************************
MEME - Motif discovery tool
********************************************************************************
MEME version 3.0 (Release date: 2002/04/02 00:11:59)

For further information on how to interpret these results or to get
a copy of the MEME software please access http://meme.sdsc.edu.

This file may be used as input to the MAST algorithm for searching
sequence databases for matches to groups of motifs.  MAST is available
for interactive use and downloading at http://meme.sdsc.edu.
********************************************************************************


********************************************************************************
REFERENCE
********************************************************************************
If you use this program in your research, please cite:

Timothy L. Bailey and Charles Elkan,
"Fitting a mixture model by expectation maximization to discover
motifs in biopolymers", Proceedings of the Second International
Conference on Intelligent Systems for Molecular Biology, pp. 28-36,
AAAI Press, Menlo Park, California, 1994.
********************************************************************************


********************************************************************************
TRAINING SET
********************************************************************************
DATAFILE= test.fasta
ALPHABET= ACGT
Sequence name            Weight Length  Sequence name            Weight Length  
-------------            ------ ------  -------------            ------ ------  
68723                    1.0000   2000  16939                    1.0000   2001  
20754                    1.0000   2001  6707                     1.0000   2000  
20755                    1.0000   2000  6700                     1.0000   2002  
20760                    1.0000   2000  20761                    1.0000   2000  
20762                    1.0000   2000  
********************************************************************************

********************************************************************************
COMMAND LINE SUMMARY
********************************************************************************
This information can also be useful in the event you wish to report a
problem with the MEME software.

command: meme test.fasta -dna -nostatus -nmotifs 2 -minsites 8 -maxw 20 -revcomp 

model:  mod=         zoops    nmotifs=         2    evt=           inf
object function=  E-value of product of p-values
width:  minw=            8    maxw=           20    minic=        0.00
width:  wg=             11    ws=              1    endgaps=       yes
nsites: minsites=        8    maxsites=        9    wnsites=       0.8
theta:  prob=            1    spmap=         uni    spfuzz=        0.5
em:     prior=   dirichlet    b=            0.01    maxiter=        50
        distance=    1e-05
data:   n=           18004    N=               9
strands: + -
sample: seed=            0    seqfrac=         1
Letter frequencies in dataset:
A 0.295 C 0.205 G 0.205 T 0.295 
Background letter frequencies (from dataset with add-one prior applied):
A 0.295 C 0.205 G 0.205 T 0.295 
********************************************************************************


********************************************************************************
MOTIF  1	width =   20   sites =   8   llr = 147   E-value = 1.3e-002
********************************************************************************
--------------------------------------------------------------------------------
	Motif 1 Description
--------------------------------------------------------------------------------
Simplified        A  ::a1::931:6:348:1::1
pos.-specific     C  aa:::8:11841:331:139
probability       G  :::9::::63::84::3:8:
matrix            T  ::::a3161::9:::969::

         bits    2.3 **                  
                 2.1 **                  
                 1.8 *** *               
                 1.6 *****              *
Information      1.4 ******   * **  * ***
content          1.1 *******  * ** ** ***
(26.5 bits)      0.9 *******  **** ** ***
                 0.7 ******* ***** ******
                 0.5 ********************
                 0.2 ********************
                 0.0 --------------------

Multilevel           CCAGTCATGCATGAATTTGC
consensus                 T A GC AGC G C 
sequence                          C      
                                         
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name            Strand  Start   P-value                    Site      
-------------            ------  ----- ---------            --------------------
20761                        +   1879  6.50e-13 TCTGATTAAG CCAGTCATGCATGGATTTGC ATTTTGGTTG
20760                        +   1875  6.50e-13 CCCAGTCACG CCAGTCATGCATGGATTTGC ATTTTGATTG
6700                         +   1100  2.27e-10 CCTGCTCATG CCAGTCATGGATAAATTTGC ATCTGGCTTA
20755                        +   1478  5.08e-10 CCCTGTCAGG CCAGTTATGGATGAATGTGC ACTTAANNNN
6707                         +   1431  6.11e-09 TCACACAGAT CCAGTCAATCCTGCCTGTCC ATCTCAATGA
20762                        +   1878  1.89e-08 CCTGGTTAGG CCAGTTAAACACAGATTTGC ATTTTGGTTA
16939                        -    914  2.01e-08 ACTTTTCCTT CCAATCATGCCTGCCCTTGA ACCCTATTGG
20754                        +   1175  6.73e-08 GCTCACCTTG CCAGTCTCCCCTGAATACCC TACATGCCCT
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
20761                             6.5e-13  1878_[+1]_102
20760                             6.5e-13  1874_[+1]_106
6700                              2.3e-10  1099_[+1]_883
20755                             5.1e-10  1477_[+1]_503
6707                              6.1e-09  1430_[+1]_550
20762                             1.9e-08  1877_[+1]_103
16939                               2e-08  913_[-1]_1068
20754                             6.7e-08  1174_[+1]_807
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 1 width=20 seqs=8
20761                    ( 1879) CCAGTCATGCATGGATTTGC  1 
20760                    ( 1875) CCAGTCATGCATGGATTTGC  1 
6700                     ( 1100) CCAGTCATGGATAAATTTGC  1 
20755                    ( 1478) CCAGTTATGGATGAATGTGC  1 
6707                     ( 1431) CCAGTCAATCCTGCCTGTCC  1 
20762                    ( 1878) CCAGTTAAACACAGATTTGC  1 
16939                    (  914) CCAATCATGCCTGCCCTTGA  1 
20754                    ( 1175) CCAGTCTCCCCTGAATACCC  1 
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 20 n= 17833 bayes= 11.1216 E= 1.3e-002 
  -965    229   -965   -965 
  -965    229   -965   -965 
   176   -965   -965   -965 
  -124   -965    210   -965 
  -965   -965   -965    176 
  -965    187   -965    -24 
   157   -965   -965   -124 
   -24    -71   -965    108 
  -124    -71    161   -124 
  -965    187     29   -965 
   108     87   -965   -965 
  -965    -71   -965    157 
   -24   -965    187   -965 
    34     29     87   -965 
   134     29   -965   -965 
  -965    -71   -965    157 
  -124   -965     29    108 
  -965    -71   -965    157 
  -965     29    187   -965 
  -124    210   -965   -965 
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 1 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 20 n= 17833 E= 1.3e-002 
 0.000369  0.999007  0.000255  0.000369 
 0.000369  0.999007  0.000255  0.000369 
 0.999120  0.000255  0.000255  0.000369 
 0.125213  0.000255  0.874163  0.000369 
 0.000369  0.000255  0.000255  0.999120 
 0.000369  0.749319  0.000255  0.250057 
 0.874276  0.000255  0.000255  0.125213 
 0.250057  0.125099  0.000255  0.624589 
 0.125213  0.125099  0.624475  0.125213 
 0.000369  0.749319  0.249943  0.000369 
 0.624589  0.374787  0.000255  0.000369 
 0.000369  0.125099  0.000255  0.874276 
 0.250057  0.000255  0.749319  0.000369 
 0.374901  0.249943  0.374787  0.000369 
 0.749432  0.249943  0.000255  0.000369 
 0.000369  0.125099  0.000255  0.874276 
 0.125213  0.000255  0.249943  0.624589 
 0.000369  0.125099  0.000255  0.874276 
 0.000369  0.249943  0.749319  0.000369 
 0.125213  0.874163  0.000255  0.000369 
--------------------------------------------------------------------------------





Time 75.70 secs.

********************************************************************************


********************************************************************************
MOTIF  2	width =   15   sites =   8   llr = 117   E-value = 1.2e+003
********************************************************************************
--------------------------------------------------------------------------------
	Motif 2 Description
--------------------------------------------------------------------------------
Simplified        A  :1a39:::::18:::
pos.-specific     C  ::::1:::8a:::::
probability       G  96:3::1:::::4::
matrix            T  13:5:a9a3:936aa

         bits    2.3          *     
                 2.1          *     
                 1.8   *  * * *   **
                 1.6 * *  * * *   **
Information      1.4 * * ******   **
content          1.1 * * *******  **
(21.0 bits)      0.9 * * ***********
                 0.7 *** ***********
                 0.5 ***************
                 0.2 ***************
                 0.0 ---------------

Multilevel           GGATATTTCCTATTT
consensus             T A    T  TG  
sequence                G           
                                    
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 sites sorted by position p-value
--------------------------------------------------------------------------------
Sequence name            Strand  Start   P-value                 Site    
-------------            ------  ----- ---------            ---------------
20762                        +   1845  2.62e-09 TCCAGGAACA GGATATTTCCTATTT TTGAGAGTCC
6700                         +   1068  2.62e-09 TTTCAGAACA GGATATTTCCTATTT TGAGTATCCT
20755                        +   1445  2.84e-08 GCCAAGGGTG GGATATTTTCTATTT TGTAGAGTCC
20754                        -    664  5.62e-08 TTTCTTAGAA GGAAATTTCCTTGTT CTCTTTCTAT
20761                        +    670  1.06e-07 GAAGAAAAAG GAAGATTTCCTAGTT AACAATTCAA
68723                        -   1925  5.26e-07 TTGCTTTCTT TGAGATGTCCTAGTT CACTCCTAAA
20760                        -    651  5.56e-07 TTTAAACTTG GTAAATTTTCTTTTT CTTCACATTT
16939                        -   1616  6.78e-07 TAGTTCAGTT GTATCTTTCCAATTT TGATGTTTGG
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 block diagrams
--------------------------------------------------------------------------------
SEQUENCE NAME            POSITION P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
20762                             2.6e-09  1844_[+2]_141
6700                              2.6e-09  1067_[+2]_920
20755                             2.8e-08  1444_[+2]_541
20754                             5.6e-08  663_[-2]_1323
20761                             1.1e-07  669_[+2]_1316
68723                             5.3e-07  1924_[-2]_61
20760                             5.6e-07  650_[-2]_1335
16939                             6.8e-07  1615_[-2]_371
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 in BLOCKS format
--------------------------------------------------------------------------------
BL   MOTIF 2 width=15 seqs=8
20762                    ( 1845) GGATATTTCCTATTT  1 
6700                     ( 1068) GGATATTTCCTATTT  1 
20755                    ( 1445) GGATATTTTCTATTT  1 
20754                    (  664) GGAAATTTCCTTGTT  1 
20761                    (  670) GAAGATTTCCTAGTT  1 
68723                    ( 1925) TGAGATGTCCTAGTT  1 
20760                    (  651) GTAAATTTTCTTTTT  1 
16939                    ( 1616) GTATCTTTCCAATTT  1 
//

--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific scoring matrix
--------------------------------------------------------------------------------
log-odds matrix: alength= 4 w= 15 n= 17878 bayes= 11.1253 E= 1.2e+003 
  -965   -965    210   -124 
  -124   -965    161    -24 
   176   -965   -965   -965 
   -24   -965     29     76 
   157    -71   -965   -965 
  -965   -965   -965    176 
  -965   -965    -71    157 
  -965   -965   -965    176 
  -965    187   -965    -24 
  -965    229   -965   -965 
  -124   -965   -965    157 
   134   -965   -965    -24 
  -965   -965     87    108 
  -965   -965   -965    176 
  -965   -965   -965    176 
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
	Motif 2 position-specific probability matrix
--------------------------------------------------------------------------------
letter-probability matrix: alength= 4 w= 15 n= 17878 E= 1.2e+003 
 0.000369  0.000255  0.874163  0.125213 
 0.125213  0.000255  0.624475  0.250057 
 0.999120  0.000255  0.000255  0.000369 
 0.250057  0.000255  0.249943  0.499745 
 0.874276  0.125099  0.000255  0.000369 
 0.000369  0.000255  0.000255  0.999120 
 0.000369  0.000255  0.125099  0.874276 
 0.000369  0.000255  0.000255  0.999120 
 0.000369  0.749319  0.000255  0.250057 
 0.000369  0.999007  0.000255  0.000369 
 0.125213  0.000255  0.000255  0.874276 
 0.749432  0.000255  0.000255  0.250057 
 0.000369  0.000255  0.374787  0.624589 
 0.000369  0.000255  0.000255  0.999120 
 0.000369  0.000255  0.000255  0.999120 
--------------------------------------------------------------------------------





Time 152.20 secs.

********************************************************************************


********************************************************************************
SUMMARY OF MOTIFS
********************************************************************************

--------------------------------------------------------------------------------
	Combined block diagrams: non-overlapping sites with p-value < 0.0001
--------------------------------------------------------------------------------
SEQUENCE NAME            COMBINED P-VALUE  MOTIF DIAGRAM
-------------            ----------------  -------------
68723                            2.83e-04  473_[-1(8.43e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_35_[-1(3.15e-06)]_[+1(1.38e-05)]_9_[-1(7.89e-05)]_63_[-1(2.97e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_26_[+2(4.37e-06)]_46_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_6_[-1(7.97e-06)]_24_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_1_[-2(5.92e-05)]_279_[-1(8.43e-06)]_[+1(1.38e-05)]_[+2(5.60e-05)]_157_[-2(5.26e-07)]_61
16939                            3.50e-06  913_[-1(2.01e-08)]_682_[-2(6.78e-07)]_205_[+1(5.80e-05)]_146
20754                            1.05e-06  39_[-2(1.78e-05)]_4_[+1(4.78e-06)]_564_[-2(7.42e-05)]_6_[-2(5.62e-08)]_146_[+2(3.98e-05)]_335_[+1(6.73e-08)]_93_[+1(5.54e-05)]_694
6707                             2.26e-05  173_[+1(4.46e-05)]_655_[-2(5.60e-05)]_3_[+1(3.06e-06)]_12_[+1(4.78e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.93e-05)]_181_[-1(3.26e-05)]_251_[+1(6.11e-09)]_329_[+1(6.34e-05)]_201
20755                            5.26e-09  160_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_3_[+1(6.41e-06)]_219_[-1(4.80e-05)]_962_[+2(2.84e-08)]_18_[+1(5.08e-10)]_6_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_14_[+1(7.97e-06)]_83
6700                             2.48e-10  48_[+1(7.97e-06)]_267_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_129_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_3_[+1(4.41e-06)]_280_[+2(2.62e-09)]_17_[+1(2.27e-10)]_101_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+2(7.29e-05)]_[+1(6.41e-06)]_604_[-1(8.43e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_3
20760                            1.53e-10  259_[-1(6.20e-05)]_339_[-2(3.17e-05)]_17_[-2(5.56e-07)]_436_[-2(4.06e-05)]_7_[+1(2.95e-07)]_6_[-1(1.12e-05)]_705_[+1(6.50e-13)]_106
20761                            3.10e-11  397_[-2(1.10e-06)]_257_[+2(1.06e-07)]_382_[-2(5.60e-05)]_[+2(7.29e-05)]_[-1(3.24e-06)]_[+1(1.38e-05)]_[+1(1.38e-05)]_722_[+1(6.50e-13)]_102
20762                            1.72e-08  134_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_[+1(1.38e-05)]_92_[+2(9.22e-05)]_517_[+1(7.97e-06)]_7_[-1(4.78e-06)]_439_[+2(2.62e-09)]_18_[+1(1.89e-08)]_103
--------------------------------------------------------------------------------

********************************************************************************


********************************************************************************
Stopped because nmotifs = 2 reached.
********************************************************************************

CPU: crick

********************************************************************************

