[Bioperl-l] matching miRNAs to one or a lot of mRNAs

Stefan Kirov skirov at utk.edu
Sun Sep 28 14:12:54 EDT 2003


Just a small clarification here, guys: BioPerl is not a collection of 
tools, though it does include some. It provides you with the means to write 
your own tools, or to integrate different tools to fit your needs.
Pepi, have you looked at sim4? Maybe it could also solve your problem. 
And why don't you do the masking on the fly instead of creating a 
database? It might be slower (not sure about that, since you'll skip some 
IO ops), but your disk won't fill up.
Stefan
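
To make the on-the-fly idea concrete, here is a minimal sketch in Python (chosen only for illustration; the same loop is easy to write in Perl). It assumes the 35 nt window and 5 nt step Peter describes below. The function name masked_tiles and the example transcript are made up, and I have not checked whether your FASTA build can read its library from a pipe.

```python
def masked_tiles(seq, window=35, step=5, mask_char="N"):
    """Yield (start, masked_seq) pairs, one per unmasked window.

    Each masked_seq has the same length as seq; everything outside
    seq[start:start + window] is replaced by mask_char.
    """
    n = len(seq)
    if n <= window:
        # Transcript no longer than one window: nothing to mask.
        yield 0, seq
        return
    starts = list(range(0, n - window + 1, step))
    if starts[-1] != n - window:
        # Add a final window so the 3' end is always covered.
        starts.append(n - window)
    for start in starts:
        end = start + window
        yield start, mask_char * start + seq[start:end] + mask_char * (n - end)


if __name__ == "__main__":
    transcript = "ACGT" * 25  # made-up 100 nt sequence, for illustration only
    for start, masked in masked_tiles(transcript):
        # Each record is produced when needed and can be handed to the
        # search step directly, so no tiled database is written to disk.
        print(f">tile_{start + 1}\n{masked}")
```

Because the tiles come from a generator, only one masked copy of the transcript is held in memory at a time.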

Peter Stoilov wrote:

>Hi,
>
>they indeed mixed up the transcripts, but to me it looks like an honest 
>mistake. There seems to be another, unrelated transcript with the same name 
>(HES1). The accession for this transcript is NM_004649. All of the 
>experiments in the paper, with the exception of the ELISA, were done on the 
>wrong gene (ugly). The funny stuff doesn't end here. Using a FASTA search I was 
>able to find one miRNA (hsa-miR-221) that matches the transcript (NM_004649) 
>at exactly the same spot much better than hsa-miR-23(b).
>
>hsa-miR-221 vs   Homo sapiens chromosome 21 open reading frame 33 (C21orf33), 
>mRNA
>Matches 19 of 23
>23      CUUUGGGUCGUCUGUUACAU-CG   2
>        :.::...: ..:: :.:: : :.
>873     GGAACUCACUGGAAAGUG-ACGC   894
>
>
>
>As for the real HES1, hsa-miR-205 and hsa-miR-221 match much better than 
>hsa-miR-23b.
>
>hsa-miR-205 vs   Homo sapiens hairy and enhancer of split 1, (Drosophila) 
>(HES1), mRNA
>Matches 18 of 22
>21      UCUGAGGCCACCUU-ACUUCCU   1
>        ::..  .:: :::: :::.::.
>1062    AGGCCGUGGCGGAACUGAGGGG   1083
>
>hsa-miR-221 vs   Homo sapiens hairy and enhancer of split 1, (Drosophila) 
>(HES1), mRNA
>Matches 21 of 23
>23      CUUUGG-GUCGUC-UG-UUACAUCGA   1
>        ::.... ..:..: :. .: : .:.:
>1061    GAGGCCGUGGCGGAACUGAGGGGGCU   1086
>
>
>I'll write to the authors to see what they think about this.
>
>
>Now about searching for miRNA targets. The smallest word size that I can use 
>in BLAST is 7 for nucleic acid (thanks for the WU-BLAST idea!), so I had to 
>go with FASTA. Now, FASTA reports only one hit (should I say HSP?) per 
>sequence. The way I get around this is to generate multiple sequences for 
>each transcript, in which everything except a 35 nt window is masked with Ns. 
>The unmasked regions are tiled with a 5 nt step (30 nt overlap). The problem 
>with this is that the database size gets completely out of hand and will not 
>fit on my hard drive ;). Searching the database takes forever. But when I do it 
>for individual transcripts it works pretty well.
>
>Peter
>
>On Saturday 27 September 2003 03:45, Ian Korf wrote:
>  
>
>>I don't think the human HES1 8400709 is the sequence from the paper. If
>>you align the sequence in figure 1a against 8400709, you'll
>>find they don't match. There are other HES1 sequences in GenBank
>>though, for example, 1655593, that contain the sequence in the figure.
>>But if you try aligning the miRNA to 1655593 with NCBI-BLAST, you won't
>>find anything.
>>
>>If you do a S-W alignment (match +1, mismatch -1, gap -2) of the miRNA
>>complement against 1655593 you get the following, which is the same
>>alignment reported in the paper.
>>
>>Stats: score=12
>>Alignment: Q:855..874 S:1..21 17/3 1,0
>>Q: TGGAACTCACTGG-AAAGTGA
>>
>>S: TGGAAATCCCTGGAAATGTGA
>>
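
For anyone who wants to reproduce this kind of score, below is a minimal Smith-Waterman sketch in Python using the scoring Ian gives (match +1, mismatch -1, gap -2). It is not the program Ian ran. It assumes a linear gap penalty and returns only the best local score, without a traceback. The function name is made up, and the two sequences are copied from the WU-BLAST alignment quoted further down, with the gap character removed.

```python
def smith_waterman_score(query, subject, match=1, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    prev = [0] * (len(subject) + 1)  # previous row of the DP matrix
    best = 0
    for q in query:
        curr = [0]  # column 0 is always 0 in local alignment
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            up = prev[j] + gap        # gap in the subject
            left = curr[j - 1] + gap  # gap in the query
            cell = max(0, diag, up, left)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    # Sequences as printed in the WU-BLAST alignment in this thread.
    query = "TGGAACTCACTGGAAAGTGA"     # HES1 mRNA, positions 855..874
    subject = "TGGAAATCCCTGGCAATGTGA"  # miRNA complement, positions 1..21
    print(smith_waterman_score(query, subject))
```

The quoted alignment has 17 matches, 3 mismatches and 1 gap, so under this scheme it scores 17 - 3 - 2 = 12. The function should therefore return at least 12 for these two sequences.
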
>>You'll note that the largest ungapped alignment is 5nt. The authors
>>did not say they used BLAST, only that they searched GenBank. 5nt is
>>too short for NCBI-BLAST, which has a minimum word size of 7. WU-BLAST
>>has no limit on word size, and you can find the alignment with
>>WU-BLAST. The same scoring system as above is used here, but note that E2 had
>>to be raised to at least 11 or the alignment would get pruned before
>>being subjected to gapped statistics. Here it is:
>>
>>  Score = 12 (17.3 bits), Expect = 0.037, P = 0.037
>>  Identities = 17/21 (80%), Positives = 17/21 (80%), Strand = Plus / Plus
>>
>>Query:   855 TGGAACTCACTGGAAA-GTGA 874
>>
>>Sbjct:     1 TGGAAATCCCTGGCAATGTGA 21
>>
>>If you make a habit of such searches, don't be surprised if you run
>>into a lot of false positives. I think you might want to use additional
>>criteria, such as overlapping the stop codon or lying in the 3'UTR. I'm not
>>aware of any software specifically designed for such searches, but
>>perhaps the authors of the paper have one. The paper was very brief and
>>had no description of the bioinformatics in the methods section (if I
>>was one of the referees, I would have found this unacceptable). I
>>suggest you contact the authors and find out specifically what they did.
>>
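
The positional criterion Ian mentions could be applied as a simple filter on the hit list. This is a rough sketch only. The Hit record, the function name and all coordinates are hypothetical. Positions are assumed to be 1-based on the mRNA, and cds_end is assumed to be the last base of the stop codon.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One candidate miRNA match on a transcript (hypothetical record)."""
    mirna: str
    start: int  # 1-based, inclusive, on the mRNA
    end: int    # 1-based, inclusive, on the mRNA
    score: int


def overlaps_stop_or_3utr(hit, cds_end):
    """True if the hit touches the stop codon or lies downstream of it.

    cds_end is the 1-based position of the last base of the stop codon,
    so the stop codon occupies cds_end - 2 .. cds_end.
    """
    return hit.end >= cds_end - 2


if __name__ == "__main__":
    cds_end = 100  # made-up annotation, for illustration only
    hits = [
        Hit("mir-a", 80, 97, 12),    # entirely inside the coding region
        Hit("mir-b", 90, 110, 12),   # spans the stop codon
        Hit("mir-c", 120, 140, 12),  # entirely in the 3'UTR
    ]
    for hit in hits:
        if overlaps_stop_or_3utr(hit, cds_end):
            print(hit.mirna, hit.start, hit.end)
```

The CDS coordinates would have to come from the transcript annotation, for example the CDS feature of the GenBank record.
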
>>-Ian
>>
>>On Friday, September 26, 2003, at 07:29 PM, Starr Hazard wrote:
>>    
>>
>>>Folks,
>>>
>>>In a recent paper, Kawasaki et al. (PubMed 12808467) report on the
>>>interaction between a specific miRNA (human miRNA23 g.i. 17646028) and
>>>a specific mRNA (human HES1 g.i. 8400709). They suggest they did a
>>>BLAST search and ultimately located the interaction. I cannot
>>>duplicate their data mining and cannot find the association they
>>>describe.
>>>
>>>In general, is there a way to take a library of miRNAs and evaluate
>>>their potential interaction with a particular mRNA? Or is there a data
>>>mining tool that could screen a large pool of mRNAs for
>>>significant interactions with a pool of miRNAs?
>>>
>>>I cannot at present see any BioPerl tools that address this issue
>>>(right now that means I scanned the FAQ for the string RNA and
>>>searched the BioPerl site for RNA but found only some traffic about
>>>Seq.pm). The people I have asked seem divided about whether this is
>>>a text matching issue or more of a hybridization issue involving
>>>an energy of interaction evaluation.
>>>
>>>Anybody got any pointers to offer?
>>>
>>>Starr
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at portal.open-bio.org
>>>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>>>      
>>>

-- 
Stefan Kirov, Ph.D.
University of Tennessee/Oak Ridge National Laboratory
1060 Commerce Park, Oak Ridge
TN 37830-8026
USA
tel +865 576 5120
fax +865 241 1965
e-mail: skirov at utk.edu
sao at ornl.gov



