[Bioperl-l] How can I pull out all instances of a motif from a genome sequence and output them as a BED file?

Chris Fields cjfields at uiuc.edu
Thu Jun 14 01:58:37 UTC 2007


This is answered in the FAQ (sorry if the URL wraps, but we don't  
like tinyurls):

http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F
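
On the list of 1's from the loop quoted below: in Perl, "." and "+" have
the same precedence and associate left to right, so
"..." . pos($s)+1 . "\n" concatenates first and then adds 1 to the
resulting string, which numifies to 0.  Parenthesising the addition
avoids that.  Here is a minimal sketch, not a tested pipeline, that also
collects BED-style lines.  It assumes a short made-up test string in
place of the real sequence and a hypothetical chromosome name "chr3L";
the name has to match whatever the browser assembly uses.  BED starts
are 0-based and ends are exclusive, which is what $-[0] and $+[0] give
for a match.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: a short test string instead of the real
# chromosome sequence, and an assumed chromosome name for the BED lines.
my $sequence = "CCAAAGGAAAATT";
my $chrom    = "chr3L";    # assumption: must match the browser's naming
my $motif    = "AAA";

my @bed_lines;
while ($sequence =~ /$motif/g) {
    # $-[0] is the 0-based offset where the match starts; $+[0] is the
    # offset just past its end. Together they form a half-open interval.
    my ($start, $end) = ($-[0], $+[0]);
    push @bed_lines, join("\t", $chrom, $start, $end);

    # Parentheses force the addition to happen before the concatenation.
    print "Found '$motif' at $start.  Next attempt at character "
        . (pos($sequence) + 1) . "\n";
}

# One tab-separated line per match: chrom, chromStart, chromEnd.
print "$_\n" for @bed_lines;
```

Note that m//g resumes after the end of each match, so overlapping
occurrences (for example, two AAA hits inside AAAA) are reported only
once; whether that matters depends on the motif.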

chris

On Jun 13, 2007, at 7:20 PM, John Cumbers wrote:

> Hello,
>
> I have a simple problem: I'm trying to search a genome sequence for a
> motif, and I then want to output a BED file to display all the
> locations of this motif on the UCSC Genome Browser.  I could not find
> a script to do this, so I started to write my own.  I'm new to Perl,
> and my code below is my attempt to read the sequence string and output
> the index (in bp) of the start of each motif.  With this I could build
> the BED file myself, which requires start and end positions.
>
> For the first motif I can output the start index, but when I try to
> read the next one off the sequence it does not work.  Instead I just
> get a list of 1's as output.  I realise that this is more a request
> for some simple Perl help, but any help is much appreciated.
>
> Best wishes,
> John
>
>
> # turn my FASTA file into a seq object
> $seq_object = read_sequence("Drosophila.Chr3.test.AE014296.fasta");
> # turn it into a string
> $sequence_as_a_string = $seq_object->seq();
> # search $sequence_as_a_string for motif AAA as an example
> # if found, return the index that it is found at
>
> while ($sequence_as_a_string =~ m/AAA/g) {
>   print "Found '$&'.  Next attempt at character " .
> pos($sequence_as_a_string)+1 . "\n";
> }
>
>
>
> -- 
> John Cumbers,  Graduate Student
> Biology and Medicine
> Brown University, Box G-W
> Providence, Rhode Island, 02912, USA
> Tel USA: +1 401 523 8190,  Fax: +1 401 863-2166
> UK to USA: 0207 617 7824

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





