Bioperl: repetitive DNA
Lincoln Stein
lstein@cshl.org
Sun, 7 Nov 1999 16:42:56 -0500 (EST)
No module needed. Here's a simple one-line regular expression that
does everything that dust does. It catches all repeats of unit length
1 or greater that occur at least 5 times in a row (the unit plus 4 or
more further copies) and replaces them with N's.
# mask 5+ tandem copies of any unit with N's (length is unchanged)
$sequence =~ s/((.+)\2{4,})/'N' x length($1)/eg;
This one occurred to me while writing problems for the CSHL genome
informatics course.
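A minimal, runnable sketch of the one-liner above, applied to the test sequence from Alessandro's message quoted below. The two FASTA lines are joined first, since `.` does not match a newline and a repeat could otherwise be split at a line break. The wrapper name `mask_repeats` is hypothetical. Note that the pattern masks only exact tandem repeats of five or more copies; it does not compute DUST's low-complexity score, so its output need not match dust's.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-liner: mask every run of >= 5
# consecutive copies of the same unit (unit length >= 1) with N's.
sub mask_repeats {
    my ($sequence) = @_;    # work on a copy, leave the caller's string alone
    $sequence =~ s/((.+)\2{4,})/'N' x length($1)/eg;
    return $sequence;
}

# Test sequence from the quoted message, joined onto one line.
my $seq = 'acgatgacgatgatatatatatatatacataatatatatcacagggga'
        . 'atatatatatcccacataatata';

my $masked = mask_repeats($seq);
print "$masked\n";
```

For example, the four-copy `gggg` in that sequence is left unmasked. Because the engine may try every possible unit length at every start position, the substitution can become slow on long sequences.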
Lincoln
Alessandro Guffanti writes:
> Hi. I think a good solution could also be to use NCBI's DUST
> filter with a suitable cut-off, then retrieve the coordinates
> of the masked sequences through a Perl wrapper, and you're done.
> You can retrieve DUST from the WU FTP server:
>
> ftp://blast.wustl.edu/pub/dust
>
> >test
> acgatgacgatgatatatatatatatacataatatatatcacagggga
> atatatatatcccacataatata
>
> dust test
> >test
> acgatgacgatgNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNcc
> cacataatata
>
> dust test 45
> >test
> acgatgacgatgatatatatatatatacataatatatatcacaggggaatatatatatcc
> cacataatata
>
>
> Best Wishes,
>
> Alessandro.
>
> BTW, I think this could be a good starting point for a "filtering"
> module. Do you think this would be interesting? It could be a
> method in a sequence object or a standalone module. The output
> could be a list of coordinates in the sequence corresponding to the
> masked areas. I would be happy to produce a rough version of this.
>
>
> --
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Alessandro Guffanti - Informatics
> The Sanger Centre, Wellcome Trust Genome Campus
> Hinxton, Cambridge CB10 1SA, United Kingdom
> phone: +1223-834244 * fax: +1223-494919
> http://www.sanger.ac.uk/Users/ag3
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
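Alessandro's proposal, returning a list of coordinates for the masked areas, could start from something as small as the sketch below. It scans a masked sequence for runs of N and reports 1-based, inclusive start and end positions. The subroutine name `masked_regions` is hypothetical, and the sketch assumes masked bases are written as uppercase N and that the original sequence contained no N's of its own.

```perl
use strict;
use warnings;

# Hypothetical helper: given a masked sequence (e.g. dust output with the
# FASTA lines joined), return 1-based, inclusive [start, end] pairs for
# every run of N's. Assumes the unmasked input had no N's of its own.
sub masked_regions {
    my ($masked) = @_;
    my @regions;
    while ($masked =~ /N+/g) {
        # $-[0] / $+[0] are the 0-based start and (exclusive) end offsets
        # of the last match, so they convert directly to 1-based inclusive.
        push @regions, [ $-[0] + 1, $+[0] ];
    }
    return @regions;
}

my @regions = masked_regions('acgtNNNNNNggccNNNNa');
printf "%d-%d\n", @$_ for @regions;    # prints 5-10 and 15-18
```

To wrap dust as suggested, one could read its FASTA output, join the sequence lines, and pass the result to a routine like this.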
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================