Bioperl: Checking DNA alphabet
Paul Gordon
gordonp@niji.imb.nrc.ca
Thu, 18 May 2000 12:33:29 -0300 (ADT)
> I don't think there is an actual function, but
> this is very simply implemented:
>
> # $seq_obj is a Bio::Seq (or Bio::PrimarySeq)
> my $str = $seq_obj->seq();
> # $str is now the actual sequence string
> die "Sequence contains non [ACGT] characters"
> if $str =~ /[^ACGT]/i;  # /i so lowercase sequence is accepted too
>
> James
>
> On Thu, 18 May 2000, gert thijs wrote:
>
> > I am writing a script to process some DNA
> > sequences which are in fasta format. Now I
> > would like to check if these sequences contain
> > other symbols than ACGT, like N or degenerate
> > symbols. Is there a function in bioperl that
> > does this or has someone any idea which
> > regular expression to use?
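Isn't Perl beautiful? You could also use a Perl one-liner to drop every
record that contains an ambiguous symbol from a FASTA file. It sets the
input record separator to ">", skips the header line of each record, and
matches case-insensitively; U is accepted as well, so RNA passes: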
perl -ne 'BEGIN{$/=">"}chomp;print ">$_" if $_ && !/^.*?\n.*[^acgtu\n]/si' FILE
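For anyone who finds the one-liner hard to read, here is roughly the same
filter spelled out as a script. Treat it as a sketch, not a Bioperl
function: the sub name unambiguous_records is made up for illustration,
and unlike the one-liner it only starts a new record at a ">" that begins
a line, so a ">" inside a description line does not split the record.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): given the text of a FASTA
# file, return the records whose sequence uses only A, C, G, T or U.
sub unambiguous_records {
    my ($fasta) = @_;
    my @keep;
    # Split before each '>' that starts a line; the lookahead keeps the
    # '>' attached to its record.
    for my $record (split /^(?=>)/m, $fasta) {
        next unless $record =~ /^>/;      # skip text before the first header
        my ($header, @lines) = split /\n/, $record;
        my $seq = join '', @lines;        # sequence with line breaks removed
        push @keep, $record unless $seq =~ /[^ACGTU]/i;
    }
    return @keep;
}

# When file names are given on the command line, print the clean records.
if (@ARGV) {
    my $text = join '', <>;               # read all named files
    print unambiguous_records($text);
}
```

Saved as, say, filter_fasta.pl (the name is arbitrary), it would be run as
"perl filter_fasta.pl FILE". As with the one-liner, any other character in
the sequence lines, including N, gaps, spaces or a stray carriage return,
causes the record to be dropped.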
But I digress...