Bioperl: Checking DNA alphabet

Paul Gordon gordonp@niji.imb.nrc.ca
Thu, 18 May 2000 12:33:29 -0300 (ADT)


> I don't think there is an actual function, but
> this is very simply implemented:
> 
> 	# $seq_obj is a Bio::Seq (or Bio::PrimarySeq)
> 	my $str = $seq_obj->seq();
> 	# $str is now the actual sequence string
> 	die "Sequence contains non [ACGT] characters"
> 	    if $str =~ /[^ACGT]/;
> 
>     James
> 
> On Thu, 18 May 2000, gert thijs wrote:
> 
> > I am writing a script to process some DNA
> > sequences which are in fasta format.  Now I
> > would like to check if these sequences contain
> > symbols other than ACGT, like N or degenerate
> > symbols.  Is there a function in bioperl that
> > does this, or does anyone have an idea which
> > regular expression to use?
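
If you also want to know which symbols tripped the check, rather than
just dying, something along these lines might do.  This is only a rough
sketch built on the same regex idea: non_acgt_symbols is a made-up helper
name, not a Bioperl function, and it assumes you already have the plain
sequence string (e.g. from $seq_obj->seq()).  Unlike the quoted test it
ignores case.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): returns the sorted list of
# distinct characters in $str that are not A, C, G or T, ignoring case.
sub non_acgt_symbols {
    my ($str) = @_;
    my %seen;
    # Collect every offending character once, upper-cased.
    $seen{ uc $1 }++ while $str =~ /([^ACGT])/gi;
    return sort keys %seen;
}

my @bad = non_acgt_symbols('ACGTNNRYacgt');
print "Non-ACGT symbols: @bad\n" if @bad;
```

Call it in list context; an empty list means the sequence is plain ACGT.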

Isn't Perl beautiful?  You could also use a Perl one-liner to filter
ambiguous sequences out of a FastA file, printing only the records whose
sequence consists entirely of A, C, G, T or U (in either case):

perl -ne 'BEGIN{$/=">"}chomp;print ">$_" if $_ && !/^.*?\n.*[^acgtu\n]/si' FILE
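      
For anyone who finds the one-liner hard to read, here is roughly the
same filter spelled out as a sub.  Treat it as a sketch, not a drop-in
replacement: unambiguous_records is a hypothetical name, it takes the
whole file as one string, it splits only on ">" at the start of a line,
and it tolerates any whitespace in the sequence (so a trailing \r does
not count as ambiguous, which it would in the one-liner).

```perl
use strict;
use warnings;

# Hypothetical helper: given the full text of a FastA file, return only the
# records whose sequence lines consist solely of A, C, G, T or U (any case).
sub unambiguous_records {
    my ($fasta_text) = @_;
    my @kept;
    for my $record (split /^>/m, $fasta_text) {
        next unless length $record;          # empty chunk before the first ">"
        my ($header, $body) = split /\n/, $record, 2;
        $body = '' unless defined $body;     # header-only record
        next if $body =~ /[^acgtu\s]/i;      # any other symbol => ambiguous
        push @kept, ">$record";              # restore the ">" that split removed
    }
    return @kept;
}

my $fasta = ">seq1 clean\nACGT\nacgu\n>seq2 has N\nACGN\n";
print unambiguous_records($fasta);
```

With the sample string above only the seq1 record is printed.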

But I digress...
