[Bioperl-l] testing for sequence

Heikki Lehvaslaiho heikki@ebi.ac.uk
Thu, 14 Mar 2002 15:19:41 +0000


Guoneng Zhong wrote:
> 
> Hi,
> Is there a way for me to know if a given string looks like a protein or
> dna/rna sequence?  Other than doing a grep on all the DNA and Protein
> symbols?

There is an internal method, _guess_alphabet in Bio::PrimarySeqI, which is
called when you set your sequence string with the seq() method. It sets the
alphabet to dna, rna or protein depending on the ratio of nucleotide
characters ([atgc], plus u for RNA) to the sequence length. If the ratio is
above 85%, the sequence is not treated as a protein. This is a heuristic, but
it works in most cases. You can always confuse this, and almost any other
algorithm, with heavy use of ambiguous nucleotide characters.
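The same heuristic can be sketched outside BioPerl. The function below is a hypothetical Python illustration (the name guess_alphabet and the threshold parameter are assumptions), not the actual _guess_alphabet code. In particular, ignoring non-letter characters, not counting N, and splitting DNA from RNA by the presence of U without T are assumptions made for this sketch.

```python
def guess_alphabet(seq: str, threshold: float = 0.85) -> str:
    """Classify a sequence string as 'dna', 'rna' or 'protein'.

    Hypothetical sketch of the ratio heuristic; not BioPerl's code.
    """
    # Drop gaps, stop symbols, digits and whitespace; compare case-insensitively.
    cleaned = "".join(ch for ch in seq.upper() if ch.isalpha())
    if not cleaned:
        raise ValueError("empty sequence")

    # Fraction of residues that are plain nucleotides (A, C, G, T or U).
    # Ambiguity codes such as N are deliberately not counted (an assumption).
    nucleotides = sum(cleaned.count(base) for base in "ACGTU")
    if nucleotides / len(cleaned) <= threshold:
        return "protein"

    # Crude DNA/RNA split (an assumption): U without any T suggests RNA.
    return "rna" if "U" in cleaned and "T" not in cleaned else "dna"


if __name__ == "__main__":
    for s in ["ATGCGTACGTTAGC", "AUGCGUACGUUAGC",
              "MKTAYIAKQRQISFVKSHFSRQ", "NNNNNNNNACGT"]:
        print(s, guess_alphabet(s))
```

The last example shows the weakness mentioned above: a nucleotide string padded with ambiguity codes such as N drops below the threshold and is reported as protein.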

	-Heikki




> G

-- 
______ _/      _/_____________________________________________________
      _/      _/                      http://www.ebi.ac.uk/mutations/
     _/  _/  _/  Heikki Lehvaslaiho          heikki@ebi.ac.uk
    _/_/_/_/_/  EMBL Outstation, European Bioinformatics Institute
   _/  _/  _/  Wellcome Trust Genome Campus, Hinxton
  _/  _/  _/  Cambs. CB10 1SD, United Kingdom
     _/      Phone: +44 (0)1223 494 644   FAX: +44 (0)1223 494 468
___ _/_/_/_/_/________________________________________________________