[Bioperl-l] How does '-alphabet' help? Is there any function which could remove "wrong" characters?

Sun Oct 13 04:14:24 UTC 2013

Thank you Christopher,
your explanation makes sense, I agree.

On Sun, Oct 13, 2013 at 8:09 AM, Fields, Christopher J <
cjfields at illinois.edu> wrote:

> A more interesting question is: should there be (at least, should there be
> within Bioperl)?  The assumption made for creating a Bio::Seq is that the
> characters passed are valid *prior* to creating the instance, and that the
> parsers (or user, if a parser isn't used) are generally in charge of
> dealing with such issues.  Some simple string validation is implemented in
> Bio::Tools::IUPAC and, I think, Bio::Tools::SeqPattern, if you want to
> delve into that code.
>
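> For removing the trailing newline, just use 'chomp' (if the file has
> Windows line endings and is read on Unix, a "\r" is left behind and needs
> s/\r$// as well):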
>
>     my $text = <INFILE>;
>     chomp $text;
>
> As for dealing with non-valid characters, it depends on what you mean by
> 'non-valid'.  All letters are valid IUPAC for protein seqs, and
> ACGTUMRWSYKVHDBXN are valid IUPAC nucleotide characters (for simplicity, we
> won't include other possible symbols for gaps, frameshifts, etc.).  You may
> want to leave out ambiguous characters for your case.  You could generate a
> regex from Bio::Tools::IUPAC for the valid chars and use its inverse to
> 'clean' a sequence, but a more straightforward way is to build your own
> string of valid chars and remove everything that does not match it, as
> Jing's example does.
>
> chris
>
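
For reference, the "string of valid chars" approach described in the quoted
message could look roughly like the sketch below. It is not Jing's example
(which is not reproduced in this message) and it is not a BioPerl function:
clean_sequence is a hypothetical helper name, and upper-casing the input
before filtering is an assumption that may not suit every use case.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: remove every character that is
# not in the allowed alphabet. The default alphabet is the IUPAC nucleotide
# set quoted in the message above.
sub clean_sequence {
    my ($raw, $allowed) = @_;
    $allowed = 'ACGTUMRWSYKVHDBXN' unless defined $allowed;

    # Assumption: the sequence is treated case-insensitively and
    # normalised to upper case.
    my $seq = uc $raw;

    # quotemeta keeps characters such as '-' or ']' literal if a custom
    # alphabet contains them.
    my $class = quotemeta $allowed;

    # Delete everything outside the allowed set. This also drops "\r",
    # "\n", spaces and digits, so no separate chomp is needed.
    $seq =~ s/[^$class]//g;
    return $seq;
}

my $line = "acgt-nn 12\tRYx*\r\n";
print clean_sequence($line), "\n";           # full IUPAC nucleotide set
print clean_sequence($line, 'ACGT'), "\n";   # unambiguous bases only
```

Passing a shorter alphabet such as 'ACGT' as the second argument is one way
to leave out the ambiguity codes, as suggested above. The cleaned string can
then be handed to Bio::Seq in the usual way.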