[Bioperl-l] How does '-alphabet' help? Is there any function which could remove "wrong" characters?
vaushev at gmail.com
Sun Oct 13 04:14:24 UTC 2013
Thank you, Christopher.
Your explanation makes sense, and I agree.
On Sun, Oct 13, 2013 at 8:09 AM, Fields, Christopher J <
cjfields at illinois.edu> wrote:
> A more interesting question is: should there be (at least, should there be
> within Bioperl)? The assumption made for creating a Bio::Seq is that the
> characters passed are valid *prior* to creating the instance, and that the
> parsers (or user, if a parser isn't used) are generally in charge of
> dealing with such issues. Some attempts to set up simple validation of
> strings are used within Bio::Tools::IUPAC and I think
> Bio::Tools::SeqPattern, if you want to delve into that code.
> For removing the trailing newline, just use 'chomp' (a leftover carriage
> return from a Windows-style file can be stripped with s/\r$//):
> my $text = <INFILE>;
> chomp $text;
> As for dealing with non-valid characters, it depends on what you mean by
> 'non-valid'. All letters are valid IUPAC for protein seqs, and
> ACGTUMRWSYKVHDBXN are valid IUPAC nucleotide characters (we won't include
> other possible symbols for gaps, frameshifts, etc for simplicity). You may
> want to leave out ambiguous characters for your case. You could maybe
> generate a regex from Bio::Tools::IUPAC for valid chars and use the inverse
> of that to 'clean' a sequence, but a straightforward way is to simply
> generate your valid string of chars and replace everything not matching to
> it, as Jing's example does.
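
For reference, the "replace everything not matching" approach described above can be sketched roughly as follows. This is only a minimal sketch, not Jing's code and not a BioPerl function: the helper name clean_nucleotide_seq is made up for illustration, and the character set is the IUPAC nucleotide list quoted above (gap and frameshift symbols are deliberately left out, so they get removed too).

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): keep only the IUPAC
# nucleotide characters listed in the quoted message, drop the rest.
sub clean_nucleotide_seq {
    my ($raw) = @_;
    my $seq = uc $raw;                  # normalise case before filtering
    # tr with /c (complement the set) and /d (delete matches) removes
    # every character that is not in the list, including "\r" and "\n".
    $seq =~ tr/ACGTUMRWSYKVHDBXN//cd;
    return $seq;
}

print clean_nucleotide_seq("acgt-NN*x1\r\n"), "\n";
```

To exclude the ambiguity codes as well, the list could be shortened to ACGT (or ACGU for RNA). Because line endings fall outside the set, a separate chomp is not strictly needed when this filter is applied.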