[Bioperl-l] How does '-alphabet' help? Is there any function which could remove "wrong" characters?

Jing Yu logust79 at googlemail.com
Sun Oct 13 03:35:56 UTC 2013


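I guess you could preprocess the line with something like $text =~ s/[^ATCGatcg]//g; which deletes every character other than A, C, G and T, including the trailing carriage return.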
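Below is a minimal sketch that turns that substitution into a reusable function for the more general question in the quoted message. Note that the [^ATCGatcg] class also deletes IUPAC ambiguity codes such as N. The helper name clean_seq and the %ALLOWED character sets are hypothetical and chosen only for illustration. They are assumptions, not BioPerl's own validation rules, so adjust them to whatever your downstream code accepts.

```perl
use strict;
use warnings;

# Hypothetical per-alphabet character sets (IUPAC codes plus '-' for gaps).
# These are illustrative assumptions, NOT BioPerl's own validation rules.
my %ALLOWED = (
    dna     => 'ACGTRYSWKMBDHVN-',
    rna     => 'ACGURYSWKMBDHVN-',
    protein => 'ACDEFGHIKLMNPQRSTVWYBZX*-',
);

# clean_seq($seq, $alphabet): return $seq with every character outside the
# chosen alphabet removed; matching is case-insensitive, so case is preserved.
sub clean_seq {
    my ($seq, $alphabet) = @_;
    my $chars = $ALLOWED{$alphabet}
        or die "unknown alphabet '$alphabet'\n";
    my $class = quotemeta $chars;    # escape '-' and '*' for the character class
    $seq =~ s/[^$class]//gi;
    return $seq;
}

my $line = "ACGTNacgt\r\n";          # made-up line read from a Windows file

(my $strict = $line) =~ s/[^ATCGatcg]//g;   # the substitution suggested above
print "$strict\n";                          # prints ACGTacgt (N removed too)
print clean_seq($line, 'dna'), "\n";        # prints ACGTNacgt
```

The filtered string can then be passed as -seq to Bio::Seq->new in the same way as in the original script.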
On 13 Oct 2013, at 11:24, Vasily Aushev <vaushev at gmail.com> wrote:

> Well, in this particular case this is the format of the input file, which I can't change: it is not FASTA format, just the sequence on one (the first) line of the file.
> But I am interested in the more general question: is there a function which removes all invalid characters from the string?
> 
> 
> On Sun, Oct 13, 2013 at 6:56 AM, Jing Yu <logust79 at googlemail.com> wrote:
> Hi,
> 
> my $text = <INFILE>; only reads a line.
> 
> Why not just do:
> 
> my $seq1 = Bio::Seq->new(-file => 'yourfile', -format => 'Fasta');
> 
> 
> On 13 Oct 2013, at 10:48, Vasily Aushev <vaushev at gmail.com> wrote:
> 
> > In my very simple script, I read the sequence from the file with
> > my $text = <INFILE>;
> > and then create a new sequence object:
> > my $seq1 = Bio::Seq->new(-seq => $text, -alphabet => 'dna');
> > After spending some time, I found that the carriage return character
> > (0x0D) at the end of my string (it's a Windows file) causes
> > problems (exceptions) in further processing. I thought that defining the
> > -alphabet for the sequence object would remove this "wrong" character, but
> > it does not. So my question is: is there any function for removing all
> > characters that are not part of the defined alphabet?
> >
> > Thanks in advance!
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
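The thread does not say whether chomp was used. However, one plausible reason the 0x0D survives is that on a non-Windows system, reading a CRLF file returns lines ending in "\r\n". In that case chomp removes only the trailing "\n" (the default $/) and leaves the carriage return in place. If the line ending is the only bad character, a narrower fix than filtering against the whole alphabet is to strip CR and LF from the end of the line. The sketch below assumes exactly that. strip_eol is a hypothetical helper name.

```perl
use strict;
use warnings;

# strip_eol($line): hypothetical helper that removes any trailing run of
# CR (0x0D) and LF (0x0A) characters, so "\n", "\r\n" and "\r" all go.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;
    return $line;
}

# Made-up line as <INFILE> may return it for a Windows file read on Unix.
my $text = "ACGTACGT\r\n";

my $chomped = $text;
chomp $chomped;    # removes only "\n" (the default $/), so "\r" remains

printf "chomp: %d chars, strip_eol: %d chars\n",
    length $chomped, length strip_eol($text);
# prints: chomp: 9 chars, strip_eol: 8 chars
```

The cleaned string could then be used as Bio::Seq->new(-seq => strip_eol($text), -alphabet => 'dna'). Unlike the alphabet filter, this leaves any unexpected characters inside the sequence untouched, so they still surface as errors instead of being silently dropped.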




