[Bioperl-l] Bio::AlignIO::Mase
Jason Stajich
jason.stajich at gmail.com
Wed Jun 8 14:27:44 UTC 2011
Hi Tristan -
This regular expression strips everything that isn't a letter, '.' or '-'.
The [^...] is a negated character class: it matches any character EXCEPT the ones listed inside the brackets. I guess if numeric values are valid in this type of alignment you would just add \d (instead of 0-9) to the class.
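To make that concrete, here is a minimal sketch. The sequence line and the variable names are made up for illustration; only the two patterns come from the patch quoted below.

```perl
use strict;
use warnings;

# Hypothetical sequence line with genewise-style frameshift digits in it.
my $entry = "ATG-C1GT.A2\n";

# Current pattern: delete every character that is NOT a letter, '.' or '-'.
(my $strict = $entry) =~ s/[^A-Za-z\.\-]//g;

# Patched pattern: digits are added to the class, so they survive.
(my $with_digits = $entry) =~ s/[^A-Za-z\d\.\-]//g;

print "$strict\n";       # ATG-CGT.A
print "$with_digits\n";  # ATG-C1GT.A2
```

The s///g form edits $entry in place and deletes every match, including the trailing newline, whereas a plain m// would only test whether the line matches. And /^[...]/ would anchor a normal character class at the start of the line, which is a different thing from the negated class /[^...]/.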
So you are asking for the MASE parser to not strip out frameshift info?
This doesn't have anything to do with the chunk pattern or size set with $/ AFAIK.
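A rough way to convince yourself of this, using plain readline on an in-memory filehandle rather than BioPerl's own _readline (whose internals I am not describing here). The data and the read_and_strip helper are hypothetical; the substitution is the one from mase.pm.

```perl
use strict;
use warnings;

# Hypothetical two-line sequence block, read from an in-memory filehandle.
my $data = "AC-G1\nTT.2\n";

sub read_and_strip {
    my ($separator) = @_;
    local $/ = $separator;    # input record separator used by readline
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    my $seq = '';
    while ( my $entry = <$fh> ) {
        $entry =~ s/[^A-Za-z\.\-]//g;    # same substitution as in mase.pm
        $seq .= $entry;
    }
    close $fh;
    return $seq;
}

my $by_line = read_and_strip("\n");     # one line per read
my $slurped = read_and_strip(undef);    # whole input in a single read

print "$by_line\n";
print "$slurped\n";
```

Both calls should give the same string: $/ only decides how much text each read returns, while the substitution decides which characters are kept.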
On Jun 8, 2011, at 7:45 AM, Tristan Lefebure wrote:
> Hi there,
>
> I have some weird alignments with some numerical code stored
> within the sequence strings (eg. frameshift genewise code).
> Most AlignIO modules I have tried eat them without any
> trouble, except for Bio::AlignIO::Mase.
>
> The following patch seems to do the trick:
>
> diff -u mase.pm mase_mod.pm
> --- mase.pm 2011-06-08 14:08:58.558033996 +0200
> +++ mase_mod.pm 2011-06-08 14:09:20.388066014 +0200
> @@ -109,7 +109,7 @@
>
> while( $entry = $self->_readline) {
> $entry =~ /^;/ && last;
> - $entry =~ s/[^A-Za-z\.\-]//g;
> + $entry =~ s/[^A-Za-z0-9\.\-]//g;
> $seq .= $entry;
> }
> if( $end == -1) {
>
> But I am left with the feeling that I don't really
> understand why this works (which I don't quite like before
> pushing a patch...)
>
> Why do a s///g instead of a simple m//, and why
> '/[^' and not '/^['... Is that linked to the fact that $/
> was modified to read chunks of files? BTW where is $/ set? I
> searched in Bio::Root::IO but didn't find it...
>
> Oh so many questions...
>
> Thanks!
>
> --
> Tristan
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l