[Bioperl-l] Bio::AlignIO::Mase

Wed Jun 8 13:38:30 UTC 2011

Hi, Tristan,

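For your first two questions:

$entry =~ s/[^A-Za-z0-9\.\-]//g;
removes every character that is not in "A-Za-z0-9.-". The /g modifier
applies the substitution to all matches in the string, so whitespace and
the trailing newline are stripped as well. If you change it to
$entry =~ m/[^A-Za-z0-9\.\-]/;
it only tests whether such a character exists. It leaves $entry unchanged
and just returns true on a match.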
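
A minimal sketch of that difference. The sample line and the variable names
are made up for illustration and are not taken from mase.pm:

```perl
use strict;
use warnings;

# Hypothetical alignment line containing a space, digits and a newline.
my $entry = "AC GT-12.N\n";

# m// only tests for a match; the string itself is not modified.
my $copy  = $entry;
my $found = ($copy =~ m/[^A-Za-z0-9\.\-]/) ? 1 : 0;

# s///g deletes every character that falls outside the class.
(my $cleaned = $entry) =~ s/[^A-Za-z0-9\.\-]//g;

print "match found: $found\n";
print "copy unchanged: ", ($copy eq $entry ? "yes" : "no"), "\n";
print "cleaned: $cleaned\n";
```

Here the match succeeds because of the space, but only the s///g version
actually changes the string.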

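'/[^' and '/^[' are two different things in a regular expression. [^abc] is
a negated character class: it matches any single character other than a, b
or c, anywhere in the string. ^[abc] is an anchor followed by a normal
class: the string must start with a, b or c.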
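
To see the two patterns side by side, here is a small sketch. The test
strings are arbitrary examples:

```perl
use strict;
use warnings;

my %result;
for my $s ("xyz", "abc", "xbz", "bxx") {
    # [^abc]: true if the string contains any character other than a, b, c.
    my $has_other = ($s =~ /[^abc]/) ? 1 : 0;
    # ^[abc]: true if the first character is a, b or c.
    my $starts    = ($s =~ /^[abc]/) ? 1 : 0;
    $result{$s} = "$has_other$starts";
    print "$s  contains non-abc: $has_other  starts with abc: $starts\n";
}
```

"abc" fails the first test and passes the second, while "xyz" does the
opposite, so the two patterns answer different questions.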

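I don't understand why you are looking for $/. The substitution does not
depend on it. $/ is the input record separator (INPUT_RECORD_SEPARATOR): it
tells readline where a record ends, and it is a newline by default. You can
set it in your own script, for example:

$old_separator=$/;
$/="\t";

Then each record you read ends with "\t" instead of a newline. After that,
you can change it back using:
$/=$old_separator;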
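
The same idea can be written with "local", which restores the old value
automatically when the block ends. This is only a sketch: the in-memory
filehandle and the sample data are assumptions made so the example is
self-contained.

```perl
use strict;
use warnings;

# Hypothetical tab-separated data, read through an in-memory filehandle.
my $data = "one\ttwo\tthree\n";
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my @records;
{
    local $/ = "\t";            # input record separator is a tab in this block
    while (my $rec = <$fh>) {
        chomp $rec;             # chomp removes a trailing $/, i.e. the tab
        push @records, $rec;
    }
}                               # $/ is back to its previous value here
close $fh;

print scalar(@records), " records read\n";
```

Note that the last record still carries its newline, because chomp only
strips whatever $/ is set to at the time.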

Your patch looks fine to me. You want to keep the digits in your
sequences, which is why

$entry =~ s/[^A-Za-z\.\-]//g;
is changed into 
$entry =~ s/[^A-Za-z0-9\.\-]//g;

Without the 0-9 in the class, all your digits would be removed.
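
As a quick check of the two character classes, here is a sketch run on an
invented sequence line with frameshift-style digits (the sequence itself is
hypothetical):

```perl
use strict;
use warnings;

# Made-up sequence line with numeric codes embedded in it.
my $line = "ATG1CCT-2GA.\n";

# Original class: digits are outside it, so they are deleted.
(my $old = $line) =~ s/[^A-Za-z\.\-]//g;

# Patched class: digits are kept, only the newline is deleted.
(my $new = $line) =~ s/[^A-Za-z0-9\.\-]//g;

print "old pattern: $old\n";
print "new pattern: $new\n";
```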

Cheers,
Jun Yin
Ph.D. student in U.C.D.

Bioinformatics Laboratory
Conway Institute
University College Dublin

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Tristan Lefebure
Sent: Wednesday, June 08, 2011 1:45 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Bio::AlignIO::Mase

Hi there,

I have some weird alignments with some numerical codes stored 
within the sequence strings (e.g. genewise frameshift codes). 
Most AlignIO modules I have tried accept them without any 
trouble, except for Bio::AlignIO::Mase.

The following patch seems to do the trick:

diff -u mase.pm mase_mod.pm

--- mase.pm     2011-06-08 14:08:58.558033996 +0200
+++ mase_mod.pm 2011-06-08 14:09:20.388066014 +0200
@@ -109,7 +109,7 @@
 
        while( $entry = $self->_readline) {
            $entry =~ /^;/ && last;
-           $entry =~ s/[^A-Za-z\.\-]//g;
+           $entry =~ s/[^A-Za-z0-9\.\-]//g;
            $seq .= $entry;
        }
        if( $end == -1) {

But I am left with the feeling that I don't really 
understand why this works (which I don't quite like before 
pushing a patch...)

Why do an s///g instead of a simple m//, and why use 
'/[^' and not '/^['... Is that linked to the fact that $/ 
was modified to read chunks of files? BTW, where is $/ set? I 
searched in Bio::Root::IO but didn't find it... 

Oh so many questions...

Thanks!

--
Tristan


