[Bioperl-l] Bio::AlignIO::Mase
Jun Yin
jun.yin at ucd.ie
Wed Jun 8 13:38:30 UTC 2011
Hi, Tristan,
For your first two questions,
$entry =~ s/[^A-Za-z0-9\.\-]//g; # The /g flag makes this global: it removes
every character that is not in "A-Za-z0-9.-"
If you change it to $entry =~ m/[^A-Za-z0-9\.\-]/; # it will only find the
first character that is not in "A-Za-z0-9.-" and return true, without
modifying $entry.
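To make the difference concrete, here is a minimal sketch; the sample line in $entry is made up for illustration and is not taken from a real Mase file:

```perl
use strict;
use warnings;

my $entry = "AC1GT--2N.\n";   # hypothetical sequence line, digits included

# s///g edits the string in place and returns the number of substitutions.
my $copy    = $entry;
my $removed = ($copy =~ s/[^A-Za-z0-9\.\-]//g);

# m// only tests: it stops at the first match and leaves the string untouched.
my $matched = ($entry =~ m/[^A-Za-z0-9\.\-]/) ? 1 : 0;

print "cleaned: $copy\n";
print "removed: $removed, matched: $matched\n";
```

Here the only character outside the class is the trailing newline, so s///g strips it from the copy, while m// just reports that such a character exists.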
'/[^' and '/^[' are two different things in a regexp. [^abc] is a negated
character class: it matches any single character other than a, b or c.
^[abc] anchors the match to the start of the string, so the string must
begin with a, b or c.
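A short sketch of the two patterns side by side, using a few made-up test strings:

```perl
use strict;
use warnings;

my @samples = ("abc", "xbc", "aaa");   # hypothetical test strings
my %result;

for my $s (@samples) {
    # [^abc]: is there any character other than a, b or c anywhere?
    my $has_other  = ($s =~ /[^abc]/) ? "yes" : "no";
    # ^[abc]: is the first character a, b or c?
    my $starts_abc = ($s =~ /^[abc]/) ? "yes" : "no";
    $result{$s} = [$has_other, $starts_abc];
    print "$s: has non-abc char = $has_other, starts with a/b/c = $starts_abc\n";
}
```

Only "xbc" contains a character outside the class, and it is also the only one that does not start with a, b or c.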
I don't understand why you are looking for $/. $/ is the
INPUT_RECORD_SEPARATOR (a newline by default); it determines where each
record read from a filehandle ends.
You can set it in your own script, for example:
$old_separator=$/;
$/="\t";
Then each record read will end at "\t". After that, you can change it back
using:
$/=$old_separator;
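As a rough sketch of the same idea, `local` can restore $/ (Perl's input record separator) automatically at the end of a block. The in-memory string below is a made-up stand-in for a real file:

```perl
use strict;
use warnings;

my $data = "one\ttwo\tthree";   # hypothetical tab-delimited input
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

my @records;
{
    local $/ = "\t";            # previous value comes back when the block ends
    while (my $rec = <$fh>) {
        chomp $rec;             # chomp strips the current value of $/
        push @records, $rec;
    }
}
close $fh;

print join("|", @records), "\n";
```

Using `local` avoids having to remember the save-and-restore step by hand.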
Your patch looks fine to me. You don't want the digits removed from your
sequence, which is why
$entry =~ s/[^A-Za-z\.\-]//g;
is changed into
$entry =~ s/[^A-Za-z0-9\.\-]//g;
Otherwise, all your digits will be removed.
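For illustration, here are both character classes applied to the same line; the sequence string is invented, with digits standing in for the frameshift codes:

```perl
use strict;
use warnings;

my $line = "ATG3CC-1.TA\n";   # hypothetical sequence line with numeric codes

# Original class: digits are not allowed, so they are stripped.
(my $old = $line) =~ s/[^A-Za-z\.\-]//g;

# Patched class: digits are allowed, so only the newline is stripped.
(my $new = $line) =~ s/[^A-Za-z0-9\.\-]//g;

print "old pattern: $old\n";
print "new pattern: $new\n";
```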
Cheers,
Jun Yin
Ph.D. student in U.C.D.
Bioinformatics Laboratory
Conway Institute
University College Dublin
-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Tristan Lefebure
Sent: Wednesday, June 08, 2011 1:45 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Bio::AlignIO::Mase
Hi there,
I have some weird alignments with some numerical codes stored
within the sequence strings (e.g. genewise frameshift codes).
Most AlignIO modules I have tried eat them without any
trouble, except for Bio::AlignIO::Mase.
The following patch seems to do the trick:
diff -u mase.pm mase_mod.pm
--- mase.pm 2011-06-08 14:08:58.558033996 +0200
+++ mase_mod.pm 2011-06-08 14:09:20.388066014 +0200
@@ -109,7 +109,7 @@
while( $entry = $self->_readline) {
$entry =~ /^;/ && last;
- $entry =~ s/[^A-Za-z\.\-]//g;
+ $entry =~ s/[^A-Za-z0-9\.\-]//g;
$seq .= $entry;
}
if( $end == -1) {
But I am left with the feeling that I don't really
understand why this works (which I don't quite like before
pushing a patch...)
Why do a s///g instead of a simple m//, and why use
'/[^' and not '/^['... Is that linked to the fact that $/
was modified to read chunks of files? BTW where is $/ set? I
searched in Bio::Root::IO but didn't find it...
Oh so many questions...
Thanks!
--
Tristan