[Bioperl-l] Bio::SeqIO::genbank.pm

valiente at lsi.upc.edu valiente at lsi.upc.edu
Fri Nov 6 08:06:48 UTC 2009



There is a line in Bio::SeqIO::genbank.pm that converts the data in the
classification lines into a classification array. It splits only on ';'
or '.', so that a classification of two or more words will still get
matched:

    my @class = map { s/^\s+//; s/\s+$//; s/\s{2,}/ /g; $_; } split /(?<!subgen)[;\.]+/, $class_lines;

But this breaks organism names that contain a dot, such as "Salmonella
enterica subsp. enterica serovar Typhimurium", which is now being split
into "Salmonella enterica subsp" and "enterica serovar Typhimurium".

Changing [;\.] to [;] solves this issue:

    my @class = map { s/^\s+//; s/\s+$//; s/\s{2,}/ /g; $_; } split /(?<!subgen)[;]+/, $class_lines;

Does anybody want to test it further before I commit this change?

Thanks,
Gabriel
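
For anyone who wants to try the two patterns outside of BioPerl, here is a minimal standalone sketch. The input string is made up for illustration and is not taken from a real GenBank record, and the helper name tidy() is hypothetical; it only mirrors the trimming done by the map block above, working on a copy of each piece.

```perl
use strict;
use warnings;

# Hypothetical input, made up for illustration (not a real GenBank record).
my $class_lines = 'Bacteria; Proteobacteria; '
                . 'Salmonella enterica subsp. enterica serovar Typhimurium';

# Hypothetical helper: trim both ends and collapse runs of whitespace,
# mirroring the map block in genbank.pm but operating on a copy.
sub tidy {
    return map {
        my $s = $_;
        $s =~ s/^\s+//;
        $s =~ s/\s+$//;
        $s =~ s/\s{2,}/ /g;
        $s;
    } @_;
}

# Current pattern: split on ';' or '.'
my @old = tidy(split /(?<!subgen)[;\.]+/, $class_lines);

# Proposed pattern: split on ';' only
my @new = tidy(split /(?<!subgen)[;]+/, $class_lines);

print "old: ", join(' | ', @old), "\n";
print "new: ", join(' | ', @new), "\n";
```

With this input the current pattern yields four elements, cutting the organism name after "subsp", while the proposed pattern yields three and keeps the name whole. One thing worth checking when testing further: if the text in $class_lines ends with a period, the ';'-only pattern will leave that period attached to the last element.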


