Bioperl: "lightweight analyses"
Steve Chervitz
sac@neomorphic.com (Steve A. Chervitz)
Mon, 8 Mar 1999 11:54:02 -0800 (PST)
Andrew Dalke writes:
>
> #!/usr/local/bin/perl -pw
> BEGIN {
> %color = ( "A" => "red", "T" => "yellow",
> "C" => "blue", "G" => "green");
> }
> s|([ATCG])|"<font color='$color{$1}'>$1</font>"|eg if !/^>/;
>
> which colorizes DNA sequence in FASTA format for HTML, based on
> residue name. Writing it caused me to go on a (small) perl jag :)
>
> Hmm, a better s// might be:
>
> s!(A+|T+|C+|G+)!"<font color='" . $color{substr($1, 0, 1)} .
> "'>$1</font>"!eg if !/^>/;
>
> which keeps from having duplicate color changes in a row.
Nice. With just a little bit more, you can support uppercase and
lowercase sequences, count the number of sequences, and properly
handle newlines:
#!/usr/local/bin/perl -pw
BEGIN {
%color = ( "A" => "red", "T" => "yellow",
"C" => "blue", "G" => "green");
$count = 0;
print "<pre>";
}
END {
print "</pre>";
print STDERR "$count sequences processed.\n";
}
s!(A+|T+|C+|G+)!"<font color='" . $color{substr("\u$1", 0, 1)} .
"'>$1</font>"!egi if not(/^>/ and ++$count);
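Note that you need to use ++$count and not $count++. The post-increment
returns the old value, which is 0 (false) on the first header line, so
that header would itself be run through the substitution.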
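To see the difference in isolation, here is a minimal sketch that pulls the guard out of the one-liner. The sub names should_colorize_pre and should_colorize_post are made up for this illustration and are not part of the script; they use ! and && where the script uses not and and, which gives the same result here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers (not part of the one-liner): each returns true
# when the substitution would run on $line. The counter is passed by
# reference so the caller can inspect it afterwards.
sub should_colorize_pre {
    my ($line, $count_ref) = @_;
    return !($line =~ /^>/ && ++$$count_ref);   # ++ first: tests the new value
}

sub should_colorize_post {
    my ($line, $count_ref) = @_;
    return !($line =~ /^>/ && $$count_ref++);   # ++ last: tests the old value
}

my @lines = (">seq1", "ACGT", ">seq2", "TTGA");
my %variant = (
    '++$count' => \&should_colorize_pre,
    '$count++' => \&should_colorize_post,
);

for my $name (sort keys %variant) {
    my $count = 0;
    # Collect the lines on which the substitution would be applied.
    my @hit = grep { $variant{$name}->($_, \$count) } @lines;
    print "$name: colorized [@hit], counted $count\n";
}
```

With $count++ the first header (>seq1) slips through to the substitution because the counter is still 0 when it is tested. Later headers are skipped as intended, and the final count is the same either way.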
Extra credit for a regexp that properly handles newlines without
resorting to the use of <pre></pre> (though I kind of prefer looking
at monospaced sequences).
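One possible answer, offered as a sketch and not the definitive one: put a <br> just before each line's trailing newline so the browser keeps the line breaks. The sub colorize_line is a hypothetical wrapper introduced here so the logic can be tried outside of -p, and uc() stands in for the "\u$1" trick.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %color = ( "A" => "red",  "T" => "yellow",
              "C" => "blue", "G" => "green" );

# Hypothetical helper: colorize one FASTA line and mark its line break
# with <br> instead of relying on <pre>.
sub colorize_line {
    my ($line) = @_;
    # Sequence lines only: wrap each run of one residue in a font tag.
    $line =~ s!(A+|T+|C+|G+)!"<font color='" . $color{uc(substr($1, 0, 1))} .
        "'>$1</font>"!egi unless $line =~ /^>/;
    # "$" matches just before a trailing newline, so <br> lands ahead of it.
    $line =~ s/$/<br>/;
    return $line;
}

# Small demonstration on a header line and a mixed-case sequence line.
print colorize_line($_) for (">seq1\n", "AACgt\n");
```

The <br> has to be added after the color substitution, otherwise the letters inside the tag would need to be protected from it. Monospacing is lost without <pre>; wrapping the whole output in <tt></tt> is one way to get it back.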
SteveC
sac@neomorphic.com
=========== Bioperl Project Mailing List Message Footer =======
Project URL: http://bio.perl.org/
For info about how to (un)subscribe, where messages are archived, etc:
http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
====================================================================