[Bioperl-l] Fasta Genome Splice
David Clark
dfclark at neo.tamu.edu
Fri Feb 13 20:39:22 EST 2004
Thanks Jason, this is exactly what I needed. I just took a peek in
Seq.pm to see how the sequence objects are implemented, used your
example, and I'm ready to go.
David
On Feb 12, 2004, at 2:46 PM, Jason Stajich wrote:
> On Thu, 12 Feb 2004, David Clark wrote:
>
>> Good point. What I need is two fasta files: one with the ORF regions
>> masked, and one with the non-ORF regions masked.
>
> This is a little bit of work, but pretty easy since you can fit whole
> yeast chromosomes into memory. I do it by figuring out what I want to
> mask and then do:
> substr($chromseq,$start,$len,'N'x$len)
>
> So you can just write a simple parser for the chromosomal_features.tab file:
> while(<FILE> ){
>     my ($feature,$gene,$sgdid, ... etc ) = split(/\t/,$_);
>     # do the substr replace here
> }
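
The substr approach above can be sketched in plain Perl as follows. This is only a minimal illustration, not Jason's actual script: the helper name mask_regions and the list of [start, end] pairs are hypothetical, the coordinates are assumed to be 1-based and inclusive (substr itself is 0-based), and reverse-strand features are assumed to be possibly listed with start greater than end. Which columns of the features file hold the coordinates is left to the parser.

```perl
use strict;
use warnings;

# Hypothetical helper: mask 1-based, inclusive [start, end] ranges with 'N'.
# With a true $invert, mask everything OUTSIDE the ranges instead.
sub mask_regions {
    my ($seq, $ranges, $invert) = @_;
    my $len = length $seq;

    # One flag per base: '1' inside a feature, '0' outside.
    my $flags = '0' x $len;
    for my $r (@$ranges) {
        my ($start, $end) = @$r;
        # Assumption: reverse-strand features may have start > end.
        ($start, $end) = ($end, $start) if $start > $end;
        next if $start > $len || $end < 1;   # entirely off the sequence
        $start = 1    if $start < 1;
        $end   = $len if $end > $len;
        my $n = $end - $start + 1;
        substr($flags, $start - 1, $n, '1' x $n);   # substr offsets are 0-based
    }

    # Replace each run of selected bases with the same number of N's.
    my $re     = $invert ? qr/0+/ : qr/1+/;
    my $masked = $seq;
    while ($flags =~ /$re/g) {
        my $n = $+[0] - $-[0];
        substr($masked, $-[0], $n, 'N' x $n);
    }
    return $masked;
}

my @orfs = ([3, 5], [9, 8]);   # made-up coordinates for illustration
print mask_regions('ACGTACGTAC', \@orfs, 0), "\n";   # ORFs masked
print mask_regions('ACGTACGTAC', \@orfs, 1), "\n";   # non-ORF regions masked
```

Calling the helper twice on the same chromosome string, once with and once without the invert flag, gives the two masked versions asked for; overlapping features are handled because the flags are merged before masking.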
>
>> There was another thing I wanted to do that I didn't mention before:
>> how can I generate the reverse complement of a whole genome file?
>
> That's easy with emboss
> % revseq FILE.fwd FILE.rev
>
> With bioperl -- see the Sequence HOWTO in the howto section of the bioperl
> website. You want to use the revcom method of Bio::PrimarySeq objects.
>
> # change fasta to whatever format you have/want the sequences in
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-file => 'filename', -format => 'fasta');
> my $out = Bio::SeqIO->new(-file => '>filename.rev', -format => 'fasta');
> while( my $s = $in->next_seq ) {
>     $out->write_seq($s->revcom);
> }
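
For reference, what a reverse complement amounts to on a plain DNA string can be sketched without any modules. This is a simplified illustration rather than the BioPerl implementation: the function name revcomp is hypothetical, and only A, C, G and T (either case) are complemented, so N and any other ambiguity codes pass through unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse-complement a plain DNA string.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # swap each base for its complement
    return $rc;
}

print revcomp('ATGCNN'), "\n";
```

A masked chromosome can be passed through the same function, since the N's stay N's and only their positions are reversed.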