[Bioperl-l] Fasta Genome Splice

Fri Feb 13 20:39:22 EST 2004

Thanks Jason, this is exactly what I needed.  I just took a peek in 
Seq.pm to see how the sequence objects are implemented, used your 
example, and I'm ready to go.

David

On Feb 12, 2004, at 2:46 PM, Jason Stajich wrote:

> On Thu, 12 Feb 2004, David Clark wrote:
>
>> Good point.  What I need is two fasta files: one with the ORF regions
>> masked, and one with the non-ORF regions masked.
>
> This is a little bit of work, but pretty easy since you can fit whole
> yeast chromosomes into memory.  I do it by figuring out what I want to
> mask and then do:
>  substr($chromseq,$start,$len,'N'x$len)
>
> So you can just write a simple parser for the chromosomal_features.tab:
> while(<FILE> ){
>   my ($feature,$gene,$sgdid, ... etc ) = split(/\t/,$_);
>   # do the substr replace here
> }
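
A minimal sketch of the substr-based masking described above, producing both
masked copies from one list of regions. The helper names (mask_regions,
mask_outside) and the toy data are hypothetical, and the coordinates are
assumed to be 1-based and inclusive, with start greater than stop allowed for
reverse-strand features; check that against the actual feature table before
relying on it.

```perl
use strict;
use warnings;

# Replace each listed region of $seq with 'N'.
# Regions are [start, stop] pairs, assumed 1-based and inclusive.
sub mask_regions {
    my ($seq, @regions) = @_;
    for my $r (@regions) {
        my ($start, $stop) = sort { $a <=> $b } @$r;   # tolerate start > stop
        my $len = $stop - $start + 1;
        substr($seq, $start - 1, $len, 'N' x $len);    # substr is 0-based
    }
    return $seq;
}

# Keep only the listed regions; everything else becomes 'N'.
sub mask_outside {
    my ($seq, @regions) = @_;
    my $masked = 'N' x length $seq;
    for my $r (@regions) {
        my ($start, $stop) = sort { $a <=> $b } @$r;
        my $len = $stop - $start + 1;
        substr($masked, $start - 1, $len, substr($seq, $start - 1, $len));
    }
    return $masked;
}

# Toy data standing in for a chromosome and its ORF coordinates.
my $chromseq = 'ACGTACGTACGT';
my @orfs     = ([3, 5], [10, 8]);

print mask_regions($chromseq, @orfs), "\n";   # ACNNNCGNNNGT
print mask_outside($chromseq, @orfs), "\n";   # NNGTANNTACNN
```

Regions that run past the end of the sequence are not handled here; substr
will complain about them, so it may be worth validating coordinates first.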
>
>> There was another thing I wanted to do that I didn't mention before:
>> how can I generate the reverse complement of a whole genome file?
>
> That's easy with emboss
> % revseq FILE.fwd FILE.rev
>
> With bioperl -- see the Sequence HOWTO in the howto section of the
> bioperl website.  You want to use the revcom method in bioperl
> Bio::PrimarySeq objects.
>
> use Bio::SeqIO;
> # change fasta to whatever format you have/want the sequences in
> my $in  = Bio::SeqIO->new(-file => 'filename',      -format => 'fasta');
> my $out = Bio::SeqIO->new(-file => '>filename.rev', -format => 'fasta');
> # write the reverse complement of every sequence in the input
> while( my $s = $in->next_seq ) {
>   $out->write_seq($s->revcom);
> }
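
For a sequence that is already in a plain Perl string, the same idea can be
sketched without bioperl. This is only an illustration of what a reverse
complement is, not bioperl's implementation: the helper name revcom_string is
hypothetical, and it assumes plain A/C/G/T (either case), leaving N and any
other character unchanged.

```perl
use strict;
use warnings;

# Reverse-complement a plain DNA string.
# Only A, C, G, T (upper or lower case) are complemented; other
# characters such as N are kept as they are.
sub revcom_string {
    my ($seq) = @_;
    my $rc = scalar reverse $seq;     # reverse the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

print revcom_string('AAGC'), "\n";    # GCTT
print revcom_string('ACGTN'), "\n";   # NACGT
```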