[Bioperl-l] re covering sequences using SNP data

subiotech sumithanallu at gmail.com
Thu May 26 01:13:34 UTC 2011


I have the reference sequence of a gene and its variants (SNPs+ indels) from
101 accessions. Now I need to recover the sequences of the gene for all the
101 accessions using the variant information. I'm new to Perl, until now I
was able to get the script to identify the fasta sequence given the id. The
template of the files are:
1)variant file:
acc_name   position  variant
abc             10         C
xyz              15         G
2)fasta file
>abc
ATTTGGGGGCCCCGGGGGGG
>xyz
TTTGGGGGCCCCAAAAAAA
My unfinished script:
#!/usr/bin/perl 
use Bio::SeqIO;
use IO::String;
use Bio::SearchIO;
#usage variantfile seqfile outfile
my $variantfile = shift or die;
# open idfile  and store the info in the array.
open (IDFILE, $variantfile) || die "Can't open $file1: $!\n";
while (<IDFILE>) {                  # loop through IDFILE one line at a time
    chomp;                         # remove any newlines if they exist;
  	push(@id, $_);
}

my $seqfile= shift or die;
my $outfile= shift or die;

my $gb = Bio::SeqIO->new(-file   => "<$seqfile",
                              -format => "fasta");
my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
                                       '-format' => 'FASTA');

while($seq = $gb->next_seq) { 

    $seq_out->write_seq($seq) if (grep {$_ eq $seq->id} @id);

} 
exit;

Thanks for your help!!!

-- 
View this message in context: http://old.nabble.com/recovering-sequences-using-SNP-data-tp31704101p31704101.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.




More information about the Bioperl-l mailing list