[Bioperl-l] re covering sequences using SNP data
subiotech
sumithanallu at gmail.com
Thu May 26 01:13:34 UTC 2011
I have the reference sequence of a gene and its variants (SNPs+ indels) from
101 accessions. Now I need to recover the sequences of the gene for all the
101 accessions using the variant information. I'm new to Perl, until now I
was able to get the script to identify the fasta sequence given the id. The
template of the files are:
1)variant file:
acc_name position variant
abc 10 C
xyz 15 G
2)fasta file
>abc
ATTTGGGGGCCCCGGGGGGG
>xyz
TTTGGGGGCCCCAAAAAAA
My unfinished script:
#!/usr/bin/perl
use Bio::SeqIO;
use IO::String;
use Bio::SearchIO;
#usage variantfile seqfile outfile
my $variantfile = shift or die;
# open idfile and store the info in the array.
open (IDFILE, $variantfile) || die "Can't open $file1: $!\n";
while (<IDFILE>) { # loop through IDFILE one line at a time
chomp; # remove any newlines if they exist;
push(@id, $_);
}
my $seqfile= shift or die;
my $outfile= shift or die;
my $gb = Bio::SeqIO->new(-file => "<$seqfile",
-format => "fasta");
my $seq_out = Bio::SeqIO->new('-file' => ">$outfile",
'-format' => 'FASTA');
while($seq = $gb->next_seq) {
$seq_out->write_seq($seq) if (grep {$_ eq $seq->id} @id);
}
exit;
Thanks for your help!!!
--
View this message in context: http://old.nabble.com/recovering-sequences-using-SNP-data-tp31704101p31704101.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
More information about the Bioperl-l
mailing list