[Bioperl-l] help converting seq objects.
Pedro Antonio Reche
reche at research.dfci.harvard.edu
Mon Aug 16 18:30:44 EDT 2004
Hi all,
I have combined two scripts kindly provided by Jason Stajich to 1)
Retrieve a list of seq records from genbank saving them in GenBank
format; and 2) get the translation feature from the file previously
saved. The program, shown below, works nicely, but it would be nicer if I
could parse the sequence objects obtained from the GenBank database
directly, without printing them to a file first. Does anyone know how to
do this? Thanks in
advance for any help.
#!/usr/sbin/perl -w
# How to retrieve GenBank entries over the Web
# by Jason Stajich
use Bio::DB::GenBank;
use Bio::SeqIO;
use Bio::Seq;
use Bio::PrimarySeq;
$in = shift @ARGV;
###### read acc numbers from file into an array ##########
open (F, "$in") || die "Cannot open $in: $!";
while(<F>){
    next unless /(NM_\d+)\s/;
    $acc = $1;
    chomp($acc);
    push @AC, $acc;
}
close(F);
$list = join " ", @AC;
print "$list\n";
##### get genbank records into a file
#######################################################
my $gb = new Bio::DB::GenBank;
my $seqio = $gb->get_Stream_by_acc(\@AC);
my $seqout = new Bio::SeqIO(-file => ">$in.gb", -format => 'GenBank');
#my $seqout = new Bio::SeqIO(-file => ">$ARGV[0].gb", -format => 'GenBank');
while( defined ($seq = $seqio->next_seq )) {
    $seqout->write_seq($seq);
}
###########get translation from file just created
############################################
my $tmp = "$in.gb";
my $temp = "$in.tfa";
my $in = new Bio::SeqIO(-file => "<$tmp", -format => 'genbank');
my $out = new Bio::SeqIO(-file => ">$temp", -format => 'fasta');
while ($seq = $in->next_seq ) {
    foreach my $f ( grep { $_->primary_tag eq 'CDS' }
                    $seq->top_SeqFeatures ) {
        # use the gene name if present, otherwise fall back to the product
        my ($gname);
        if ( $f->has_tag('gene') ) {
            ($gname) = $f->each_tag_value('gene');
        } elsif ( $f->has_tag('product') ) {
            ($gname) = $f->each_tag_value('product');
        }
        # a CDS with neither tag gets an empty name and is skipped below
        $gname = '' unless defined $gname;
        $gname =~ s/\s+/_/g;
        my ($ref) = $f->has_tag('protein_id') &&
            $f->each_tag_value('protein_id');
        my ($gi) = $f->has_tag('db_xref') &&
            $f->each_tag_value('db_xref');
        my ($translation) = $f->has_tag('translation') &&
            $f->each_tag_value('translation');
        unless( $gi && $ref && $gname && $translation ) {
            print STDERR "not fully annotated CDS ($gi,$ref,$gname), skipping...\n";
            next;
        }
        my $tfa = Bio::PrimarySeq->new (-seq => $translation,
                                        -display_id =>
                                        sprintf("%s|%s|%s",$gi,$ref,$gname));
        $out->write_seq($tfa);
    }
}
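
To make the question concrete, here is an untested sketch of the single-pass version being asked about. It assumes that the Seq objects returned by get_Stream_by_acc already carry their CDS features, so they can be inspected in memory without the intermediate .gb file. The helper name cds_translations, taking accession numbers from the command line, and writing FASTA to STDOUT are all hypothetical choices made for illustration; the BioPerl calls are the same ones used in the script above, apart from the -fh argument to Bio::SeqIO.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [display_id, translation] pairs for every fully annotated CDS
# feature of one sequence object. Incomplete CDS features are skipped.
# (cds_translations is a hypothetical helper, not part of BioPerl.)
sub cds_translations {
    my ($seq) = @_;
    my @records;
    foreach my $f ( grep { $_->primary_tag eq 'CDS' } $seq->top_SeqFeatures ) {
        # take the first value of each tag of interest, if the tag exists
        my %val;
        foreach my $tag (qw(gene product protein_id db_xref translation)) {
            ($val{$tag}) = $f->each_tag_value($tag) if $f->has_tag($tag);
        }
        # prefer the gene name, fall back to the product name
        my $gname = defined $val{gene} ? $val{gene} : $val{product};
        next unless defined $gname
            && $val{protein_id} && $val{db_xref} && $val{translation};
        $gname =~ s/\s+/_/g;
        push @records,
            [ join( '|', $val{db_xref}, $val{protein_id}, $gname ),
              $val{translation} ];
    }
    return @records;
}

# Main part: runs only when accession numbers are given as arguments.
# Assumption: the stream from GenBank yields feature-rich Seq objects.
if (@ARGV) {
    require Bio::DB::GenBank;
    require Bio::SeqIO;
    require Bio::PrimarySeq;

    my $gb    = Bio::DB::GenBank->new;
    my $seqio = $gb->get_Stream_by_acc( [@ARGV] );
    my $out   = Bio::SeqIO->new( -fh => \*STDOUT, -format => 'fasta' );

    # no intermediate file: each Seq object is examined as it arrives
    while ( my $seq = $seqio->next_seq ) {
        foreach my $rec ( cds_translations($seq) ) {
            my ( $id, $translation ) = @$rec;
            $out->write_seq(
                Bio::PrimarySeq->new( -seq => $translation,
                                      -display_id => $id ) );
        }
    }
}
```

If this assumption holds, the two loops of the original program collapse into one, and the .gb file is only needed when a local copy of the records is wanted.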
=============================================================================
Dr. Pedro A Reche, Instructor of Medicine
Dana-Farber Cancer Institute, Harvard Medical School
HIM Building, Room 401
77 Avenue Louis Pasteur
Boston, MA 02115, USA
TL: 617 632 3824
FX: 617 632 3351
EM: reche at research.dfci.harvard.edu
W3: www.mifoundation.org