[Bioperl-l] exporting contigs with CDSes, stored via Bio::DB::GFF, into individual GenBank records?
Erich Schwarz
schwarz at tenaya.caltech.edu
Mon Sep 29 05:57:43 UTC 2008
Hi all,
I have newly sequenced contigs, with CDS predictions, loaded
into a Bio::DB::GFF-readable format (i.e., loaded into a MySQL
database via Bio::DB::GFF). I'd like to export each contig, with
its annotated CDSes, into a single GenBank-formatted record for each
contig (in order to be able submit this stuff to GenBank, without
having to waste time with Sequin). Is there some straightforward
way of getting Bio::DB::GFF to do that?
Some time ago, when I last had to decipher BioPerl, I came up
with code that would let me export protein translations of the
contigs' CDSes in GenBank format:
-------------------------------------------------------------------
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GFF;
my $query_database = $ARGV[0];
my $dna = q{};
my $db = Bio::DB::GFF->new( -dsn => $query_database);
my $gb_file = 'example.gb';
my $seq_out = Bio::SeqIO->new( -file => ">$gb_file", -format => 'genbank', );
my @contigs = sort
map { $_->display_id }
$db->features( -types => 'contig:assembly' );
foreach my $contig (@contigs) {
my $segment1 = $db->segment($contig);
my @p_txs = $segment1->features('processed_transcript');
foreach my $p_t (sort @p_txs) {
$dna = q{};
my @CDSes = $p_t->CDS;
my $cds_name = $CDSes[0]->display_id();
foreach my $cds (@CDSes) {
# $cds->seq == Bio::PrimarySeq, *not* plain nt seq.!
$dna = $dna . $cds->seq->seq;
}
my $full_cds = Bio::Seq->new( -display_id => $cds_name,
-seq => $dna, );
my $prot = $full_cds->translate;
$seq_out->write_seq($prot);
}
}
--------------------------------------------------------------------
Returning to this, I tried using
$db->get_Seq_by_id($contig)
to give me a Bio::Seq object for each contig (which I could then
output into GenBank form), but that proved futile.
I'm willing to work this out on my own if I have to, but if
somebody can answer this in 30 seconds, it will save me a lot of
time -- and be very appreciated!
--Erich
More information about the Bioperl-l
mailing list