[Bioperl-l] Creating Genbank file
Jason Stajich
jason@cgt.mc.duke.edu
Mon, 9 Dec 2002 11:47:30 -0500 (EST)
On Mon, 9 Dec 2002, Vilanova,David,LAUSANNE,NRC-BS wrote:
> Dear all,
> I'm trying to reconstruct a genbank annotation file and I have some
> problems.
> I want to add a subseqfeature. I have the following code. I'm not sure I'm
> using it properly but I get an output without the subseqfeature. The rest is
> ok.
> How can I add subseqfeatures ???
>
> Thanks,
> David
>
>
>
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
>
>
> die "Usage: per script.pl Fasta_File Feaures_File" unless @ARGV eq '2';
>
> my($seq_file,$feat_file) = @ARGV;
>
> my $seq_in = new Bio::SeqIO (-file => $seq_file,
> -format => 'fasta'
> );
> $sequence= $seq_in->next_seq;
>
> my $seq = new Bio::Seq (-seq =>$sequence->seq(),
> -id => 'Genome_v10,
> -desc =>"My genome",
> -authors =>"List here"
> );
> my $feat = new Bio::SeqFeature::Generic ( -start => 1,
> -end => 365,
> -strand => 1,
> -primary =>"Gene",
> -tag => {
> note => "NOTE here"}
> #-primary => 'CDS',
> );
> my $subfeat = new Bio::SeqFeature::Generic (-start =>1,
> -end => 365,
> -strand => 1,
> -primary => 'CDS',
> #-primary => "CDS",
> -tag => {
> note =>"Note Here"}
> );
>
> $feat->add_sub_SeqFeature($subfeat);
>
> $seq->add_SeqFeature($feat);
>
>
> my $seq_out = new Bio::SeqIO(-file=>">Genome_v10.gbk",
> -format=>"genbank");
>
> $seq_out->write_seq($seq);
Doesn't quite work that way anymore - we do sub-features by locations like
this. Sub-features don't really get unrolled properly from SeqFeatures at
this point - the below should work, note that the gene and CDS are
separate features.
BTW It would be nice if someone would map the
SeqFeature::Gene::GeneStructureI objects into something similar to the
below structure for direct output by the SeqIO objects. ie, Join the
transcripts' exons into sets of CDSes, introns as separate features, and
an encompassing gene feature -- basically EMBL/GenBank submission ready -
all from a single Bio::SeqFeature::Gene::GeneStructureI object.
A small project waiting for someone to jump on, basically means reading
the API documentation and generating a bunch of tests, and writing a
single function in Bio::SeqFeature::Gene::GeneStructureI which generates
an array of remapped SeqFeature::Generic objects (at least in my first
pass thought of how to do it)...
#!/usr/bin/perl -w
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Split;
use Bio::Location::Simple;
use Bio::SeqIO;
my $seq = new Bio::Seq(-seq => 'N'x365,
-display_id => 'genome_v10');
my $gene = new Bio::SeqFeature::Generic(-start => 1,
-end => 365,
-primary => 'gene',
-tag => { 'gene' => 'mysteryoase'});
# note the use of common 'gene' tags in the NCBI-way to link features
$seq->add_SeqFeature($gene);
my $cds = new Bio::SeqFeature::Generic(-primary => 'CDS',
-tag => { 'gene' => 'mysteryoase'});
my $cdsloc = new Bio::Location::Split();
$cdsloc->add_sub_Location(Bio::Location::Simple->new(-start => 1,
-end => 100));
$cdsloc->add_sub_Location(Bio::Location::Simple->new(-start => 200,
-end => 365));
$cds->location($cdsloc);
$seq->add_SeqFeature($cds);
my $out = new Bio::SeqIO(-format => 'genbank');
$out->write_seq($seq);
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>
--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu