[Bioperl-l] Extracting a particular feature from a sequence

Thu May 19 17:32:43 EDT 2005

> Another route you could follow: convert your genbank feature 
> table into gff (bp_genbank2gff.pl or bp_genbank2gff3.pl, or 
> even using EMBOSS:
> seqret your.gb  -feature -offormat2 gff).
> Next, using the in memory adaptor of gbrowse you can 
> perfectly query this gff "database" for the features you'd 
> like and get hands on the strand information.
> I'll try to dig up an example script and come back to you.

I'm back.
I set up demo database using genbank record NC_003198 (a big chunk of
DNA and features).
1) saved record into sequence.gb
2) created directory ./genome (working dir
/home/marcl/gbrowse_in-memory/)
3) bp_genbank2gff3.pl --outdir genome/ --split sequences.gb
The latter creates the fasta and gff files in genome. The gff contained
20937 features.

<script>
#!/usr/bin/perl
use strict;
use Bio::DB::GFF;

my $db = Bio::DB::GFF->new( -adaptor => 'memory',
                            -dir     =>
'/home/marcl/gbrowse_in-memory/genome',
);
my ($feat) = $db->features( -types      => ['gene'],
                            -attributes => { locus_tag => 'STY2701' } );
printf "strand = %s\n", $feat->strand;
printf "$_ = %s\n",     $feat->get_tag_values($_) for ( $feat->all_tags
);
</script>

<output>
strand = -1
locus_tag = STY2701
gene = eutN
db_xref = GeneID:1249018
note = synonym: cchB
</output>

This route should be fine for small genomes. It has to fit into memory.

HTH,
Marc