[Bioperl-l] Extracting a particular feature from a sequence
Marc Logghe
Marc.Logghe at devgen.com
Thu May 19 17:32:43 EDT 2005
> Another route you could follow: convert your genbank feature
> table into gff (bp_genbank2gff.pl or bp_genbank2gff3.pl, or
> even using EMBOSS:
> seqret your.gb -feature -offormat2 gff).
> Next, using the in memory adaptor of gbrowse you can
> perfectly query this gff "database" for the features you'd
> like and get hands on the strand information.
> I'll try to dig up an example script and come back to you.
I'm back.
I set up demo database using genbank record NC_003198 (a big chunk of
DNA and features).
1) saved record into sequence.gb
2) created directory ./genome (working dir
/home/marcl/gbrowse_in-memory/)
3) bp_genbank2gff3.pl --outdir genome/ --split sequences.gb
The latter creates the fasta and gff files in genome. The gff contained
20937 features.
<script>
#!/usr/bin/perl
use strict;
use Bio::DB::GFF;
my $db = Bio::DB::GFF->new( -adaptor => 'memory',
-dir =>
'/home/marcl/gbrowse_in-memory/genome',
);
my ($feat) = $db->features( -types => ['gene'],
-attributes => { locus_tag => 'STY2701' } );
printf "strand = %s\n", $feat->strand;
printf "$_ = %s\n", $feat->get_tag_values($_) for ( $feat->all_tags
);
</script>
<output>
strand = -1
locus_tag = STY2701
gene = eutN
db_xref = GeneID:1249018
note = synonym: cchB
</output>
This route should be fine for small genomes. It has to fit into memory.
HTH,
Marc
More information about the Bioperl-l
mailing list