[Bioperl-l] How do you pull features out of a genbank file

Wes Barris wes.barris at csiro.au
Fri Sep 12 01:52:32 EDT 2003


Hi,

I have been struggling with the bioperl documentation trying to figure
out how to pull features out of a genbank file.  The following perl code
obviously does not work (because I made up the "next_feature" thing).

#!/usr/local/bin/perl -w
#
use strict;
use Bio::SeqIO;
#
my $usage = "Usage $0 <genbank.txt> <fasta.txt>\n";
my $infile = shift or die $usage;
my $outfile = shift or die $usage;

my $seq_in  = Bio::SeqIO->new( -format => 'genbank', -file => $infile);
my $seq_out = Bio::SeqIO->new( -format => 'fasta',   -file => ">$outfile");
while (my $seq = $seq_in->next_seq()) {
    my $newid = "gi|".$seq->primary_id."|ref|".$seq->accession.".".$seq->version."|";
    $seq->id($newid);

    while (my $feature = $seq->next_feature()) {	# <-- what should I put here???
       print("$feature\n");
       }

    $seq_out->write_seq($seq);
    }

What I would like to do is to gain access to the variations that are shown
in this sample genbank file snippit below:

FEATURES             Location/Qualifiers
      source          1..1470
                      /organism="Bos taurus"
                      /mol_type="mRNA"
                      /db_xref="taxon:9913"
                      /chromosome="14"
                      /map="14"
      gene            1..1470
                      /gene="DGAT1"
                      /db_xref="LocusID:282609"
      misc_feature    730..1422
                      /gene="DGAT1"
                      /note="ARE1; Region: COG5056, ARE1, Acyl-CoA cholesterol
                      acyltransferase [Lipid metabolism]"
                      /db_xref="CDD:COG5056"
      variation       694
                      /gene="DGAT1"
                      /note="single nucleotide polymorphism (snp)"
                      /replace="a"
      variation       695
                      /gene="DGAT1"
                      /note="single nucleotide polymorphism (snp)"
                      /replace="a"
      variation       747
                      /gene="DGAT1"
                      /note="single nucleotide polymorphism (snp)"
                      /replace="t"
      variation       1235^1236
                      /gene="DGAT1"
                      /replace="g"

-- 
Wes Barris
E-Mail: Wes.Barris at csiro.au




More information about the Bioperl-l mailing list