[Bioperl-l] Question about parsing a gb file

Mark A. Jensen maj at fortinbras.us
Mon Mar 30 00:42:28 UTC 2009


Paolo- You may also get some insight from the thread Govind Chandra started
after this one; see Chris's and Hilmar's informative comments there on
SeqFeature and Annotation.
cheers Mark
----- Original Message ----- 
From: "Torsten Seemann" <torsten.seemann at infotech.monash.edu.au>
To: "Paolo Pavan" <paolo.pavan at gmail.com>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Sunday, March 29, 2009 8:25 PM
Subject: Re: [Bioperl-l] Question about parsing a gb file


> Hi everybody, I have a little problem/question about parsing a GenBank file.
> I've got a Bio::Seq object, $s, to which I've added
> some Bio::SeqFeature::Generic objects. Everything seems to be OK here, since I can
> see all the properties of $s set correctly in my visual debugger;
> for instance, I can find the display_name property of the SeqFeature in
> the $s object.
> Then I call Bio::SeqIO->new(-format => 'genbank')->write_seq($s)
> to write out the GenBank file, but there I can no longer find some
> properties of the features, like the "display_name".
> What is happening?
> use Bio::SeqIO;
> use Bio::SeqFeature::Generic;
>
> # $str is a Bio::SeqIO input stream opened earlier (not shown)
> my $s = $str->next_seq();
> my $f = Bio::SeqFeature::Generic->new(
>     -start        => 10,
>     -end          => 100,
>     -strand       => -1,
>     -primary      => 'CDS', # -primary_tag is a synonym
>     -source_tag   => 'repeatmasker',
>     -display_name => 'alu family',
> );
> $s->add_SeqFeature($f);
> # write_seq() prints the record itself; wrapping it in print only adds its return value
> Bio::SeqIO->new(-format => 'genbank', -fh => \*STDOUT)->write_seq($s);

The logical conclusion is that the 'genbank' output format does not
store the -display_name attribute of a SeqFeature. If you look at the
output of your script you will see only this:

     CDS             complement(10..100)

You will have to add appropriate qualifiers to your SeqFeature with
-tag => { name => value, ... }, choosing the qualifier names from the
GenBank/EMBL feature table definition:
http://www.ncbi.nlm.nih.gov/collab/FT/
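If the features already exist with a display_name set (as in the question), one way to follow this advice without rebuilding them is to copy each name into a qualifier just before writing. This is only a sketch and assumes a working BioPerl installation; the sequence ID TEST0001, the made-up residues, and the choice of /note as the target qualifier are illustrative assumptions, not something the GenBank writer requires.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Small test sequence; the ID and residues are made up for illustration.
my $seq = Bio::Seq->new(
    -display_id => 'TEST0001',
    -seq        => 'ACGT' x 30,
    -alphabet   => 'dna',
);

# A feature like the one in the question: display_name set, no tags.
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(
    -start        => 10,
    -end          => 100,
    -strand       => -1,
    -primary      => 'CDS',
    -source_tag   => 'repeatmasker',
    -display_name => 'alu family',
));

# Copy each feature's display_name into a /note qualifier (an arbitrary
# choice for this sketch), unless the feature already carries a note.
for my $feat ($seq->get_SeqFeatures) {
    my $name = $feat->display_name;
    next unless defined $name && length $name;
    $feat->add_tag_value('note', $name) unless $feat->has_tag('note');
}

# Write the GenBank record into an in-memory string instead of STDOUT.
open my $out_fh, '>', \my $genbank or die "cannot open in-memory handle: $!";
Bio::SeqIO->new(-fh => $out_fh, -format => 'genbank')->write_seq($seq);
close $out_fh;

print $genbank;
```

Whether /note, /product or some other qualifier is the right home for the name depends on what the name means in your data.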

In particular I think you want to do the following:

my $f = Bio::SeqFeature::Generic->new(
    -start   => 10,
    -end     => 100,
    -strand  => -1,
    -primary => 'CDS',  # -primary_tag is a synonym
    # qualifiers go in a hash reference under -tag (note the =>)
    -tag     => {
        product   => 'alu family',
        note      => 'repeatmasker',
        locus_tag => 'GENE00432',  # etc
    },
);
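To check that qualifiers passed this way survive a write/read cycle, you could write the record to a string and parse it back. This is a minimal sketch assuming BioPerl is installed; the ID TEST0001 and the sequence are made up for illustration.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqIO;
use Bio::SeqFeature::Generic;

# Made-up test sequence carrying a feature with qualifiers in -tag.
my $seq = Bio::Seq->new(-display_id => 'TEST0001', -seq => 'ACGT' x 30, -alphabet => 'dna');
$seq->add_SeqFeature(Bio::SeqFeature::Generic->new(
    -start   => 10,
    -end     => 100,
    -strand  => -1,
    -primary => 'CDS',
    -tag     => { product => 'alu family', note => 'repeatmasker', locus_tag => 'GENE00432' },
));

# Write the record to an in-memory string...
open my $out_fh, '>', \my $genbank or die "cannot open in-memory handle: $!";
Bio::SeqIO->new(-fh => $out_fh, -format => 'genbank')->write_seq($seq);
close $out_fh;

# ...then parse it back and read the qualifiers from the CDS feature.
open my $in_fh, '<', \$genbank or die "cannot open in-memory handle: $!";
my $reread = Bio::SeqIO->new(-fh => $in_fh, -format => 'genbank')->next_seq;
my ($cds) = grep { $_->primary_tag eq 'CDS' } $reread->get_SeqFeatures;
my ($product) = $cds->get_tag_values('product');
print "product: $product\n";
```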

Hope this helps,

--Torsten Seemann
--Victorian Bioinformatics Consortium, Dept. Microbiology, Monash
University, AUSTRALIA

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
