[Bioperl-l] minor bug in Bio::FeatureIO::gff

Smithies, Russell Russell.Smithies at agresearch.co.nz
Fri Mar 13 02:07:30 UTC 2009


I think there's a bug in Bio::FeatureIO::gff when it reads FASTA sequence from a GFF file.
If there's no ##FASTA directive in the GFF file, it ignores the FASTA header and takes the first line of sequence as the primary_id and display_id.

For example, here's some GFF with the sequence appended and no ##FASTA line:

super_1:34972746,34974962	BlastN	barley_ta_match	1558	1764	.	+	.	Parent=barley_transgrp_blast:TC135274;Note=%22%22
super_1:34972746,34974962	BlastN	barley_ta_match	1911	2262	.	+	.	Parent=barley_transgrp_blast:TC135274;Note=%22%22
>super_1:34972746,34974962
ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC
GTTGCCGCCGCGGCGGACGCCTCCGCGGGCGACGCCGATCCGGTCTACAGGTCAGTGGTT


This is what I get from Data::Dumper:
$VAR1 = bless( {
                 'primary_id' => 'ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC',
                 'primary_seq' => bless( {
                                           'display_id' => 'ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC
',
                                           'primary_id' => 'ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC
',
                                           'desc' => '',
                                           'seq' => 'GTTGCCGCCGCGGCGGACGCCTCCGCGGGCGACGCCGATCCGGTCTACAGGTCAGTGGTT',
                                           'alphabet' => 'dna'
                                         }, 'Bio::PrimarySeq' )
               }, 'Bio::Seq' );

If I put the ##FASTA directive back in the GFF file,
I get this (which is correct) from Data::Dumper:
$VAR1 = bless( {
                 'primary_id' => 'super_1:34972746,34974962',
                 'primary_seq' => bless( {
                                           'display_id' => 'super_1:34972746,34974962',
                                           'primary_id' => 'super_1:34972746,34974962',
                                           'desc' => '',
                                           'seq' => 'ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACCGTTGCCGCCGCGGCGGACGCCTCCGCGGGCGACGCCGATCCGGTCTACAGGTCAGTGGTT',
                                           'alphabet' => 'dna'
                                         }, 'Bio::PrimarySeq' )
               }, 'Bio::Seq' );
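
To illustrate the behaviour I'd expect, here is a rough standalone sketch. It is not the Bio::FeatureIO::gff code, and split_gff_fasta is a made-up helper name. It assumes the reader should treat either a ##FASTA line or the first line starting with ">" as the start of the sequence section, so that the header is kept as the ID and all sequence lines are joined.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: split GFF text into feature
# lines and FASTA records. Assumption: the sequence section starts at a
# "##FASTA" line or, failing that, at the first line beginning with ">".
sub split_gff_fasta {
    my ($text) = @_;
    my (@features, %seqs, $id);
    my $in_fasta = 0;

    for my $line (split /\n/, $text) {
        $line =~ s/\r$//;                 # tolerate DOS line endings
        next if $line =~ /^\s*$/;         # skip blank lines

        if (!$in_fasta) {
            if ($line =~ /^##FASTA/) {    # explicit start of sequences
                $in_fasta = 1;
                next;
            }
            if ($line =~ /^>/) {          # implicit start: keep this line
                $in_fasta = 1;            # and handle it as a header below
            }
            else {
                push @features, $line unless $line =~ /^#/;
                next;
            }
        }

        if ($line =~ /^>(\S+)/) {         # FASTA header: first token is the ID
            $id = $1;
            $seqs{$id} = '';
        }
        elsif (defined $id) {             # sequence line: append to current record
            $seqs{$id} .= $line;
        }
    }
    return (\@features, \%seqs);
}

# Example call using the data from this report (no ##FASTA line).
my $gff = join "\n",
    "super_1:34972746,34974962\tBlastN\tbarley_ta_match\t1558\t1764\t.\t+\t.\tParent=barley_transgrp_blast:TC135274",
    ">super_1:34972746,34974962",
    "ATGGGGCGCGGCTGGAGGGGGTTGTTGTTGCTGATTCTGCCGCTTCTCTGCTTCGTGACC",
    "GTTGCCGCCGCGGCGGACGCCTCCGCGGGCGACGCCGATCCGGTCTACAGGTCAGTGGTT";

my ($features, $seqs) = split_gff_fasta($gff);
print "$_ => $seqs->{$_}\n" for sort keys %$seqs;
```

With this approach the same records come back whether or not the ##FASTA line is present, which is what I'd expect the parser to do.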


It also breaks other stuff, because the feature's end coordinate ($seq->end) is now greater than the sequence length.
Also, I think _handle_feature should warn rather than die with a stack trace when it gets an unknown directive type, if only to stop it dying when reading GFF dumped from GBrowse.
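
Something along these lines is what I have in mind for the warning. This is only a sketch: handle_directive and the %KNOWN table are made-up names for illustration, not the actual BioPerl internals, and the list of recognised directives is deliberately incomplete.

```perl
use strict;
use warnings;

# Hypothetical dispatcher, NOT the BioPerl implementation.
# Assumption: only these directive names are recognised.
my %KNOWN = map { $_ => 1 } qw(gff-version sequence-region feature-ontology FASTA);

sub handle_directive {
    my ($line) = @_;

    # Split "##name args..." into the directive name and the rest.
    my ($name, $rest) = $line =~ /^##(\S+)\s*(.*)$/
        or return;                        # not a directive line at all

    unless ($KNOWN{$name}) {
        # Warn and carry on instead of throwing an exception.
        warn "skipping unknown directive '##$name'\n";
        return;
    }
    return { name => $name, args => [ split ' ', $rest ] };
}
```

An unrecognised directive then just produces a warning on STDERR, and parsing can continue with the next line.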


--Russell