[Bioperl-l] GFF file output missing semicolon

Lincoln Stein lstein at cshl.edu
Tue Dec 2 16:53:57 EST 2003


OK, this is a bug with the Bio::DB::GFF parsing code.  I will fix it.

Lincoln

On Sunday 23 November 2003 07:26 pm, Wes Barris wrote:
> Lincoln Stein wrote:
> > Hi,
> >
> > The GFF2 spec specifies that the semicolon separates tag/value pairs.  It
> > does not say that the last tag/value should be terminated by a semicolon.
> >  It also specifies that any amount of whitespace can occur around the
> > semicolon.
>
> Ok, fair enough.  But then, gbrowse appears to not be able to handle this
> format properly.  I know that I must be wrong about this but this is what
> I am seeing.
>
> Here is a gff line as created by Bio::Tools::GFF:
>
> AF354168        blast   s-m-100-10      61437   61530   186     -       .
> Note "QRNA Feature sheep vs. mouse RNA logoddspost=14.021"   ; Accession
> "sheep_#25_61538..61445"
>
> Note that there is a lot of wrapping going on when displayed in this
> message.
>
> If I load this file (using fast_load_gff.pl) into a mysql database and view
> with gbrowse, there are two problems:
>
> 1) The accession is displayed above the item inside double quotes like
> this: "sheep_#25_61538..61445".
>
> 2) When mousing over the item, neither the accession nor the start and end
>     are displayed.  Instead all I see is the track key:
>     QRNA Sheep-Mouse 100-10:
>
> If I manually add a semi-colon after the accession at the end of each line
> of the gff file and load that into the mysql database, gbrowse properly
> displays these two items like this:
>
> sheep_#25_61538..61445			(note no double quote marks any more)
>
> QRNA Sheep-Mouse 100-10: sheep_#25_61538..61445 AF354168: 61437..61530
>
> > Lincoln
> >
> > On Thursday 20 November 2003 11:19 pm, Wes Barris wrote:
> >>Hi,
> >>
> >>I have written a bioperl program that parses blast files and generates
> >>a gff file.  I have everything working except there is one small detail
> >>that I have not been able to figure out.  When generating each line
> >>of gff output, the semicolon is left off at the end of the Accession
> >>name.  Here is a sample line from a gff file that I generated:
> >>
> >>AF354168        mirseeker       pred_miRNA      188152  188251  198     -
> >>   . Note "mirseeker score 17.58"   ; Accession
> >>"s-h_19_r_99330000-99363000"
> >>
> >>Notice that:
> >>
> >>1) There are three space characters after the note and the semicolon
> >>    that occurs before "Accession".
> >>
> >>2) At the end of the line, after the Accession, there are three space
> >>    characters and no semicolon.  Without that semicolon, the genome
> >>    browser doesn't display the "rollover" information properly.
> >>
> >>3) The "Note" field is written before the "Accession" field.  I thought
> >>    that the Accession should come first.
> >>
> >>Here is the relevant portion of my code:
> >>
> >>       while( my $hsp = $hit->next_hsp ) {
> >>          my $strand = 1;
> >>          $strand = -1 if ($hsp->strand('query') == -1 ||
> >>                           $hsp->strand('hit') == -1);
> >>          my $feature = new Bio::SeqFeature::Generic(
> >>                         -source_tag=>$source,
> >>                         -primary_tag=>$feature_type,
> >>                         -start=>$hsp->start('hit'),
> >>                         -end=>$hsp->end('hit'),
> >>                         -score=>$hit->raw_score,
> >>                         -strand=>$strand,
> >>                         -tag=>{
> >>                                 Accession=>$result->query_name,
> >>                                 Note=>$result->query_description,
> >>                                 }
> >>                         );
> >>          $feature->seq_id($hit->accession);
> >>          $gffio->write_feature($feature);       #Bio::SeqFeatureI
> >>       }
> >>
> >>Perhaps I am not adding the "Accession" and "Note" fields properly???
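
To illustrate the GFF2 rule quoted in this thread (semicolons separate
tag/value pairs, whitespace around them is allowed, and the last pair needs
no terminator), here is a minimal Perl sketch of a tolerant parser for the
attribute column. The helper name parse_gff2_attributes is hypothetical. This
is not the Bio::DB::GFF code or its fix, only a sketch that assumes each
value is either a bare word or a double-quoted string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::DB::GFF or Bio::Tools::GFF).
# Parses a GFF2 attribute column into a list of [tag, value] pairs.
sub parse_gff2_attributes {
    my ($column) = @_;
    my @pairs;

    # Collect runs of text up to the next semicolon, treating a
    # double-quoted string as one unit so a ';' inside quotes is kept.
    for my $field ($column =~ /((?:"[^"]*"|[^;"])+)/g) {
        $field =~ s/^\s+|\s+$//g;    # whitespace around ';' is allowed
        next unless length $field;   # a trailing ';' yields no extra pair

        my ($tag, $value) = split /\s+/, $field, 2;
        $value = '' unless defined $value;
        $value =~ s/^"(.*)"$/$1/;    # drop the surrounding double quotes
        push @pairs, [$tag, $value];
    }
    return @pairs;
}

# The attribute column from the thread, without and with a final semicolon.
my $plain = 'Note "QRNA Feature sheep vs. mouse RNA logoddspost=14.021"'
          . '   ; Accession "sheep_#25_61538..61445"';
my $terminated = $plain . ' ;';

for my $column ($plain, $terminated) {
    print join(', ', map { "$_->[0]=$_->[1]" } parse_gff2_attributes($column)), "\n";
}
```

Both inputs give the same two pairs, with the quotes removed from the
Accession value, which is the behaviour the spec wording above implies.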

-- 
========================================================================
Lincoln D. Stein                           Cold Spring Harbor Laboratory
lstein at cshl.org                                Cold Spring Harbor, NY
========================================================================



