[Bioperl-l] GFF file output missing semicolon
Lincoln Stein
lstein at cshl.edu
Tue Dec 2 16:53:57 EST 2003
OK, this is a bug with the Bio::DB::GFF parsing code. I will fix it.
Lincoln
On Sunday 23 November 2003 07:26 pm, Wes Barris wrote:
> Lincoln Stein wrote:
> > Hi,
> >
> > The GFF2 spec specifies that the semicolon separates tag/value pairs. It
> > does not say that the last tag/value should be terminated by a semicolon.
> > It also specifies that any amount of whitespace can occur around the
> > semicolon.
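Below is a minimal sketch of a parser that follows these rules. It is written in plain Perl with no BioPerl dependency. The subroutine name `parse_gff2_group` is hypothetical, and this is not the Bio::DB::GFF parsing code. Semicolons are treated as separators, so a trailing one is optional. Whitespace around them is ignored, and double quotes are stripped from values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a GFF2 group (9th) column
# into tag/value pairs. Semicolons act as separators, so a trailing one is
# optional, and any whitespace around them is ignored. A semicolon inside a
# double-quoted value does not end the pair.
sub parse_gff2_group {
    my ($group) = @_;

    # Tokens are: a double-quoted string, a bare semicolon, or a bare word.
    my @tokens = $group =~ /("(?:[^"\\]|\\.)*"|;|[^\s;"]+)/g;

    my (@pairs, @current);
    for my $tok (@tokens, ';') {    # extra ';' flushes the final pair
        if ($tok eq ';') {
            push @pairs, [@current] if @current;
            @current = ();
            next;
        }
        (my $value = $tok) =~ s/^"(.*)"$/$1/s;    # drop surrounding quotes
        push @current, $value;
    }
    return @pairs;    # each element: [tag, value, value, ...]
}

# The group column from the report, with and without a trailing semicolon.
my $no_trailing = 'Note "QRNA Feature sheep vs. mouse RNA logoddspost=14.021" ; Accession "sheep_#25_61538..61445"';
my $trailing    = "$no_trailing ;";

for my $group ($no_trailing, $trailing) {
    for my $pair (parse_gff2_group($group)) {
        my ($tag, @values) = @$pair;
        print "$tag => @values\n";
    }
}
```

Both group strings yield the same two pairs, Note and Accession, with the quotes removed. As a simplification, backslash escapes inside quoted values are left as they are.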
>
> OK, fair enough. But then gbrowse does not appear to handle this format
> properly. I know that I must be wrong about this, but this is what I am
> seeing.
>
> Here is a gff line as created by Bio::Tools::GFF:
>
> AF354168 blast s-m-100-10 61437 61530 186 - . Note "QRNA Feature sheep vs. mouse RNA logoddspost=14.021" ; Accession "sheep_#25_61538..61445"
>
> Note that there is a lot of wrapping going on when displayed in this
> message.
>
> If I load this file (using fast_load_gff.pl) into a MySQL database and view
> it with gbrowse, there are two problems:
>
> 1) The accession is displayed above the item inside double quotes like
> this: "sheep_#25_61538..61445".
>
> 2) When mousing over the item, neither the accession nor the start and end
> coordinates are displayed. Instead, all I see is the track key:
> QRNA Sheep-Mouse 100-10:
>
> If I manually add a semicolon after the accession at the end of each line
> of the gff file and load that into the MySQL database, gbrowse properly
> displays these two items like this:
>
> sheep_#25_61538..61445 (note no double quote marks any more)
>
> QRNA Sheep-Mouse 100-10: sheep_#25_61538..61445 AF354168: 61437..61530
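Until the parser is fixed, the manual edit described above can be automated. The following is a hypothetical filter. The subroutine name `terminate_group` and the script invocation are illustrative assumptions. It assumes the file is tab-delimited, as the GFF format requires. It appends ` ;` to the ninth column when that column is non-empty and does not already end in a semicolon. Comment lines, blank lines, and lines with fewer than nine columns pass through unchanged.

```perl
use strict;
use warnings;

# Hypothetical workaround (not part of BioPerl): make sure the group (9th)
# column of a tab-delimited GFF2 line ends with a semicolon, mirroring the
# manual edit described above.
sub terminate_group {
    my ($line) = @_;
    return $line if $line =~ /^#/ || $line !~ /\S/;    # comments, blank lines

    chomp(my $body = $line);
    my @cols = split /\t/, $body, 9;                   # at most 9 fields
    return $line unless @cols == 9 && $cols[8] =~ /\S/;

    $cols[8] =~ s/\s+$//;                              # trim trailing space
    $cols[8] .= ' ;' unless $cols[8] =~ /;$/;
    return join("\t", @cols) . "\n";
}

# Filter the GFF files named on the command line to standard output.
if (@ARGV) {
    while (my $line = <>) {
        print terminate_group($line);
    }
}
```

For example, run `perl add_semicolons.pl input.gff > fixed.gff` before loading with fast_load_gff.pl. The script name is arbitrary. Lines that already end in a semicolon are left unchanged, so running the filter twice is harmless.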
>
> > Lincoln
> >
> > On Thursday 20 November 2003 11:19 pm, Wes Barris wrote:
> >>Hi,
> >>
> >>I have written a bioperl program that parses blast files and generates
> >>a gff file. I have everything working except there is one small detail
> >>that I have not been able to figure out. When generating each line
> >>of gff output, the semicolon is left off at the end of the Accession
> >>name. Here is a sample line from a gff file that I generated:
> >>
> >>AF354168 mirseeker pred_miRNA 188152 188251 198 - . Note "mirseeker score 17.58" ; Accession "s-h_19_r_99330000-99363000"
> >>
> >>Notice that:
> >>
> >>1) There are three space characters between the note and the semicolon
> >> that precedes "Accession".
> >>
> >>2) At the end of the line, after the Accession, there are three space
> >> characters and no semicolon. Without that semicolon, the genome
> >> browser doesn't display the "rollover" information properly.
> >>
> >>3) The "Note" field is written before the "Accession" field. I thought
> >> that the Accession should come first.
> >>
> >>Here is the relevant portion of my code:
> >>
> >>  while ( my $hsp = $hit->next_hsp ) {
> >>      my $strand = 1;
> >>      $strand = -1 if ( $hsp->strand('query') == -1 ||
> >>                        $hsp->strand('hit')   == -1 );
> >>      my $feature = Bio::SeqFeature::Generic->new(
> >>          -source_tag  => $source,
> >>          -primary_tag => $feature_type,
> >>          -start       => $hsp->start('hit'),
> >>          -end         => $hsp->end('hit'),
> >>          -score       => $hit->raw_score,
> >>          -strand      => $strand,
> >>          -tag         => {
> >>              Accession => $result->query_name,
> >>              Note      => $result->query_description,
> >>          },
> >>      );
> >>      $feature->seq_id($hit->accession);
> >>      $gffio->write_feature($feature);    # Bio::SeqFeatureI
> >>  }
> >>
> >>Perhaps I am not adding the "Accession" and "Note" fields properly???
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein at cshl.org Cold Spring Harbor, NY
========================================================================