[Bioperl-l] GFF file output missing semicolon
Wes Barris
wes.barris at csiro.au
Sun Nov 23 19:26:27 EST 2003
Lincoln Stein wrote:
> Hi,
>
> The GFF2 spec specifies that the semicolon separates tag/value pairs. It does
> not say that the last tag/value should be terminated by a semicolon. It also
> specifies that any amount of whitespace can occur around the semicolon.
Ok, fair enough. But then gbrowse does not appear to handle this
format properly. I may well be wrong about this, but this is what
I am seeing.
Here is a gff line as created by Bio::Tools::GFF:
AF354168 blast s-m-100-10 61437 61530 186 - .
Note "QRNA Feature sheep vs. mouse RNA logoddspost=14.021" ; Accession
"sheep_#25_61538..61445"
Note that there is a lot of wrapping going on when displayed in this message.
If I load this file (using fast_load_gff.pl) into a mysql database and view
with gbrowse, there are two problems:
1) The accession is displayed above the item inside double quotes like this:
"sheep_#25_61538..61445".
2) When mousing over the item, neither the accession nor the start and end
are displayed. Instead all I see is the track key:
QRNA Sheep-Mouse 100-10:
If I manually add a semicolon after the accession at the end of each line
of the gff file and load that into the MySQL database, gbrowse properly
displays these two items like this:
sheep_#25_61538..61445 (note no double quote marks any more)
QRNA Sheep-Mouse 100-10: sheep_#25_61538..61445 AF354168: 61437..61530
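The manual edit described above could be scripted rather than done by hand. What follows is only a minimal sketch, assuming the file is tab-delimited GFF2 with the tag/value pairs in the ninth column; the function name add_trailing_semicolon is hypothetical and is not part of Bio::Tools::GFF or gbrowse.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): make sure the attribute
# column of a GFF2 feature line ends with a semicolon.
# Assumes tab-delimited GFF2 with attributes in column 9.
sub add_trailing_semicolon {
    my ($line) = @_;
    chomp $line;

    # Leave comment lines and blank lines untouched.
    return $line if $line =~ /^\s*(#|$)/;

    # Split into at most 9 fields so tabs inside attributes are preserved.
    my @fields = split /\t/, $line, 9;
    return $line if @fields < 9;          # no attribute column present

    $fields[8] =~ s/\s+$//;               # drop trailing whitespace
    # Append the semicolon only if the column is non-empty and lacks one.
    $fields[8] .= ' ;' unless $fields[8] eq '' || $fields[8] =~ /;$/;

    return join "\t", @fields;
}

# Filter any files named on the command line, writing to STDOUT.
if (@ARGV) {
    while (my $line = <>) {
        print add_trailing_semicolon($line), "\n";
    }
}
```

Saved under a name such as add_semicolon.pl (hypothetical), it would be run as `perl add_semicolon.pl in.gff > out.gff` before loading out.gff with fast_load_gff.pl. Lines that already end in a semicolon are left alone, so running it twice is harmless.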
>
> Lincoln
>
> On Thursday 20 November 2003 11:19 pm, Wes Barris wrote:
>
>>Hi,
>>
>>I have written a bioperl program that parses blast files and generates
>>a gff file. I have everything working except there is one small detail
>>that I have not been able to figure out. When generating each line
>>of gff output, the semicolon is left off at the end of the Accession
>>name. Here is a sample line from a gff file that I generated:
>>
>>AF354168 mirseeker pred_miRNA 188152 188251 198 -
>> . Note "mirseeker score 17.58" ; Accession
>>"s-h_19_r_99330000-99363000"
>>
>>Notice that:
>>
>>1) There are three space characters after the note and the semicolon
>> that occurs before "Accession".
>>
>>2) At the end of the line, after the Accession, there are three space
>> characters and no semicolon. Without that semicolon, the genome
>> browser doesn't display the "rollover" information properly.
>>
>>3) The "Note" field is written before the "Accession" field. I thought
>> that the Accession should come first.
>>
>>Here is the relevant portion of my code:
>>
>>    while( my $hsp = $hit->next_hsp ) {
>>       my $strand = 1;
>>       $strand = -1 if ($hsp->strand('query') == -1 ||
>>                        $hsp->strand('hit') == -1);
>>       my $feature = new Bio::SeqFeature::Generic(
>>          -source_tag=>$source,
>>          -primary_tag=>$feature_type,
>>          -start=>$hsp->start('hit'),
>>          -end=>$hsp->end('hit'),
>>          -score=>$hit->raw_score,
>>          -strand=>$strand,
>>          -tag=>{
>>             Accession=>$result->query_name,
>>             Note=>$result->query_description,
>>             }
>>          );
>>       $feature->seq_id($hit->accession);
>>       $gffio->write_feature($feature);  # Bio::SeqFeatureI
>>    }
>>
>>Perhaps I am not adding the "Accession" and "Note" fields properly???
>
>
--
Wes Barris
E-Mail: Wes.Barris at csiro.au