[Bioperl-l] SearchIO to GFF (was: Getting 'features' from SearchIO?)

Dan Bolser dan.bolser at gmail.com
Tue May 12 09:10:59 UTC 2009


Thanks for the info, guys. I think I was naively hoping that the
feature would know how to cast itself as a 'SeqFeature' (GFF).

I think I understand the problem better now, so I'll try to summarise:

There is no standard way to encode an HSP as a feature, not least
because there are two choices of which sequence (the query or the hit)
it should be attached to. BioPerl will try, but the result will not
be "well-structured" SeqFeatures or "well-formed" GFF.


From what I've read, I gather it should be possible to standardize this
mapping (based on something in one of the examples or the 'search2gff'
script), assuming you specify whether you want features put on the
query or on the hit.
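A minimal sketch of what such a standardized mapping could look like, assuming the caller always names the reference ('query' or 'hit') explicitly. It is plain Perl with no BioPerl dependency: the HSP is a hypothetical hash (the `query`/`hit`/`score`/`evalue` field names are invented for illustration and are not BioPerl's object API), the `blast` source and `match` SO type defaults are assumptions, and the other sequence is recorded in the GFF3 `Target` attribute with column-9 reserved characters percent-encoded.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sketch, not BioPerl's API: an HSP is a plain hash with
# 'query' and 'hit' sub-hashes (id, start, end, strand) plus score/evalue.

# Percent-encode characters that GFF3 reserves in column 9.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\x00-\x1f\x7f])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Column 1 may only hold [a-zA-Z0-9.:^*$@!+_?-|] unescaped.
sub gff3_escape_seqid {
    my ($id) = @_;
    $id =~ s/([^A-Za-z0-9.:^*\$\@!+_?|-])/sprintf('%%%02X', ord $1)/ge;
    return $id;
}

# Map a numeric strand (1, -1, 0/undef) to a GFF3 strand symbol.
sub strand_symbol {
    my ($strand) = @_;
    return '.' unless $strand;
    return $strand > 0 ? '+' : '-';
}

# Emit one GFF3 line for $hsp, placed on $reference ('query' or 'hit');
# the other sequence is recorded in the Target attribute.
sub hsp_to_gff3 {
    my ($hsp, $reference, %opt) = @_;
    die "reference must be 'query' or 'hit'\n"
        unless defined $reference && $reference =~ /\A(?:query|hit)\z/;
    my $other = $reference eq 'query' ? 'hit' : 'query';
    my ($ref, $tgt) = @{$hsp}{$reference, $other};

    # Assumed defaults: source 'blast', SO type 'match'.
    my $source = $opt{source} // 'blast';
    my $type   = $opt{type}   // 'match';

    # GFF3 wants start <= end; orientation lives in the strand column.
    my ($start, $end)   = sort { $a <=> $b } @{$ref}{qw(start end)};
    my ($tstart, $tend) = sort { $a <=> $b } @{$tgt}{qw(start end)};

    # Target=id start end [strand]; spaces in the ID must become %20.
    (my $target_id = gff3_escape($tgt->{id})) =~ s/ /%20/g;
    my $target = "$target_id $tstart $tend";
    $target .= ' ' . strand_symbol($tgt->{strand}) if $tgt->{strand};

    my @attrs = ("Target=$target");
    push @attrs, 'evalue=' . gff3_escape($hsp->{evalue})
        if defined $hsp->{evalue};

    return join "\t",
        gff3_escape_seqid($ref->{id}), $source, $type, $start, $end,
        $hsp->{score} // '.', strand_symbol($ref->{strand}), '.',
        join(';', @attrs);
}

# Example HSP (made-up values), written once per choice of reference.
my %hsp = (
    query  => { id => 'contig1',     start => 10,  end => 110, strand => 1 },
    hit    => { id => 'sp|P12345|X', start => 100, end => 200, strand => -1 },
    score  => 180,
    evalue => '1e-50',
);

my @lines = map { hsp_to_gff3(\%hsp, $_) } qw(query hit);
print "$_\n" for @lines;
```

The same HSP yields two different but equally valid GFF3 lines depending on the reference, which is exactly the choice that cannot be made automatically. With real SearchIO objects the fields would come from the HSP's query and hit features; a fuller conversion would presumably also group each hit's HSPs under a parent feature, which this sketch leaves out.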

Last year I tried out bp_search2gff.pl, and my own code, to write a
GFF file for loading and viewing in GBrowse. I gave up at the time, as
nothing seemed to work. I was hoping that by doing this at a lower
level (i.e. never writing any GFF myself) it would stand a better
chance of working.

I was also thinking that GBrowse, if given a SeqFeature::Store, could
autoconfigure its interface to some degree. I guess it's back to the
docs ;-)



I'll keep trying and see if I can get anywhere.

Thanks again,
Dan.



References for the above:

2009/5/11 Jason Stajich <jason at bioperl.org>:

> otherwise you need to be converting the HSPs into seqfeatures with the right associated information (i.e. the tag/value pairs that are in the 9th column) in order to have well structured data in the database.

> You can get the individual features from the feature pair with $hsp->query or $hsp->hit, which can also be passed to a GFF writer (or call $hsp->hit->gff_string). Note that since the data storage is not structured in a GFF3-like way, this won't immediately produce well-formed GFF3 for the 9th column.


2009/5/11 Chris Fields <cjfields at illinois.edu>:

> The main problem is the mapping is subjective based on what your reference sequence is within the BLAST run (e.g. whether it is the query or the hit), and is something that can't be automatically discerned. I ended up rolling my own with SeqFeature::Store (just mapped the relevant data to Bio::DB::SeqFeatures), but I have long wanted to fix up the relevant scripts to integrate my changes; I just haven't had the time.



