[Bioperl-l] Re: gff_string on an HSPI object is not Bio::DB::GFF friendly

Scott Cain cain at cshl.org
Fri Jan 9 10:37:50 EST 2004


Mark,

My first suggestion (without doing any real work) is that you look at
scripts/utilities/bp_search2gff.pl in which Jason has probably resolved
these issues.  It does seem to me that source and target assignment are
counterintuitive sometimes, but what can you do...? (Well, other than
fix it, I suppose, but that would break other things, and so on.)
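
To make the "mirror image" concrete, here is a minimal sketch of the swap
as a post-processing step on the gff_string output itself. It is plain
Python string handling, not BioPerl, and it is not a description of what
bp_search2gff.pl does. The function name flip_gff_similarity is
hypothetical. It assumes whitespace-separated fields laid out exactly as
in the gff_string line quoted below, and it assumes the coordinate pairs
should travel with their sequence names, which may not match what
Bio::DB::GFF actually wants (the "expected" line in the quoted message
uses different coordinates). Strand is left untouched, so minus-strand
hits are not handled.

```python
def flip_gff_similarity(line):
    """Swap the reference sequence and the Target in one similarity line.

    Assumed layout (hypothetical, based on the example in this thread):
    seqid source type start end score strand phase Target id start end
    """
    fields = line.split()
    if len(fields) != 12 or fields[8] != "Target":
        raise ValueError(
            "expected 8 GFF columns followed by 'Target <id> <start> <end>'"
        )
    seqid, source, ftype, start, end, score, strand, phase = fields[:8]
    target, t_start, t_end = fields[9:]
    # The old Target becomes the reference; the old reference becomes the Target.
    return " ".join(
        [target, source, ftype, t_start, t_end, score, strand, phase,
         "Target", seqid, start, end]
    )


line = "a101 BLASTN similarity 138 160 23 + 0 Target gi|12329259 125209 125231"
print(flip_gff_similarity(line))
# prints: gi|12329259 BLASTN similarity 125209 125231 23 + 0 Target a101 138 160
```

Applying the function twice gives back the original line, so the swap
loses nothing for input in this layout.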

Scott


On Fri, 2004-01-09 at 10:27, Mark Wilkinson wrote:
> Hi all, 
> 
> I'm wondering if the gff_string call on an HSPI object is perhaps
> backwards (or if it is Bio::DB::GFF that is backwards).  It certainly
> appears that I get "mirror image" data from that call compared to what I
> need for Gbrowse.
> 
> e.g. I blast an EST (a101) against genbank.  I then take the blast
> report and parse it until I have an HSP object in my hand. Now...
> 
> If I do ->gff_string on that HSP object I get this:
> 
> DB<14> p $hsp->gff_string
> a101 BLASTN similarity 138 160 23 + 0 Target gi|12329259 125209 125231
> 
> But by Gbrowse GFF standards what I expect to see (I think) is this:
> 
> gi|12329259 BLASTN similarity 138 160 23 + 0 Target a101  1  200
> 
> 
> I know that Gbrowse GFF is a bit weird, but before I go coding something
> new to deal with this problem I want to make sure that my interpretation
> of the problem is correct, and that nobody has actually coded a solution
> already (other than my GbrowseGFF ResultWriterI, which is what I am
> working on updating right now).  
> 
> One possibility is to modulate the output by passing an argument like
> gff_string('query') or gff_string('hit') to indicate which of the
> sequences you consider to be the "reference" sequence.  I tried calling
> gff_string on $HSP->query and $HSP->hit, but they have lost all
> information about each other, so that doesn't help.
> 
> If anyone has a preference on how this should behave please say so.  It
> may be that we don't want BioPerl to exhibit Gbrowse GFF behaviour under
> any circumstances, because it really is quite peculiar in the case of
> alignment features.  My opinion is that the current bioperl output is
> more comprehensible than what Gbrowse is expecting ("Target" surely
> means what you hit with your query, rather than your query itself...??),
> but since Gbrowse & Bio::DB::GFF are so tightly integrated with BioPerl
> it would probably be better to have some BioPerl way to generate the
> output format expected by Bio::DB::GFF.
> 
> Also, what is the "correct" way to represent alignment features in
> GFF3?  Does ->gff_string output HSPs correctly in GFF3 format?  If not,
> then we should probably revisit this issue in its entirety. 
> Scott/Lincoln, is there a compelling reason for Gbrowse to require its
> input in the format that it does, or could it be "flipped"?
> 
> Mark
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


