[Bioperl-l] gff_string on an HSPI object is not Bio::DB::GFF friendly

Scott Cain cain at cshl.org
Fri Jan 9 10:40:20 EST 2004


On Fri, 2004-01-09 at 10:38, Jason Stajich wrote:
> Remember an HSP object is a combination of two SeqFeature objects (which
> are Bio::SeqFeature::Similarity objects).
> 
> So when you call $hsp->gff_string you are calling $hsp->query->gff_string.
> 
> If you want to see the gff for the target you do $hsp->hit->gff_string.
> 
And that fixes the counterintuitive thing I just mentioned--should have
waited two minutes to hit send :-)
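
For GFF lines that were already written from the query's point of view,
the same flip that $hsp->hit->gff_string gives can be roughly sketched in
plain Perl. This is only an illustration, not BioPerl code:
flip_gff_line is a hypothetical helper, it assumes the twelve
whitespace-separated fields shown in the example line quoted below, and
it copies the strand through unchanged, which is only safe when both
sequences align on the same strand.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): swap the reference and the
# Target in a query-centric line laid out like the quoted example:
#   ref source method start end score strand phase Target name tstart tend
sub flip_gff_line {
    my ($line) = @_;
    my @f = split ' ', $line;

    # Reject anything that is not exactly 12 fields with "Target" in field 9.
    return unless @f == 12 && $f[8] eq 'Target';

    my ($ref, $source, $method, $start, $end, $score, $strand, $phase,
        undef, $target, $tstart, $tend) = @f;

    # Assumption: strand is passed through as-is; a minus-strand hit would
    # need extra handling that this sketch does not attempt.
    return join ' ', $target, $source, $method, $tstart, $tend,
                     $score, $strand, $phase, 'Target', $ref, $start, $end;
}

print flip_gff_line(
    'a101 BLASTN similarity 138 160 23 + 0 Target gi|12329259 125209 125231'
), "\n";
```

The fields are joined with single spaces only to mirror how the lines
appear in this thread; real GFF is tab-delimited, so a loader would want
"\t" as the separator.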

> See my search2gff.PLS script in scripts/utilities/search2gff.PLS for
> example usage of the object and production of Bio::DB::GFF appropriate GFF
> from a SearchIO parseable report.
> 
> -jason
> 
> On Fri, 9 Jan 2004, Mark Wilkinson wrote:
> 
> > Hi all,
> >
> > I'm wondering if the gff_string call on an HSPI object is perhaps
> > backwards (or if it is Bio::DB::GFF that is backwards).  It certainly
> > appears that I get "mirror image" data from that call compared to what I
> > need for Gbrowse.
> >
> > e.g. I blast an EST (a101) against genbank.  I then take the blast
> > report and parse it until I have an HSP object in my hand. Now...
> >
> > If I do ->gff_string on that HSP object I get this:
> >
> > DB<14> p $hsp->gff_string
> > a101 BLASTN similarity 138 160 23 + 0 Target gi|12329259 125209 125231
> >
> > But by Gbrowse GFF standards what I expect to see (I think) is this:
> >
> > gi|12329259 BLASTN similarity 138 160 23 + 0 Target a101  1  200
> >
> >
> > I know that Gbrowse GFF is a bit weird, but before I go coding something
> > new to deal with this problem I want to make sure that my interpretation
> > of the problem is correct, and that nobody has actually coded a solution
> > already (other than my GbrowseGFF ResultWriterI, which is what I am
> > working on updating right now).
> >
> > One possibility is to modulate the output by passing an argument like
> > gff_string('query') or gff_string('hit') to indicate which of the
> > sequences you consider to be the "reference" sequence.  I tried calling
> > gff_string on $HSP->query and $HSP->hit, but they have lost all
> > information about each other, so that doesn't help.
> >
> > If anyone has a preference on how this should behave please say so.  It
> > may be that we don't want BioPerl to exhibit Gbrowse GFF behaviour under
> > any circumstances, because it really is quite peculiar in the case of
> > alignment features.  My opinion is that the current bioperl output is
> > more comprehensible than what Gbrowse is expecting ("Target" surely
> > means what you hit with your query, rather than your query itself...??),
> > but since Gbrowse & Bio::DB::GFF are so tightly integrated with BioPerl
> > it would probably be better to have some BioPerl way to generate the
> > output format expected by Bio::DB::GFF.
> >
> > Also, what is the "correct" way to represent alignment features in
> > GFF3?  Does ->gff_string output HSPs correctly in GFF3 format?  If not,
> > then we should probably revisit this issue in its entirety.
> > Scott/Lincoln, is there a compelling reason for Gbrowse to require its
> > input in the format that it does, or could it be "flipped"?
> >
> > Mark
> >
> >
> 
> --
> Jason Stajich
> Duke University
> jason at cgt.mc.duke.edu
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.org
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


