[Bioperl-l] gff_string on an HSPI object is not Bio::DB::GFF friendly

Jason Stajich jason at cgt.duhs.duke.edu
Fri Jan 9 10:38:16 EST 2004


Remember that an HSP object is a combination of two SeqFeature objects
(both of which are Bio::SeqFeature::Similarity objects).

So when you call $hsp->gff_string you are calling $hsp->query->gff_string.

If you want to see the gff for the target you do $hsp->hit->gff_string.

See my search2gff.PLS script in scripts/utilities/search2gff.PLS for an
example of using the object and producing Bio::DB::GFF-appropriate GFF
from a SearchIO-parseable report.
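
As a rough illustration of the "flipped" line (this is a sketch, not the
code from search2gff.PLS): the plain-Perl helper below puts the hit in the
reference column and the query in the Target group. The function name and
hash keys are hypothetical, and the values are copied from the example in
the quoted message. In a real script they would presumably come from
accessors such as $hsp->hit->start and $hsp->query->start. The exact
Target syntax your Bio::DB::GFF loader wants (quoting, class prefix) is
an assumption to check.

```perl
use strict;
use warnings;

# Build one tab-separated GFF2-style line with the hit as the reference
# sequence and the query as the Target. Hash keys are hypothetical names.
sub hit_referenced_gff {
    my (%f) = @_;

    # Group column: the query name and its coordinates on the query.
    my $group = join ' ', 'Target',
        $f{query_id}, $f{query_start}, $f{query_end};

    # Columns: seqid, source, type, start, end, score, strand, frame, group.
    return join "\t",
        $f{hit_id}, $f{source}, 'similarity',
        $f{hit_start}, $f{hit_end},
        $f{score}, $f{strand}, $f{frame}, $group;
}

# Values taken from the example HSP in the quoted message.
my $line = hit_referenced_gff(
    hit_id      => 'gi|12329259',
    hit_start   => 125209,
    hit_end     => 125231,
    query_id    => 'a101',
    query_start => 138,
    query_end   => 160,
    source      => 'BLASTN',
    score       => 23,
    strand      => '+',
    frame       => 0,
);
print "$line\n";
```

The first column of the printed line is gi|12329259 and the last column
is "Target a101 138 160", i.e. the mirror image of what
$hsp->gff_string gave in the example.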

-jason

On Fri, 9 Jan 2004, Mark Wilkinson wrote:

> Hi all,
>
> I'm wondering if the gff_string call on an HSPI object is perhaps
> backwards (or if it is Bio::DB::GFF that is backwards).  It certainly
> appears that I get "mirror image" data from that call compared to what I
> need for Gbrowse.
>
> e.g. I blast an EST (a101) against genbank.  I then take the blast
> report and parse it until I have an HSP object in my hand. Now...
>
> If I do ->gff_string on that HSP object I get this:
>
> DB<14> p $hsp->gff_string
> a101 BLASTN similarity 138 160 23 + 0 Target gi|12329259 125209 125231
>
> But by Gbrowse GFF standards what I expect to see (I think) is this:
>
> gi|12329259 BLASTN similarity 138 160 23 + 0 Target a101  1  200
>
>
> I know that Gbrowse GFF is a bit weird, but before I go coding something
> new to deal with this problem I want to make sure that my interpretation
> of the problem is correct, and that nobody has actually coded a solution
> already (other than my GbrowseGFF ResultWriterI, which is what I am
> working on updating right now).
>
> One possibility is to modulate the output by passing an argument like
> gff_string('query') or gff_string('hit') to indicate which of the
> sequences you consider to be the "reference" sequence.  I tried calling
> gff_string on $HSP->query and $HSP->hit, but they have lost all
> information about each other, so that doesn't help.
>
> If anyone has a preference on how this should behave please say so.  It
> may be that we don't want BioPerl to exhibit Gbrowse GFF behaviour under
> any circumstances, because it really is quite peculiar in the case of
> alignment features.  My opinion is that the current bioperl output is
> more comprehensible than what Gbrowse is expecting ("Target" surely
> means what you hit with your query, rather than your query itself...??),
> but since Gbrowse & Bio::DB::GFF are so tightly integrated with BioPerl
> it would probably be better to have some BioPerl way to generate the
> output format expected by Bio::DB::GFF.
>
> Also, what is the "correct" way to represent alignment features in
> GFF3?  Does ->gff_string output HSP's correctly in GFF3 format?  If not,
> then we should probably revisit this issue in its entirety.
> Scott/Lincoln, is there a compelling reason for Gbrowse to require its
> input in the format that it does, or could it be "flipped"?
>
> Mark
>
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu

