[Bioperl-l] Getting 'features' from SearchIO?

Chris Fields cjfields at illinois.edu
Mon May 11 15:39:54 UTC 2009


On May 11, 2009, at 9:34 AM, Dan Bolser wrote:

> Hi,
>
> I am parsing a blasttable and extracting Bio::Search::HSP::GenericHSP
> objects as a result. I read somewhere that HSP objects inherit Feature
> objects... How can I get a 'standard' representation of the HSP as a
> feature? Basically I'd like to simply load the blast results into a
> feature database...

They are Bio::SeqFeature::SimilarityPair objects (everything that  
implements Bio::Search::HSP::HSPI is).

> When I call feature methods on the HSP objects I just get blank or
> undef results... I think this is because I'm trying to get at the
> sequence's existing (nonexistent) features, rather than get the HSP
> object as a feature... If that makes sense... How can I confirm that I
> have a feature object containing the details of the HSP?

These are decorated feature pairs (they map to one another), so you  
would need to do something like $hsp->hit to get at the actual  
SeqFeature data for the hit, and similarly $hsp->query for the query  
SF.  They technically have the SeqFeatureI methods but I believe they  
delegate to one specific feature (the query) unless you explicitly  
specify which feature to grab info from ('query', 'hit/subject').
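
As a rough sketch of what I mean (not run against your data; the  
one-row table, its IDs and the @rows array are made up purely for  
illustration), something like this pulls coordinates from each side of  
the pair explicitly instead of relying on the default delegation:

```perl
use strict;
use warnings;
use Bio::SearchIO;

# Hypothetical single-row tabular BLAST report, held in memory for illustration.
my $table = join("\t", qw(q1 chr1 98.00 100 2 0 1 100 501 600 1e-50 180)) . "\n";
open my $fh, '<', \$table or die "cannot open in-memory table: $!";

my $in = Bio::SearchIO->new(-format => 'blasttable', -fh => $fh);

my @rows;    # one [side, name, start, end] entry per side of each HSP
while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # Ask each sub-feature directly rather than the HSP itself.
            push @rows, ['query', $result->query_name, $hsp->query->start, $hsp->query->end];
            push @rows, ['hit',   $hit->name,          $hsp->hit->start,   $hsp->hit->end];
        }
    }
}

print join("\t", @$_), "\n" for @rows;
```

$hsp->start('hit') and $hsp->end('query') should give the same  
numbers if you prefer to stay on the HSP object.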

I have added some tests for t/SearchIO//blasttable for this.

> I thought of trying to just pass the HSP object to the
> Bio::DB::SeqFeature::Store, but I need to get that up and running
> first (I'm looking into it). In the meantime I thought I'd ask if
> this sounds like the right thing to do.

Worth a try to see what happens, but I'm not sure it would work as you  
expect, seeing as the methods by default delegate to the query (and I  
don't know if support for feature pairs is built in to  
Bio::DB::SeqFeature::Store).  Also, last I recall, SF::Store stores  
everything based on a specified SF class, not the interface, so mixing  
SF classes in the same database (such as Bio::DB::SeqFeature,  
Bio::SeqFeature::Generic, and HSPs) may not be the wisest thing.  I  
haven't used it in a little while, though, so that may have changed.

Just to note, this problem has been 'solved' to some degree in the  
past.  I think there are a few blast2gff scripts floating around, and  
there is a Bio::SearchIO::Writer::GbrowseGFF module, though it isn't  
maintained.  The main problem is that the mapping depends on which  
sequence in the BLAST run you treat as the reference (the query or the  
hit), and that can't be discerned automatically.  I ended up rolling my  
own with SeqFeature::Store (just  
mapped the relevant data to Bio::DB::SeqFeatures), but I have long  
wanted to fix up the relevant scripts to integrate my changes in, just  
haven't had the time (though that may change soon :)
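
To illustrate the reference problem, here is a minimal sketch in plain  
Perl (no BioPerl; this is not what the existing blast2gff scripts or my  
own code do).  The function name blast_row_to_gff3, the 'blast' source  
and the 'match' type are assumptions for the example.  The caller has  
to say which side is the reference, and the other side ends up in the  
GFF3 Target attribute:

```perl
use strict;
use warnings;

# Convert one 12-column tabular BLAST row into a GFF3 line anchored on the
# chosen reference side. $reference must be 'query' or 'hit'; it cannot be
# inferred from the row itself.
sub blast_row_to_gff3 {
    my ($row, $reference) = @_;
    chomp $row;
    my ($q, $h, $pid, $len, $mm, $gaps, $qs, $qe, $hs, $he, $evalue, $bits)
        = split /\t/, $row;

    my %side = (
        query => { id => $q, start => $qs, end => $qe },
        hit   => { id => $h, start => $hs, end => $he },
    );
    my $ref = $side{$reference}
        or die "reference must be 'query' or 'hit'\n";
    my $other = $side{ $reference eq 'query' ? 'hit' : 'query' };

    # GFF3 wants start <= end; orientation goes in the strand column.
    my ($start, $end)   = sort { $a <=> $b } ($ref->{start},   $ref->{end});
    my ($tstart, $tend) = sort { $a <=> $b } ($other->{start}, $other->{end});

    # '-' when the two sides run in opposite directions, otherwise '+'.
    my $strand = (($qe <=> $qs) * ($he <=> $hs)) < 0 ? '-' : '+';

    return join "\t", $ref->{id}, 'blast', 'match', $start, $end,
        $evalue, $strand, '.', "Target=$other->{id} $tstart $tend";
}

# Hypothetical row: the same alignment anchored two different ways.
my $row = join "\t", qw(q1 chr1 98.00 100 2 0 1 100 600 501 1e-50 180);
print blast_row_to_gff3($row, 'hit'),   "\n";
print blast_row_to_gff3($row, 'query'), "\n";
```

The same HSP gives two different features depending on the second  
argument, which is why the scripts can't pick one for you.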

> More generally I want to have features attached to sequences that are
> themselves annotations of larger sequences (but with unknown
> position).

Did you mean 'features of larger sequences'?

At the very least, you can define a region a feature falls within; if  
it falls within a region that has gaps on both sides:

              gap1           gap2
----------xxxxxxxx--------xxxxxxx------------
                    |---|

you can still assign coordinates to the feature for that release based  
on the estimated length of the gaps.  Therefore it may change in a  
future release if the gaps are filled in.
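
A small sketch of that bookkeeping in plain Perl, assuming the  
scaffold is described as an ordered list of contigs and gaps with  
(estimated) lengths.  The layout structure, the contig names and the  
scaffold_coords function are all hypothetical:

```perl
use strict;
use warnings;

# Translate contig-local coordinates into scaffold coordinates by summing
# the lengths of everything (contigs and estimated gaps) to the left.
sub scaffold_coords {
    my ($layout, $contig, $start, $end) = @_;
    my $offset = 0;
    for my $part (@$layout) {
        if ($part->{type} eq 'contig' && $part->{name} eq $contig) {
            return ($offset + $start, $offset + $end);
        }
        $offset += $part->{length};
    }
    die "contig '$contig' not found in layout\n";
}

# Hypothetical release: the gap lengths are estimates and may change.
my @layout = (
    { type => 'contig', name => 'contigA', length => 1000 },
    { type => 'gap',    name => 'gap1',    length => 200  },
    { type => 'contig', name => 'contigB', length => 500  },
    { type => 'gap',    name => 'gap2',    length => 300  },
    { type => 'contig', name => 'contigC', length => 2000 },
);

my ($s, $e) = scaffold_coords(\@layout, 'contigB', 101, 150);
print "feature at $s..$e on the scaffold for this release\n";
```

If gap1 is later re-estimated or filled in, only its length entry  
changes and the feature coordinates are recomputed from the same  
contig-local positions.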

Otherwise I would assume it's simpler to designate it as a feature in  
a singleton sequence (on its own) that hasn't been mapped.

> Is Bio::DB::SeqFeature::Store a way to go? I need to manage
> various different bits of information coming from a sequencing
> project, and I need a solution to the whole 'assembly life cycle
> management' problem.

It's a good start, but it's not the only solution (by far).  If you  
want to integrate in more information you could look into Chado  
(Apollo has a plugin for Chado).

> Thanks for any help,
> Dan.

np.

chris


