[Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output

Jason Stajich jason.stajich at duke.edu
Fri Apr 15 16:33:17 EDT 2005


Allen -

I gave up to really trying to turn VULGAR into proper HSP objects 
representing exons and just parsing the GFF from Guy's showtargetgff 
output.  His GFF has all of what you want except that reverse strand 
features need to have their coordinates flipped (start=seqlength - end; 
end=seqlength - start + 1).

Seems like we should be able to do it alone by only interpreting the 
proper intron states and not all the gaps as we do.

I ended up having to grab the query and hit sequence lengths as well to 
correct for reverse strand sequences so the coordinates can be on the 
forward strand.  I think we could fix it to do this but maybe would 
make the output gene-centric instead of SearchIO form.  Not sure.

In short I'm not sure the exonerate parser works too well for what you 
want right now - for me, the exonerate generation of GFF which is then 
munged into GFF2 that exonerate will pump out as my pipeline for 
loading into DB::GFF and Gbrowse.

I can send that script for munging over if it helps.

-jason

--
Jason Stajich
jason.stajich at duke.edu
http://www.duke.edu/~jes12/

On Apr 15, 2005, at 3:48 PM, Gathman, Allen wrote:

> Hi all --
>
>
>
> I'm using exonerate to align ESTs with a set of genomic contigs, and 
> I'm
> trying to figure out the best way to pull information out of the 
> output.  I
> wrote a little test script to see what Bio::SearchIO would get me:
>
> use Bio::SearchIO;
> open (OUT, ">$ARGV[1]");
>
> my $searchobj=new Bio::SearchIO ( -format => 'exonerate',
>                                   -file => $ARGV[0]);
>
> while (my $result=$searchobj->next_result() ) {
>      print OUT "query: " . $result->query_name(). "\n";
>      my @params=$result->available_parameters;
>      print OUT "params:@params\n";
>      my @stats=$result->available_statistics;
>      print OUT "stats:@stats\n";
>      while (my $hit=$result->next_hit() ) {
>           print OUT "hitstart: " . $hit->start('hit') . "\n";
>           while (my $hsp=$hit->next_hsp() ) {
>                print OUT "hspsstart: " . $hsp->start('hit') . "\n";
>           } # end hsp
>      } # end hit
> } # end result
> close OUT;
>
>
> There aren't any parameters or stats returned.  $hit->start works 
> fine, and
> $hsp->start works, but the hsps are the individual matches; if there's 
> a
> one-nucleotide gap, that separates two hsps, just as a real intron 
> would.
>
>
>
> It appears that there should be some methods or arguments applicable 
> here
> beyond those for the generic Hit and HSP objects, but I don't know 
> what they
> are.  I've gone through the documentation for 
> Bio::SearchIO::exonerate, but
> I don't see what I'm looking for.  For instance, the VULGAR line in the
> exonerate output distinguishes between introns and gaps - is there 
> some way
> to pull them out separately in Bio::SearchIO?
>
>
>
> In short, I want to be able to ignore small gaps and define start and 
> end
> points of exons, marking 5' and 3' splice junctions.  I'd appreciate 
> any
> help on how to get at these.
>
>
>
> Thanks --
>
>
>
> Allen
>
>
>
> Allen Gathman
>
> http://cstl-csm.semo.edu/gathman
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>



More information about the Bioperl-l mailing list