[Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output

Guy Slater guy at ebi.ac.uk
Fri Sep 2 12:51:39 EDT 2005


On Fri, 2 Sep 2005, Cook, Malcolm wrote:

> Hmmmm - I'd better get some clarification from Guy too.  
>  
> Guy, if you don't mind reading the thread below and chiming in on our
> discussion of interpreting the output of your excellent exonerate
> program:
>  
> The sections of the manpage (
> <http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html>
> http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html) that appear
> relevant are these 2 excerpts:
>  
>  1) When an alignment is reported on the reverse complement of a
> sequence, the coordinates are simply given on the reverse complement
> copy of the sequence. Hence positions on the sequences are never
> negative. Generally, the forward strand is indicated by '+', the reverse
> strand by '-', and an unknown or not-applicable strand (as in the case
> of a protein sequence) is indicated by '.' "
>  
> 2)  --forwardcoordinates <boolean> By default, all coordinates are
> reported on the forward strand. Setting this option to false reverts to
> the old behaviour (pre-0.8.3) whereby alignments on the reverse
> complement of a sequence are reported using coordinates on the reverse
> complement. 
>  
> We see GFF DUMP coordinates still reported on the reverse stand
> regardless of the setting of --forwardcoordinates.  So these two
> excerpts from you manpage seem contradictory to me.     Unless I
> understand `--forwardcoordinates FALSE` to only effect the coordinates
> reported in the alignment section, not in the GFF DUMP section, which is
> what it appears to do in practice.
>  
> Guy, can you confirm that the --forwardcoordinates option has no effect
> on GFF output?
>  

Hi,

Yes, it has no effect, and this is a bug
(sorry - it was due to my misinterpretation of the GFF2 spec)
- its on the list of things to be fixed for exonerate 1.1 (soon)

> Further, can you tell us if you plan to comport more closely to the GFF
> spec, in particular in this case by reporting forwardcoordinates in the
> GFF DUMP section too?   I see 
> I see in your TODO list "    o Should GFF show all coordinates on the
> +ve strand? (jason_p2g eg)".  Hear hear!  I second the motion.
>  
> And TODO item " GFF3 support ? http://song.sf.net/" gets my vote too....
> though this is more of a sticky wicket....
>  

Yup, GFF3 support is on the list,
but probably it will not be done in time for exonerate 1.1
Of course, I'd welcome a patch ...    ;)

(I'm mainly working on getting the cdna2genome
 and genome2genome models working properly for 1.1)

Cheers,

Guy.

> Cheers and Thanks!
>  
> Malcolm Cook
>  
>  
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Friday, September 02, 2005 9:46 AM
> To: Cook, Malcolm
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
> output
> 
> 
> I've already talked to Guy about some of this and I assume fixes will be
> part of the next release, but it can't hurt to have more people
> requesting.  The main problem right now is reverse strand hits in GFF
> output are still screwed up even if you provide the --forwardcoordinates
> option. 
> 
> If someone wanted to write/donate a VULGAR to GFF subroutine (okay
> VULGAR to a list of Bio::Search::HSP::GenericHSP).  We can also
> reconstruct everything needed from that, I gave a stab at it once, but
> there was something missing (or maybe it was pre --forwardcoordinates
> option).   
> 
> 
> -jason 
> 
> On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote:
> 
> 
> Jason,
>  
> Thanks for the scripts and clues (esp re: using the --ryo option to
> inject the needed length into the exonerate output to compensate).
>  
> I'm considering asking exonerate author to comport with GFF spec.  Do
> you think this is a road to take?
>  
> Cheers,
>  
> Malcolm
>  
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Wednesday, August 31, 2005 12:35 PM
> To: Cook, Malcolm
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
> output
> 
> 
> http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_
> gff3.pl
> 
> You may still want to massage it some, but I use the script in this
> basic form, maybe with a few tweaks:
> 
> Note that it requires you to run exonerate with specific --ryo options
> so that it includes the length of the query and hit sequences in the
> report output. should be covered in the perldoc in the script.
> 
> Without the ryo options enabled,  you'll need to modify the script more
> to have access to the original sequence db, use Bio::DB::Fasta,  and put
> in some $dbh->length($seqid) calls instead.
> 
> I don't think the part which writes HSP/match lines is actually correct
> - it is trying to roll gapped HSPs from the similarity features. 
> 
> I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse
> instance and/or grepping out the lines I really think I need.  
> You may want to s/exon/CDS/ for the protein2genome output as well.
> 
> -jason
> 
> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
> 
> 
> Jason, 
> 
> This message is in regards to an old thread  in which you offered to
> shared a 'script for munging over' exonerate output for lading in
> DB::GFF (c.f.
> <http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html>
> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
> 
> Would you be willing to still share that script, if you've got it
> around? 
> 
> Thanks, and regards, 
> 
> Malcolm Cook -  <mailto:mec at stowers-institute.org>
> mec at stowers-institute.org - 816-926-4449
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, MO  USA
> 
> 
> 
> 
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> 
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> 

-- 
%!PS % <------ Guy St.C. Slater ------> http://www.ebi.ac.uk/~guy/  <------
210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 12/d{exch moveto}
a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 f}a/h{setlinewidth newpath dup
g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 0 108 arc d e
18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage



More information about the Bioperl-l mailing list