[Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output
Guy Slater
guy at ebi.ac.uk
Fri Sep 2 12:51:39 EDT 2005
On Fri, 2 Sep 2005, Cook, Malcolm wrote:
> Hmmmm - I'd better get some clarification from Guy too.
>
> Guy, if you don't mind reading the thread below and chiming in on our
> discussion of interpreting the output of your excellent exonerate
> program:
>
> The sections of the manpage (
> <http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html>
> http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html) that appear
> relevant are these 2 excerpts:
>
> 1) When an alignment is reported on the reverse complement of a
> sequence, the coordinates are simply given on the reverse complement
> copy of the sequence. Hence positions on the sequences are never
> negative. Generally, the forward strand is indicated by '+', the reverse
> strand by '-', and an unknown or not-applicable strand (as in the case
> of a protein sequence) is indicated by '.' "
>
> 2) --forwardcoordinates <boolean> By default, all coordinates are
> reported on the forward strand. Setting this option to false reverts to
> the old behaviour (pre-0.8.3) whereby alignments on the reverse
> complement of a sequence are reported using coordinates on the reverse
> complement.
>
> We see GFF DUMP coordinates still reported on the reverse stand
> regardless of the setting of --forwardcoordinates. So these two
> excerpts from you manpage seem contradictory to me. Unless I
> understand `--forwardcoordinates FALSE` to only effect the coordinates
> reported in the alignment section, not in the GFF DUMP section, which is
> what it appears to do in practice.
>
> Guy, can you confirm that the --forwardcoordinates option has no effect
> on GFF output?
>
Hi,
Yes, it has no effect, and this is a bug
(sorry - it was due to my misinterpretation of the GFF2 spec)
- its on the list of things to be fixed for exonerate 1.1 (soon)
> Further, can you tell us if you plan to comport more closely to the GFF
> spec, in particular in this case by reporting forwardcoordinates in the
> GFF DUMP section too? I see
> I see in your TODO list " o Should GFF show all coordinates on the
> +ve strand? (jason_p2g eg)". Hear hear! I second the motion.
>
> And TODO item " GFF3 support ? http://song.sf.net/" gets my vote too....
> though this is more of a sticky wicket....
>
Yup, GFF3 support is on the list,
but probably it will not be done in time for exonerate 1.1
Of course, I'd welcome a patch ... ;)
(I'm mainly working on getting the cdna2genome
and genome2genome models working properly for 1.1)
Cheers,
Guy.
> Cheers and Thanks!
>
> Malcolm Cook
>
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Friday, September 02, 2005 9:46 AM
> To: Cook, Malcolm
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
> output
>
>
> I've already talked to Guy about some of this and I assume fixes will be
> part of the next release, but it can't hurt to have more people
> requesting. The main problem right now is reverse strand hits in GFF
> output are still screwed up even if you provide the --forwardcoordinates
> option.
>
> If someone wanted to write/donate a VULGAR to GFF subroutine (okay
> VULGAR to a list of Bio::Search::HSP::GenericHSP). We can also
> reconstruct everything needed from that, I gave a stab at it once, but
> there was something missing (or maybe it was pre --forwardcoordinates
> option).
>
>
> -jason
>
> On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote:
>
>
> Jason,
>
> Thanks for the scripts and clues (esp re: using the --ryo option to
> inject the needed length into the exonerate output to compensate).
>
> I'm considering asking exonerate author to comport with GFF spec. Do
> you think this is a road to take?
>
> Cheers,
>
> Malcolm
>
> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Wednesday, August 31, 2005 12:35 PM
> To: Cook, Malcolm
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate
> output
>
>
> http://fungal.genome.duke.edu/~jes12/software/scripts/process_exonerate_
> gff3.pl
>
> You may still want to massage it some, but I use the script in this
> basic form, maybe with a few tweaks:
>
> Note that it requires you to run exonerate with specific --ryo options
> so that it includes the length of the query and hit sequences in the
> report output. should be covered in the perldoc in the script.
>
> Without the ryo options enabled, you'll need to modify the script more
> to have access to the original sequence db, use Bio::DB::Fasta, and put
> in some $dbh->length($seqid) calls instead.
>
> I don't think the part which writes HSP/match lines is actually correct
> - it is trying to roll gapped HSPs from the similarity features.
>
> I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse
> instance and/or grepping out the lines I really think I need.
> You may want to s/exon/CDS/ for the protein2genome output as well.
>
> -jason
>
> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote:
>
>
> Jason,
>
> This message is in regards to an old thread in which you offered to
> shared a 'script for munging over' exonerate output for lading in
> DB::GFF (c.f.
> <http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html>
> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html)
>
> Would you be willing to still share that script, if you've got it
> around?
>
> Thanks, and regards,
>
> Malcolm Cook - <mailto:mec at stowers-institute.org>
> mec at stowers-institute.org - 816-926-4449
> Database Applications Manager - Bioinformatics
> Stowers Institute for Medical Research - Kansas City, MO USA
>
>
>
>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>
--
%!PS % <------ Guy St.C. Slater ------> http://www.ebi.ac.uk/~guy/ <------
210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1 12/d{exch moveto}
a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4 f}a/h{setlinewidth newpath dup
g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f 0 108 arc d e
18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
More information about the Bioperl-l
mailing list