[BioRuby] SIM4 parser

Naohisa GOTO ngoto at gen-info.osaka-u.ac.jp
Mon Aug 31 14:22:11 UTC 2009


Hi,

On Sun, 5 Jul 2009 20:28:33 +0900
Tomoaki NISHIYAMA <tomoakin at kenroku.kanazawa-u.ac.jp> wrote:

> Hi,
> 
> > One way to resolve this may be to check whether the start address
> > matches the address specified in the previous section, which states
> > the ranges of the matches.
> > I'm considering implementing it this way.
> 
> 
> I now have working code, and a diff relative to 1.3.0 is attached.
> The code was changed to parse the alignment only after the SegmentPairs
> are prepared.

The bug is fixed.
http://github.com/bioruby/bioruby/commit/02d531e36ecf789f232cf3e05f85391b60279f00

Thank you for sending a patch. I didn't fully use your patch,
but it was very helpful.

> During this work, I also noticed that the semantics of the structure
> might be misunderstood:
> 1. The mark after the match, either "->", "<-", "--", or "==",
>   does not represent the direction of the exon, but indicates
>   the presumed direction of the intron following the exon.
>   "--" is used when part of the intervening sequence
>   and the midline is shown, and
>   "==" is used when there is no information for the intervening sequence.
> I do not understand how these patterns are determined by SIM4,
> but "->" and "<-" can be estimated based on the GU-AG rule.
> Since these directions are essentially assigned to the
> introns rather than the exons, it might be inappropriate to assign
> these strings to the exon.  There are actually rare cases in which
> introns in different directions are deduced: in such cases,
> assuming that the direction of the exon is the same as that of its
> 3' intron rather than its 5' intron is not desirable.  So, it seems
> arguable that directions for exons should be deprecated.
>
> From the current state of the parser, I bet few people are using
> bioruby to parse sim4 alignment output, so changing the interface
> is acceptable this time.

You are right. However, to keep compatibility, the method
Bio::Sim4::Report::SegmentPair#direction is still kept for now.
In the next major release (1.4.0?), the method will be deprecated,
and another method will be added.
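
As a rough illustration of the semantics discussed above, the mark can
be treated as a property of the intron that follows each exon. This is
only a sketch, not the BioRuby implementation: INTRON_MARKS, EXON_LINE,
parse_exon_line and the :forward/:reverse/:shown/:omitted labels are
hypothetical names, and the exon line layout
"from1-to1  (from2-to2)  identity% mark" is an assumption about the
sim4 summary section.

```ruby
# Hypothetical helper, not part of BioRuby: interprets the mark at the
# end of a sim4 exon line as a property of the FOLLOWING intron.
INTRON_MARKS = {
  '->' => :forward,  # presumed intron direction (label is an assumption)
  '<-' => :reverse,  # presumed intron direction (label is an assumption)
  '--' => :shown,    # part of the intervening sequence and midline is shown
  '==' => :omitted   # no information for the intervening sequence
}.freeze

# Assumed line layout: "from1-to1  (from2-to2)  identity% mark".
# The mark is optional because the last exon has no following intron.
EXON_LINE = /\A(\d+)-(\d+)\s+\((\d+)-(\d+)\)\s+(\d+)%\s*(->|<-|--|==)?\s*\z/

# Returns a Hash describing one exon line, or nil if the line does not
# match the assumed layout.
def parse_exon_line(line)
  m = EXON_LINE.match(line.strip)
  return nil unless m

  {
    seq1: (m[1].to_i..m[2].to_i),
    seq2: (m[3].to_i..m[4].to_i),
    percent_identity: m[5].to_i,
    following_intron: m[6] ? INTRON_MARKS[m[6]] : nil
  }
end

exon = parse_exon_line('1-100  (2001-2100)   100% ->')
puts exon[:following_intron].inspect
```

Keeping the mark under a key such as :following_intron means that an
exon flanked by introns of different directions is not forced to take
a single direction of its own.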

-- 
Naohisa Goto
ngoto at gen-info.osaka-u.ac.jp / ng at bioruby.org


