[BioRuby] SIM4 parser

Tomoaki NISHIYAMA tomoakin at kenroku.kanazawa-u.ac.jp
Sun Jul 5 11:28:33 UTC 2009


> One way to resolve this may be to check whether the start address
> matches the address that was specified in the previous section
> stating the ranges of the matches.
> I'm considering implementing it this way.

I now have working code; a diff relative to 1.3.0 is attached.
The code was changed to parse the alignment only after the SegmentPairs
have been prepared.
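
As a rough illustration of the idea (not the code in the attached diff),
the lookup could be sketched as below. ExonRange and exon_index_for_block
are hypothetical names used only for this example; they are not part of
the BioRuby API. The sketch assumes the ranges from the summary section
have already been parsed.

```ruby
# Hypothetical stand-in for one parsed range line of the summary section.
# This is NOT the BioRuby SegmentPair class, only an illustration.
ExonRange = Struct.new(:from1, :to1, :from2, :to2)

# Return the index of the exon whose start position on seq1 equals the
# start position of an alignment block, or nil if no exon starts there.
def exon_index_for_block(exons, block_start)
  exons.index { |exon| exon.from1 == block_start }
end

# Example ranges (made-up coordinates).
sample_exons = [
  ExonRange.new(1, 100, 2001, 2100),
  ExonRange.new(101, 200, 3001, 3100)
]

exon_index_for_block(sample_exons, 101)  # => 1
exon_index_for_block(sample_exons, 50)   # => nil
```

Because the ranges are known before the alignment text is read, an
alignment block whose start does not match any range can be treated as a
continuation of the previous block rather than as a new exon.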

During this work, I also noticed that the semantics of the structure
might be misunderstood:
1. The mark after the match, either "->", "<-", "--", or "==",
does not represent the direction of the exon, but indicates
the presumed direction of the intron following the exon.
"--" is used when part of the intervening sequence
and the midline are shown, and
"==" is used when there is no information on the intervening sequence.
I do not understand how SIM4 determines these patterns,
but "->" and "<-" can be estimated from the GU-AG rule.
Since these directions are essentially assigned to the
introns rather than the exons, it might be inappropriate to assign
these strings to the exon.  There are actually rare cases in which
introns in different directions are deduced: in such cases,
assuming that the direction of the exon is the same as that of its
3' intron rather than its 5' intron is not desirable.  So it seems
arguable that directions for exons should be deprecated.
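
To make the point concrete, here is a minimal sketch that attaches each
mark to the intron following the exon instead of to the exon itself.
The line shape matched by EXON_LINE ("1-100  (2001-2100)   100% ->") is
my assumption about the summary section, and parse_exon_lines,
intron_marks and mixed_directions? are hypothetical helpers, not BioRuby
methods.

```ruby
# Assumed shape of a summary line: "1-100  (2001-2100)   100% ->"
# The trailing mark is optional (the last exon has no following intron).
EXON_LINE = /\A(\d+)-(\d+)\s+\((\d+)-(\d+)\)\s+(\d+)%\s*(->|<-|--|==)?\s*\z/

# Parse summary lines into hashes; lines that do not match are skipped.
def parse_exon_lines(lines)
  lines.map { |line|
    m = EXON_LINE.match(line.strip)
    next unless m
    { from1: m[1].to_i, to1: m[2].to_i,
      from2: m[3].to_i, to2: m[4].to_i,
      percent: m[5].to_i, mark: m[6] }
  }.compact
end

# Build one record per intron: the mark written after an exon is stored
# on the gap between that exon and the next one.
def intron_marks(exons)
  exons.each_cons(2).map do |left, right|
    { after: left[:to1], before: right[:from1], mark: left[:mark] }
  end
end

# True when both "->" and "<-" occur among the introns, i.e. the rare
# case in which a single exon direction would be misleading.
def mixed_directions?(introns)
  marks = introns.map { |intron| intron[:mark] }
  marks.include?("->") && marks.include?("<-")
end

# Made-up example lines.
sample_lines = [
  "1-100  (2001-2100)   100% ->",
  "101-200  (3001-3100)   98% <-",
  "201-300  (4001-4100)   99%"
]
sample_introns = intron_marks(parse_exon_lines(sample_lines))
```

With three exons this yields two intron records, and no direction is
ever stored on an exon.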

Judging from the current state of the parser, I bet few people are using
BioRuby to parse sim4 alignment output, so changing the interface
is acceptable this time.


Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan