[BioRuby] csfasta parser

Tomoaki NISHIYAMA tomoakin at kenroku.kanazawa-u.ac.jp
Mon Aug 16 14:08:09 UTC 2010


Hi,

I modified fasta.rb to parse the csfasta format, a modified version of
FASTA that handles color-space sequences produced by SOLiD sequencers from
Life Technologies (formerly Applied Biosystems).

The most important difference is that the sequence is a nucleotide
followed by colors specified by numbers [0-3]. When the sequencer fails
to assign a color, it may be represented by a dot ".".
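
To make the sequence layout concrete, here is a minimal sketch in Ruby.
It is not taken from csfasta.rb; the constant CSFASTA_READ and the method
split_color_read are hypothetical names used only for illustration, and
the sketch assumes a read is one leading base followed only by the
characters 0-3 and ".".

```ruby
# Hypothetical helper, not part of BioRuby or csfasta.rb.
# Assumption: a read is one leading base followed by color calls 0-3,
# with "." where no color could be assigned.
CSFASTA_READ = /\A([ACGTacgt])([0-3.]*)\z/

def split_color_read(read)
  m = CSFASTA_READ.match(read.strip)
  raise ArgumentError, "not a color-space read: #{read.inspect}" unless m
  base = m[1].upcase
  # Keep unassigned colors (".") as nil so callers can detect them.
  colors = m[2].chars.map { |c| c == "." ? nil : c.to_i }
  [base, colors]
end

base, colors = split_color_read("T0123.1")
puts "#{base} #{colors.inspect}"
```

Keeping the missing calls as nil is just one possible choice; a parser
could equally keep the raw string and leave interpretation to the caller.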

The other difference is that a mapping location may be appended to the
definition line, separated by a comma "," rather than a space.
Thus the entry_id extraction should be based on the comma rather than the
space.
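
The comma-based split can be sketched as follows. This is an
illustration, not the code in csfasta.rb: split_csfasta_definition is a
hypothetical name, and the definition line in the example is made up to
show the shape of the input.

```ruby
# Hypothetical helper, not part of BioRuby or csfasta.rb.
# Splits a definition line at the first comma: everything before it is
# treated as the entry id, everything after it as the mapping part.
def split_csfasta_definition(line)
  id, mapping = line.chomp.sub(/\A>/, "").split(",", 2)
  [id.to_s, mapping]
end

# Made-up definition line, for illustration only.
id, mapping = split_csfasta_definition(">1_88_1830_F3,1_-2682.2")
puts id
puts mapping
```

When no comma is present the mapping part is nil and the whole
definition is taken as the entry id.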

In some cases the interest is mainly in the mapping location or the entry
id itself, and the sequence data is not touched at all. So I made it store
the entry and definition at initialization, while the data is not
extracted then but left for lazy evaluation.
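
The lazy-evaluation idea can be sketched like this. The class below is a
simplified stand-in, not the class in csfasta.rb: LazyCsfastaEntry and
data_parsed? are hypothetical names, and the sample entry is made up.

```ruby
# Hypothetical class, a simplified stand-in for the real parser.
# The definition and entry id are set in initialize; the sequence data
# is only cleaned up the first time #data is called.
class LazyCsfastaEntry
  attr_reader :definition, :entry_id

  def initialize(text)
    header, body = text.split("\n", 2)
    @definition = header.to_s.sub(/\A>/, "")
    @entry_id = @definition.split(",", 2).first.to_s
    @raw_body = body.to_s   # kept untouched until needed
    @data = nil
  end

  # Joins the sequence lines on first access and caches the result.
  def data
    @data ||= @raw_body.gsub(/\s+/, "")
  end

  # True once #data has been evaluated.
  def data_parsed?
    !@data.nil?
  end
end

# Made-up entry, for illustration only.
entry = LazyCsfastaEntry.new(">1_88_1830_F3,1_-2682.2\nT0123\n.1\n")
puts entry.entry_id
puts entry.data_parsed?
puts entry.data
```

If only the ids or mapping locations are read, the per-entry cost stays
at splitting the header line, which is the point of deferring the rest.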

The code can be found at
http://github.com/tomoakin/bioruby/blob/master/lib/bio/db/csfasta.rb

Note that naseq etc. have not been tested.
-- 
Tomoaki NISHIYAMA

Advanced Science Research Center,
Kanazawa University,
13-1 Takara-machi,
Kanazawa, 920-0934, Japan




